
Heap allocated, or not heap allocated, that is the question

Armin Kassemi Langroodi

C++ Software Engineer at ASML

Published: September 30, 2023

Introduction

C++ exceptions can make error handling much easier for programmers. The programmer marks the scope in which an error may occur with a try-block, optionally filters the exception type in the catch clause, and handles the error in the catch-block. Almost nothing more than that! The main thing programmers can complain about is the larger executable size.

The larger executable is the price of the fact that nothing in C++ is easy [1]. Even though C++ exceptions are easy to use, under the hood the compiler has to add extra machine code and data to support this mechanism. However, this added code is not necessarily costly as long as no exception is thrown [2].

In the context of safety-critical systems, the whole story can change. A safety-certified compiler may not allow C++ exceptions in the code at all, because the ABI it supports cannot allocate exception objects statically. This can happen because the C++ standard does not (yet) specify how the memory for exception objects is allocated [3]. The safety-certified compiler may then see C as the hero here. What could be wrong with C-style error handling?!

From the memory-allocation perspective, basically nothing. There is just a variable called errno that requires neither dynamic memory allocation nor extra machine code from the compiler. However, this gain comes with a pain, and the pain is on the programmers' side: they need to put if-statements throughout their code to check the value of errno and decide, based on that value, whether to continue.
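To make that pain concrete, here is a minimal sketch of the square root function used later in this article, written C-style. It is an illustration only: get_square_root_c_style and try_get_square_root are hypothetical names, and reporting the error as EDOM plus a NaN return value is an assumed convention. Note how the caller has to clear errno, call, and check errno again, and how its own caller must then check the returned flag in turn.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Hypothetical C-style variant: instead of throwing, report the error
// through errno and return a sentinel value (NaN).
double get_square_root_c_style(double number)
{
    if (number < 0)
    {
        errno = EDOM;  // EDOM: argument outside the function's domain
        return std::nan("");
    }
    return std::pow(number, 0.5);
}

// Hypothetical caller: clear errno, call, check errno. Returns false and
// leaves *result untouched if the call failed, so *its* caller must check too.
bool try_get_square_root(double number, double* result)
{
    errno = 0;
    double value = get_square_root_c_style(number);
    if (errno != 0)
    {
        return false;
    }
    *result = value;
    return true;
}
```

Every layer repeats the same if-statement, which is exactly the boilerplate that a single catch-block avoids.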

Scattering such checks across a code base makes it hard to understand and difficult to maintain. The aim of this article is to assess whether C++ exceptions can be used in safety-critical systems by comparing the exception-allocation strategies of different C++ compilers.

Exception allocation in C++ compilers

The following four compilers were chosen for comparison based on their popularity; none of them is actually a safety-certified compiler: GCC, Clang, ICC and MSVC.

GNU Compiler Collection (GCC)

The following C++ function computes the square root of a non-negative number and throws an exception if the argument is negative:

#include <cmath>
#include <stdexcept>

double get_square_root(double number)
{
    if (number < 0)
    {
        throw std::invalid_argument("Negative number");
    }

    double result = pow(number, 0.5);

    return result;
}
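For comparison with the errno style, a caller of this function only needs one try-block and one catch clause that filters on std::invalid_argument. The sketch below restates the function so it compiles on its own; describe_square_root is a hypothetical caller introduced only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

double get_square_root(double number)
{
    if (number < 0)
    {
        throw std::invalid_argument("Negative number");
    }
    return std::pow(number, 0.5);
}

// Hypothetical caller: the error path lives in a single catch-block
// instead of in an if-statement after every call.
std::string describe_square_root(double number)
{
    try
    {
        double root = get_square_root(number);
        return "ok: " + std::to_string(root);
    }
    catch (const std::invalid_argument& e)  // exception filter
    {
        return std::string("error: ") + e.what();
    }
}
```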

The assembly code of the function above, compiled by x86-64 gcc 13.2 without any compiler options, looks like this:

.LC1:
        .string "Negative number"
get_square_root(double):
        push    rbp
        mov     rbp, rsp
        push    r12
        push    rbx
        sub     rsp, 32
        movsd   QWORD PTR [rbp-40], xmm0
        pxor    xmm0, xmm0
        comisd  xmm0, QWORD PTR [rbp-40]
        jbe     .L8
        mov     edi, 16
        call    __cxa_allocate_exception
        mov     rbx, rax
        mov     esi, OFFSET FLAT:.LC1
        mov     rdi, rbx
        call    std::invalid_argument::invalid_argument(char const*) [complete object constructor]
        mov     edx, OFFSET FLAT:_ZNSt16invalid_argumentD1Ev
        mov     esi, OFFSET FLAT:_ZTISt16invalid_argument
        mov     rdi, rbx
        call    __cxa_throw
.L8:
        movsd   xmm0, QWORD PTR .LC2[rip]
        mov     rax, QWORD PTR [rbp-40]
        movapd  xmm1, xmm0
        movq    xmm0, rax
        call    pow
        movq    rax, xmm0
        mov     QWORD PTR [rbp-24], rax
        movsd   xmm0, QWORD PTR [rbp-24]
        movq    rax, xmm0
        jmp     .L9
        mov     r12, rax
        mov     rdi, rbx
        call    __cxa_free_exception
        mov     rax, r12
        mov     rdi, rax
        call    _Unwind_Resume
.L9:
        movq    xmm0, rax
        add     rsp, 32
        pop     rbx
        pop     r12
        pop     rbp
        ret

Now let's focus only on the exception-throwing path, which calls __cxa_allocate_exception with the argument 16 (passed in edi, following the AMD64 calling convention [4]). What is this function, and what is that argument?

This function is part of the Itanium C++ ABI, which GCC implements as its standard C++ ABI in order to generate machine code for the major operating systems and hardware architectures [5]. Its implementation in GCC's runtime library, libstdc++, is as follows [6]:

extern "C" void *
__cxxabiv1::__cxa_allocate_exception(std::size_t thrown_size) noexcept
{
  thrown_size += sizeof (__cxa_refcounted_exception);

  void *ret = malloc (thrown_size);

#if USE_POOL
  if (!ret)
    ret = emergency_pool.allocate (thrown_size);
#endif

  if (!ret)
    std::terminate ();

  memset (ret, 0, sizeof (__cxa_refcounted_exception));

  return (void *)((char *)ret + sizeof (__cxa_refcounted_exception));
}

It is now clear that the value 16 passed as the argument is thrown_size, i.e. sizeof(std::invalid_argument) on this 64-bit target. In this case, GCC's runtime:

Takes the size of the exception header, __cxa_refcounted_exception;
Enlarges thrown_size by the header size;
Calls malloc to allocate the enlarged size on the heap;
Zeroes the exception header;
Returns a pointer into the heap block, offset by the header size, where the exception object will be constructed.
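The steps above can be modelled in a few lines. This is a simplified sketch, not libstdc++ code: FakeExceptionHeader, allocate_with_header, header_of and free_with_header are hypothetical names, and the real __cxa_refcounted_exception carries more fields. It shows the key layout property: the header sits immediately in front of the thrown object, so the runtime can always step back from the object pointer to its bookkeeping data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for __cxa_refcounted_exception (the real header
// holds the type_info, destructor, unwind data, reference count, ...).
struct FakeExceptionHeader
{
    void* type_info;
    void (*destructor)(void*);
    std::size_t reference_count;
};

// Enlarge the request by the header size, malloc it, zero the header and
// return a pointer just past the header.
void* allocate_with_header(std::size_t thrown_size)
{
    thrown_size += sizeof(FakeExceptionHeader);
    void* ret = std::malloc(thrown_size);
    if (!ret)
        std::abort();  // libstdc++ would try its emergency pool, then std::terminate()
    std::memset(ret, 0, sizeof(FakeExceptionHeader));
    return static_cast<char*>(ret) + sizeof(FakeExceptionHeader);
}

// The header lies immediately before the object, so it can be recovered.
FakeExceptionHeader* header_of(void* thrown_object)
{
    return reinterpret_cast<FakeExceptionHeader*>(
        static_cast<char*>(thrown_object) - sizeof(FakeExceptionHeader));
}

void free_with_header(void* thrown_object)
{
    std::free(header_of(thrown_object));
}
```

In the usage below, the exception object is constructed in the returned block with placement new and destroyed before the block is freed, which mimics in a very reduced form what happens around the real allocator.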

If neither malloc nor an emergency pool can provide the memory, it simply calls std::terminate. That is why throwing an exception can be unsafe: if the exception cannot be allocated, the whole program is terminated.

When the emergency pool is enabled (USE_POOL), the scenario is different. In that case, libstdc++ creates an emergency memory pool at program initialization, which is a safer approach: memory allocation failures at initialization happen early enough to prevent catastrophic events later. So, if malloc fails, the exception is allocated in the pool, and only in the worst case, when the pool is exhausted as well, does GCC have no choice but to call std::terminate. Note that _GLIBCXX_EH_POOL_STATIC is a libstdc++ build-time configuration rather than a compiler flag; it makes the pool a fixed-size static buffer instead of memory obtained with malloc at startup.
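The malloc-then-pool decision order can be sketched as follows. This is an assumption-laden simplification: EmergencyPool, emergency_pool, allocate_exception_memory and the simulate_malloc_failure switch are hypothetical, the pool size of 1024 bytes is arbitrary, and unlike libstdc++'s real pool (which keeps a free list) this bump allocator never reclaims memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <exception>

// Hypothetical fixed-size emergency pool: a static buffer carved up with a
// simple bump pointer. Blocks are never returned, to keep the sketch short.
class EmergencyPool
{
public:
    void* allocate(std::size_t size)
    {
        // Round up so every block stays suitably aligned.
        constexpr std::size_t align = alignof(std::max_align_t);
        size = (size + align - 1) / align * align;
        if (size > sizeof(buffer_) - used_)
            return nullptr;  // pool exhausted
        void* block = buffer_ + used_;
        used_ += size;
        return block;
    }

private:
    alignas(std::max_align_t) unsigned char buffer_[1024];
    std::size_t used_ = 0;
};

EmergencyPool emergency_pool;  // exists before any exception is thrown

// Same order as the quoted libstdc++ code: heap first, then the pool,
// std::terminate() only if both fail. The flag only exercises the fallback.
void* allocate_exception_memory(std::size_t size, bool simulate_malloc_failure)
{
    void* ret = simulate_malloc_failure ? nullptr : std::malloc(size);
    if (!ret)
        ret = emergency_pool.allocate(size);
    if (!ret)
        std::terminate();
    return ret;
}
```

Because the pool's storage is reserved up front, the only remaining runtime failure mode is exhausting it, which can be bounded by sizing the pool for the worst case.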

LLVM Clang

The assembly code of the same square root function, compiled by x86-64 clang 17.0.1 without any compiler options, looks like this:

.LCPI0_0:
        .quad   0x3fe0000000000000              # double 0.5
get_square_root(double):                   # @get_square_root(double)
        push    rbp
        mov     rbp, rsp
        sub     rsp, 48
        movsd   qword ptr [rbp - 8], xmm0
        xorps   xmm0, xmm0
        ucomisd xmm0, qword ptr [rbp - 8]
        jbe     .LBB0_4
        mov     edi, 16
        call    __cxa_allocate_exception@PLT
        mov     rdi, rax
        mov     rax, rdi
        mov     qword ptr [rbp - 40], rax       # 8-byte Spill
        lea     rsi, [rip + .L.str]
        call    std::invalid_argument::invalid_argument(char const*)@PLT
        jmp     .LBB0_2
.LBB0_2:
        mov     rdi, qword ptr [rbp - 40]       # 8-byte Reload
        mov     rsi, qword ptr [rip + typeinfo for std::invalid_argument@GOTPCREL]
        mov     rdx, qword ptr [rip + std::invalid_argument::~invalid_argument()@GOTPCREL]
        call    __cxa_throw@PLT
        mov     rdi, qword ptr [rbp - 40]       # 8-byte Reload
        mov     rcx, rax
        mov     eax, edx
        mov     qword ptr [rbp - 16], rcx
        mov     dword ptr [rbp - 20], eax
        call    __cxa_free_exception@PLT
        jmp     .LBB0_5
.LBB0_4:
        movsd   xmm0, qword ptr [rbp - 8]       # xmm0 = mem[0],zero
        movsd   xmm1, qword ptr [rip + .LCPI0_0] # xmm1 = mem[0],zero
        call    pow@PLT
        movsd   qword ptr [rbp - 32], xmm0
        movsd   xmm0, qword ptr [rbp - 32]      # xmm0 = mem[0],zero
        add     rsp, 48
        pop     rbp
        ret
.LBB0_5:
        mov     rdi, qword ptr [rbp - 16]
        call    _Unwind_Resume@PLT

Focusing again on the exception-throwing path, we see the same call to __cxa_allocate_exception with the argument 16 (the @PLT suffix only means that the call goes through the Procedure Linkage Table, where the target address is resolved).

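This indicates that Clang, like GCC, implements the Itanium C++ ABI. Strictly speaking, __cxa_allocate_exception belongs to the C++ runtime library rather than to the compiler, so which implementation runs depends on the runtime the program is linked against. LLVM's own runtime, libc++abi, implements it slightly differently [7]: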

void *__cxa_allocate_exception(size_t thrown_size) throw() {
    size_t actual_size = cxa_exception_size_from_exception_thrown_size(thrown_size);

    // Allocate extra space before the __cxa_exception header to ensure the
    // start of the thrown object is sufficiently aligned.
    size_t header_offset = get_cxa_exception_offset();
    char *raw_buffer =
        (char *)__aligned_malloc_with_fallback(header_offset + actual_size);
    if (NULL == raw_buffer)
        std::terminate();
    __cxa_exception *exception_header =
        static_cast<__cxa_exception *>((void *)(raw_buffer + header_offset));
    ::memset(exception_header, 0, actual_size);
    return thrown_object_from_cxa_exception(exception_header);
}

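In contrast with GCC's runtime, libc++abi:

Has no configurable emergency pool; if the aligned allocation fails, __aligned_malloc_with_fallback falls back only to a small static buffer inside libc++abi;
Aligns the allocation so that both the header and the thrown object are suitably aligned;
Zeroes the whole block (header and thrown object), not only the header.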
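The alignment trick can be illustrated with a small sketch of the arithmetic behind get_cxa_exception_offset(): the padding is placed in front of the header, so that the header ends, and the thrown object therefore starts, on an alignment boundary (assuming the raw buffer itself is aligned). The helper name header_offset and the header sizes 40 and 48 are hypothetical, not the real sizeof(__cxa_exception).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: padding needed before a header of `header_size`
// bytes so that the byte right after the header is `alignment`-aligned.
constexpr std::size_t header_offset(std::size_t header_size, std::size_t alignment)
{
    std::size_t aligned_size = (header_size + alignment - 1) / alignment * alignment;
    return aligned_size - header_size;
}

// A 40-byte header with 16-byte alignment needs 8 bytes of padding,
// so the thrown object starts at offset 8 + 40 = 48, a multiple of 16.
static_assert(header_offset(40, 16) == 8, "40 rounds up to 48");
static_assert(header_offset(48, 16) == 0, "already aligned");
```

Placing the padding before the header, rather than between header and object, keeps the header directly adjacent to the object, so the "step back from the object to its header" property from the GCC sketch still holds.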

Intel C++ Compiler (ICC)

The assembly code of the same square root function, compiled by x86-64 icc 2021.7.1 without any compiler options, looks like this:

get_square_root(double):
        push      rbp                                           #7.1
        mov       rbp, rsp                                      #7.1
        sub       rsp, 32                                       #7.1
        movsd     QWORD PTR [-32+rbp], xmm0                     #7.1
        movsd     xmm0, QWORD PTR [-32+rbp]                     #8.9
        pxor      xmm1, xmm1                                    #8.5
        comisd    xmm0, xmm1                                    #8.18
        jae       ..B1.5        # Prob 50%                      #8.18
        jp        ..B1.5        # Prob 0%                       #8.18
        mov       eax, 16                                       #10.9
        mov       rdi, rax                                      #10.9
        call      __cxa_allocate_exception                      #10.9
        mov       QWORD PTR [-24+rbp], rax                      #10.9
        mov       rax, QWORD PTR [-24+rbp]                      #10.37
        mov       edx, offset flat: .L_2__STRING.0              #10.37
        mov       rdi, rax                                      #10.37
        mov       rsi, rdx                                      #10.37
        call      std::invalid_argument::invalid_argument(char const*) [complete object constructor]                 #10.37
        mov       rax, QWORD PTR [-24+rbp]                      #10.37
        mov       edx, offset flat: typeinfo for std::invalid_argument    #10.37
        mov       ecx, offset flat: std::invalid_argument::~invalid_argument() [complete object destructor] #10.37
        mov       rdi, rax                                      #10.37
        mov       rsi, rdx                                      #10.37
        mov       rdx, rcx                                      #10.37
        call      __cxa_throw                                   #10.37
..B1.5:                         # Preds ..B1.1
        movsd     xmm0, QWORD PTR [-32+rbp]                     #13.21
        movss     xmm1, DWORD PTR .L_2il0floatpacket.1[rip]     #13.21
        cvtss2sd  xmm1, xmm1                                    #13.21
        call      pow                                           #13.21
        movsd     QWORD PTR [-16+rbp], xmm0                     #13.21
        movsd     xmm0, QWORD PTR [-16+rbp]                     #13.21
        movsd     QWORD PTR [-8+rbp], xmm0                      #13.19
        movsd     xmm0, QWORD PTR [-8+rbp]                      #15.12
        leave                                                   #15.12
        ret

ICC also calls __cxa_allocate_exception to throw the exception. Since ICC on Linux is compatible with GCC and uses GCC's C++ runtime, it can be assumed that the exception here also ends up on the heap [8].

Microsoft Visual C++ (MSVC)

Up until now, all the mentioned compilers allocate exceptions on the heap by default. However, the assembly code of the same square root function compiled by MSVC is totally different, because C++ exceptions work differently on Windows, where they are built on Structured Exception Handling [9]. MSVC constructs the exception object on the stack of the throwing function and passes its address to _CxxThrowException; no heap allocation routine is called. But does this mean that error handling on Windows is safe?

Exception in Adaptive AUTOSAR

The short answer to the previous question is: No! The long answer is: No, it is not! Even if the exception object itself is allocated on the stack, its members may still allocate memory on the heap.

For example, std::invalid_argument has to store a copy of its error message so that it can return it from its what member function. Because standard exception classes must be copyable without throwing, implementations typically keep the message in a reference-counted heap buffer rather than in a std::string with Small String Optimization, so the message usually ends up on the heap even when the exception itself lies on the stack [10].
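One way to observe this is to count calls to the global operator new while an std::invalid_argument is constructed. The sketch below is a measurement aid only: it replaces the global operator new and delete (which is allowed by the standard), and allocations_for_message and g_allocations are hypothetical names. The result depends on the standard library; with libstdc++ and libc++, which keep the message in a reference-counted buffer, even a one-character message is expected to allocate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Counts every call to the replaced global operator new.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size)
{
    ++g_allocations;
    if (void* p = std::malloc(size == 0 ? 1 : size))
        return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many heap allocations constructing the exception performed.
std::size_t allocations_for_message(const char* message)
{
    std::size_t before = g_allocations;
    std::invalid_argument e(message);
    return g_allocations - before;
}
```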

This problem can be avoided by following the design of ara::core::Exception in the Adaptive AUTOSAR standard. Its constructor, as specified in release R22-11, is as follows [11]:

explicit ara::core::Exception::Exception (ErrorCode err) noexcept;

where ara::core::ErrorCode can be constructed as shown below:

constexpr ara::core::ErrorCode::ErrorCode(ErrorDomain::CodeType value,
                                          const ErrorDomain &domain,
                                          ErrorDomain::SupportDataType data = ErrorDomain::SupportDataType()) noexcept;

and ara::core::ErrorDomain itself as follows:

explicit constexpr ara::core::ErrorDomain::ErrorDomain(IdType id) noexcept;

Here, CodeType and IdType are type aliases for std::int32_t and std::uint64_t respectively, and SupportDataType is implementation defined. Thus, the whole ara::core::ErrorCode can potentially be constructed as a plain data object that can be allocated statically, and accordingly ara::core::Exception requires no dynamic allocation of its own.
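A heavily simplified sketch of this idea follows. Everything lives in a hypothetical namespace sketch and is NOT the ara::core API: the real ErrorDomain is an abstract class with virtual members, and SupportDataType is assumed to be std::int32_t here. It shows that an exception carrying only a small, trivially copyable error code never needs the heap for its members. The exception object itself is still created through the ABI's allocator (e.g. __cxa_allocate_exception) when thrown; only the secondary allocations disappear.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <type_traits>

namespace sketch  // hypothetical, simplified stand-ins, NOT ara::core
{
using CodeType = std::int32_t;
using IdType = std::uint64_t;
using SupportDataType = std::int32_t;  // implementation defined in AUTOSAR; assumed here

class ErrorDomain
{
public:
    explicit constexpr ErrorDomain(IdType id) noexcept : id_(id) {}
    constexpr IdType Id() const noexcept { return id_; }

private:
    IdType id_;
};

class ErrorCode
{
public:
    constexpr ErrorCode(CodeType value, const ErrorDomain& domain,
                        SupportDataType data = SupportDataType()) noexcept
        : value_(value), domain_(&domain), data_(data) {}
    constexpr CodeType Value() const noexcept { return value_; }
    constexpr const ErrorDomain& Domain() const noexcept { return *domain_; }
    constexpr SupportDataType SupportData() const noexcept { return data_; }

private:
    CodeType value_;
    const ErrorDomain* domain_;  // points at a statically allocated domain
    SupportDataType data_;
};

// Stores the ErrorCode by value: no string member, so constructing or
// copying this exception never touches the heap.
class Exception : public std::exception
{
public:
    explicit Exception(ErrorCode err) noexcept : err_(err) {}
    const ErrorCode& Error() const noexcept { return err_; }
    const char* what() const noexcept override { return "sketch::Exception"; }

private:
    ErrorCode err_;
};

// A domain with static storage duration, built at compile time.
constexpr ErrorDomain kMathDomain{0x8000000000000001ULL};
}  // namespace sketch

static_assert(std::is_trivially_copyable<sketch::ErrorCode>::value,
              "ErrorCode is plain data");
```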

Conclusion

MSVC's idea of allocating exceptions on the stack sounds like a safe strategy for using C++ exceptions in safety-critical systems, but it still has two issues:

Throwing many and/or large exceptions can overflow the stack, because the exception objects live in the stack frames of the throwing functions;
Portability to other operating systems and hardware architectures can be difficult, because some stack-unwinding procedures may require OS function calls [12].

That leaves a memory pool similar to GCC's implementation as the only remaining option. As discussed before, there is still a risk of runtime failure if the pool is exhausted. However, combining a large-enough memory pool with ara::core::Exception significantly reduces the failure rate: each in-flight exception then needs only a fixed-size block from the pool and nothing from the heap, so the pool can be sized for the worst-case number of simultaneous exceptions. This makes the combination a good approach to error handling in safety-critical systems.
