C++: detailed analysis of the language performance (Part 2)

This is the second in a series of articles about C++ performance. Have you read part 1 already? If not, please do so first.

In this second article we will look at performance issues caused by passing objects as parameters.

A simple workhorse class

Before we get into the actual examples we need to create a class we can use for our measurements and evaluations. For this purpose we create a "DynamicString" class which implements some sort of Python-like variable length string which can be assigned and manipulated.

We will not implement it to the point of being actually usable (it will not even come close), but it will have all the nasty bits which can make C++ perform much slower than it could if you don't use it properly. Also, to keep the code shorter and more readable, we are leaving out some safe programming practices, such as testing pointers for null before using them.

Let's jump straight to the code.

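#include <stdio.h>    // printf, used by the test programs below
#include <string.h>   // strdup (POSIX)

class DynamicString
{
public:
    // Default constructor: start with a heap-allocated empty string.
    DynamicString()
    {
        str = strdup("");
    }

    // Construct from a C string by duplicating it on the heap.
    DynamicString(const char *str)
    {
        this->str = strdup(str);
    }

    // Read-only access to the contents.
    operator const char *() const
    {
        return str;
    }

protected:
    char *str;          // owned buffer (never freed yet: see below)
    char useless[16];   // padding, only there to make the object bigger
};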


Our class consists of:

  • A protected member "str" containing a pointer to the dynamic string's contents
  • A protected member "useless" which eats up some space and keeps the optimizer from doing too good a job (with tiny classes like this one, the optimizer can pass an entire instance in registers rather than on the stack, making the difference between passing by value and by reference less clear)
  • A constructor with no parameters, which will allocate and assign a new empty string
  • A constructor with a "const char *" parameter which will make a copy of the input string and assign it to the "str" member
  • A cast operator to "const char *" to be able to access contents within the class in read only mode

This class has several flaws. The easiest one to spot is the lack of a destructor to clean up the dynamic memory allocated in the constructors, but we will keep it like this to demonstrate what's going on one step at a time.

Here is a simple main function using our DynamicString class in the most simplistic way: declare it and print it (mostly because in its current state, you can do nothing with it).

int main()
{
    DynamicString str1("hello");
    DynamicString str2;


    printf("str1=<%s>, str2=<%s>\n", (const char *)str1, (const char *)str2);

    return 0;
}

Unsurprisingly, this little program produces the following output:

    str1=<hello>, str2=<>

Which shows we can successfully write a program doing nothing.

The obvious: passing by value and by reference

Let's now take our DynamicString class, pass instances of it to functions, and check the performance impact of passing the object by value or by reference. We will write two simple functions in a separate cpp file to prevent the optimizer from inlining them:

volatile char c;

void ByValue(DynamicString s)
{
    c = ((const char *)s)[0];
}

void ByRef(const DynamicString &s)
{
    c = ((const char *)s)[0];
}

A couple of quick notes:

  • c is declared volatile to avoid smart optimizer tricks (unlikely here because we are in a separate compilation unit, but anyway...)
  • we are using the DynamicString's cast operator to "const char *" to access its contents

Now let's build a main function to call the two functions and measure performance:

    struct timeval before, after;   // needs <sys/time.h>

    gettimeofday(&before, NULL);
    for(int i=0; i<100000000; i++)
    {
        ByValue(str1);
    }
    gettimeofday(&after, NULL);
    printf("Time passing by value: %0.6f\n", DeltaTime(before, after));

    gettimeofday(&before, NULL);
    for(int i=0; i<100000000; i++)
    {
        ByRef(str1);
    }
    gettimeofday(&after, NULL);
    printf("Time passing by reference: %0.6f\n", DeltaTime(before, after));

This program produces the following output:

[Screenshot: measured times for the by-value and by-reference loops]
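The measurement code relies on a DeltaTime helper whose implementation is not shown. A minimal sketch of what it could look like follows, assuming it simply returns the elapsed time in seconds as a double; the body of DeltaTime and the TimeLoop wrapper are assumptions made for illustration, not the original code:

```cpp
#include <cassert>
#include <sys/time.h>   // gettimeofday, struct timeval (POSIX)

// Hypothetical version of the DeltaTime helper: elapsed seconds between
// two gettimeofday() samples.
double DeltaTime(const timeval &before, const timeval &after)
{
    // Whole seconds plus the (possibly negative) microsecond difference.
    return static_cast<double>(after.tv_sec - before.tv_sec)
         + static_cast<double>(after.tv_usec - before.tv_usec) / 1e6;
}

// Hypothetical wrapper: runs fn() 'iterations' times and returns the
// elapsed wall-clock time in seconds.
double TimeLoop(void (*fn)(), int iterations)
{
    assert(fn != nullptr && iterations >= 0);

    timeval before, after;
    gettimeofday(&before, nullptr);
    for (int i = 0; i < iterations; i++)
    {
        fn();
    }
    gettimeofday(&after, nullptr);
    return DeltaTime(before, after);
}
```

Note that gettimeofday reports wall-clock time, so results can be disturbed by other activity on the machine; running each measurement a few times is a sensible precaution.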

So passing by value, as can be expected, is slower than passing by reference. But why? Let's have a look at how the two functions have been implemented:

_Z7ByValue13DynamicString:
    movq    8(%rsp), %rax
    movzbl    (%rax), %edx
    movq    c@GOTPCREL(%rip), %rax
    movb    %dl, (%rax)
    ret

_Z5ByRefRK13DynamicString:
    movq    (%rdi), %rax
    movzbl    (%rax), %edx
    movq    c@GOTPCREL(%rip), %rax
    movb    %dl, (%rax)
    ret

Apart from the first instruction, the two functions are identical. That first instruction loads the "str" pointer of the DynamicString: ByValue reads it from the copy the caller has placed on the stack, while ByRef reads it through the object address passed in %rdi.

The real difference is how the two functions get called in the main function.

Here is the code generated to pass str1 by value, effectively making a full copy of str1's contents onto the stack:

    subq    $8, %rsp
    pushq    56(%rsp)
    pushq    56(%rsp)
    pushq    56(%rsp)
    call    _Z7ByValue13DynamicString@PLT

And here is what happens when passing str1 by reference, just loading str1's address into a register:

    movq    %rbp, %rdi
    call    _Z5ByRefRK13DynamicString@PLT

The larger the object's size, the larger the difference becomes. Let's expand "useless":

    char useless[1024];

[Screenshot: measured times with the 1024-byte "useless" array]

And the code generated by the caller becomes a loop which makes very clear what's happening under the hood:

subq    $1040, %rsp                     ; Make space in the stack 
                                        ; for temp variable
movl    $129, %ecx                      ; Load copy counter
movq    %rbp, %rsi                      ; source address: str1 location
movq    %rsp, %rdi                      ; destination address: 
                                        ; temp variable in the stack
rep     movsq                           ; copy
call    _Z7ByValue13DynamicString@PLT   ; call

"rep movsq" is an x64 instruction which looks like a single operation, but in fact it copies 8-byte quadwords from the address in %rsi (source address register) to the address in %rdi (destination address register), repeating as many times as the count register says (here the value loaded through %ecx).
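The counts in the two caller listings follow directly from the object size. As a small sketch (SmallLayout, LargeLayout and QuadwordsFor are names made up for this illustration, and the concrete figures assume a 64-bit target with 8-byte pointers, such as x86-64 Linux):

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the two class layouts: one pointer plus a padding array.
// Only the data members matter for the size.
struct SmallLayout { char *str; char useless[16];   };
struct LargeLayout { char *str; char useless[1024]; };

// Number of 8-byte quadwords needed to copy 'bytes' bytes, rounding up.
constexpr std::size_t QuadwordsFor(std::size_t bytes)
{
    return (bytes + 7) / 8;
}

// With 8-byte pointers: 8 + 16 = 24 bytes, i.e. the three pushq instructions,
// and 8 + 1024 = 1032 bytes, i.e. the count of 129 loaded before rep movsq.
static_assert(QuadwordsFor(24) == 3, "small object: three quadwords");
static_assert(QuadwordsFor(1032) == 129, "large object: 129 quadwords");

// Checks that neither layout carries padding beyond pointer + array.
void CheckLayouts()
{
    assert(sizeof(SmallLayout) == sizeof(char *) + 16);
    assert(sizeof(LargeLayout) == sizeof(char *) + 1024);
}
```

The extra 8 bytes reserved in both listings (subq $8 and subq $1040 rather than $1032) are consistent with keeping the stack 16-byte aligned at the call.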

So far we haven't seen anything really new: any C programmer knows that passing a struct by value is slower than passing it by pointer.

But we will soon find out that C++ has plenty of surprises for us.

Add a destructor and die

As we have discussed before, DynamicString suffers from several issues, among which is the lack of a destructor. As a result, creating and destroying a DynamicString causes a memory leak, because nobody frees the memory allocated by the constructor.

So here's our destructor:

    ~DynamicString()
    {
        free(str);
    }

Yes yes... there's no check for str != nullptr. free() accepts a null pointer anyway, and leaving the check out keeps the code simple, especially when viewing the assembly.

Running the same program as before gives this very disappointing result:

[Screenshot: program output ending with a runtime library error]

The problem here is that the compiler will:

  1. create a copy of str1 into the stack as a temporary variable
  2. call ByValue
  3. call the destructor for the temporary variable
  4. repeat

The temporary variable contains a "dumb" copy of str1, hence it shares the pointer to "str" with str1:

[Diagram: str1 and the temporary variable both pointing to the same "str" buffer]

Step 3 in the sequence above will therefore free the buffer which is shared with str1, and the next iteration will try to use and free a buffer which has already been released.

We then get an error from the runtime library (if we are lucky... in many cases we just get a crash).
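The sharing itself is easy to observe without triggering the crash. The sketch below uses a hypothetical reduced class (ShallowString, not part of the article's code) which deliberately has no destructor, so the single buffer is released exactly once by hand:

```cpp
#include <cassert>
#include <stdlib.h>   // free
#include <string.h>   // strdup (POSIX)

// Hypothetical reduced class: no user-defined copy constructor and,
// on purpose, no destructor, so nothing is ever freed twice here.
struct ShallowString
{
    explicit ShallowString(const char *s) : str(strdup(s)) {}
    char *str;
};

// Returns true when a compiler-generated copy shares its buffer
// with the object it was copied from.
bool CopySharesBuffer()
{
    ShallowString original("hello");
    ShallowString copy(original);           // implicit copy: the pointer is copied
    bool shared = (copy.str == original.str);
    assert(shared);                         // both objects hold the same address
    free(original.str);                     // release the one buffer exactly once
    return shared;
}
```

Adding a destructor that calls free(str) to such a class is what turns this harmless sharing into a double free: each of the two objects would release the same buffer.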

Let's see how calling code has been implemented:

    ; Load loop counter in r12
    ; (as "i" is used just for counting
    ; it's more efficient to start from
    ; top and count down to zero as
    ; comparing with zero is faster
    ; than comparing with constant,
    ; the optimizer has sorted that out for us)
    movl    $100000000, %r12d
.L4:
    ; Destination register for copy:
    ; address of temp variable in stack
    movq    %rbx, %rdi
    ; data size counter
    movl    $129, %ecx
    ; source register for copy:
    ; address of str1    
    movq    %rbp, %rsi
    ; copy
    rep         movsq
    ; pass address of temp variable to ByValue
    movq    %r13, %rdi
    movq    %r13, %rbx
    ; call ByValue
    call    _Z7ByValue13DynamicString@PLT
    ; inlined destructor: free "str"
    ; inside the temp variable which is however
    ; shared with str1
    movq    1072(%rsp), %rdi
    call    free@PLT
    ; decrement r12 (which is "i")
    subl    $1, %r12d
    ; keep looping until done
    jne    .L4

The "rep movsq" instruction, which we have already seen before, copies str1's memory image into the temp variable before the temp variable is passed to ByValue. This is the work of the default (implicitly generated) copy constructor: when an instance of a class is initialized from another instance of the same class (as in our case, where the temp variable is initialized from str1), the compiler's default behaviour is to copy every member in turn, which for plain pointers and arrays like ours amounts to copying the memory image.

The default copy constructor is a bad idea when the class uses dynamically allocated memory or contains resources which must be claimed and then freed.

Defining the copy constructor

The solution to this problem is to define our own copy constructor. A copy constructor is just a constructor which accepts a single parameter, a const reference to an object of the same type:

    DynamicString(const DynamicString &that)
    {
        this->str = strdup(that.str);
        memcpy(this->useless, that.useless, sizeof(useless));
    } 

In our new copy constructor we are duplicating the string of the instance we are constructing ourselves from, rather than copying the pointer.

Notice that the new copy constructor completely replaces the default copy constructor, so we must ensure that we copy all we need (including the "useless" array in case it's useful for something) because nobody will do that for us anymore.
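The effect of a user-defined copy constructor can be checked with a small sketch. OwnedString is a hypothetical reduced class following the same ownership rules as DynamicString; copy assignment is not part of this discussion, so the sketch simply disables it rather than leaving it to the compiler:

```cpp
#include <cassert>
#include <stdlib.h>   // free
#include <string.h>   // strdup (POSIX), strcmp

// Hypothetical reduced class with the same ownership rules as DynamicString.
class OwnedString
{
public:
    explicit OwnedString(const char *s) : str(strdup(s)) {}

    // User-defined copy constructor: duplicate the buffer, not the pointer.
    OwnedString(const OwnedString &that) : str(strdup(that.str)) {}

    // Disabled here so that no compiler-generated pointer copy can sneak in.
    OwnedString &operator=(const OwnedString &) = delete;

    ~OwnedString() { free(str); }

    const char *c_str() const { return str; }

private:
    char *str;
};

// Returns true when a copy has equal contents but its own separate buffer.
bool CopyOwnsItsBuffer()
{
    OwnedString original("hello");
    OwnedString copy(original);
    bool separate = (copy.c_str() != original.c_str());
    bool equal = (strcmp(copy.c_str(), original.c_str()) == 0);
    assert(separate && equal);
    return separate && equal;
}   // both destructors run here, each freeing a different buffer
```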

Performance measurements are now as follows:

[Screenshot: measured times with the user-defined copy constructor]

These are even worse than the earlier by-value timings, as each iteration now pays the additional overhead of strdup / free.

Let's have a quick look at the code generated in the caller (omitting the init / loop parts and focusing on each individual call):

; -------- beginning of inlined copy constructor
; this->str = strdup(that.str);
movq    32(%rsp), %rdi
call    strdup@PLT
; memcpy(this->useless, that.useless, sizeof(useless));
movl    $128, %ecx
movq    %r12, %rdi
movq    %rbp, %rsi
rep movsq
; -------- end of inlined copy constructor
movq    %r13, %rdi
movq    %rax, 1072(%rsp)
call    _Z7ByValue13DynamicString@PLT
; -------- beginning of inlined destructor
; free(str);
movq    1072(%rsp), %rdi
call    free@PLT
; -------- end of inlined destructor

We can now see that everything is as expected. On each iteration we:

  1. create a temporary variable in the stack initializing it with the copy constructor which duplicates the string
  2. call ByValue
  3. call the destructor for the temporary variable which frees the string
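The same sequence can be observed without reading assembly by counting calls. In the sketch below, Counted, TakeByValue, TakeByRef and CountCalls are hypothetical names introduced only for this illustration:

```cpp
#include <cassert>

// Hypothetical instrumented type: counts copy constructions and destructions.
struct Counted
{
    static int copies;
    static int destructions;

    Counted() {}
    Counted(const Counted &) { ++copies; }
    ~Counted() { ++destructions; }
};

int Counted::copies = 0;
int Counted::destructions = 0;

volatile int sink;   // written by both functions so the calls do something

void TakeByValue(Counted c)      { (void)c; sink = 1; }
void TakeByRef(const Counted &c) { (void)c; sink = 2; }

// Calls each function n times and checks how many temporaries were involved.
void CountCalls(int n)
{
    Counted original;
    Counted::copies = 0;
    Counted::destructions = 0;

    for (int i = 0; i < n; i++)
        TakeByRef(original);
    // Passing by reference creates and destroys nothing.
    assert(Counted::copies == 0 && Counted::destructions == 0);

    for (int i = 0; i < n; i++)
        TakeByValue(original);
    // Passing by value: one copy and one destruction per call.
    assert(Counted::copies == n && Counted::destructions == n);
}
```

When the copy constructor and destructor do real work, such as strdup and free, that per-call pair is exactly the overhead measured above.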

Conclusions (and next articles)

As we have seen, passing objects by value to functions (and, as we will see later, even worse, returning objects from functions) is bad for performance and should be avoided as much as possible by passing objects as const references instead.

Passing objects by value can, in some cases, make a C++ program perform (almost) as slowly as the same program written in C# or Java, where everything is treated as a reference implicitly, thus providing foundation for the myth that C++ is slow.

There are rare cases, however, where it may be needed: for instance, when passed parameters need to be modified locally without affecting the parameter passed by the caller.
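A short sketch of such a case follows. It uses std::string rather than DynamicString, since our class has no way to modify its contents; UpperCased and Example are hypothetical names chosen for the illustration:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: 'text' is a private copy of the argument, so it can
// be modified freely without touching the caller's string.
std::string UpperCased(std::string text)
{
    for (char &ch : text)
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    return text;
}

void Example()
{
    std::string name = "hello";
    std::string shout = UpperCased(name);   // 'name' is copied into 'text'
    assert(name == "hello");                // the caller's object is unchanged
    assert(shout == "HELLO");
}
```

Here the copy is needed anyway, so taking the parameter by value does not add work compared with taking a const reference and copying inside the function.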

In the next articles we will go into more details about passing and returning objects by value, looking at assignment operators, move operators, and move constructors as techniques to mitigate the impact of temporary object construction and destruction.

