C++ efficiency subject

Hi there!

This is a question about copy constructors and efficiency in C++. The following code snippet shows three functions: 'example1', 'example2' and 'example3'. I have been discussing with a workmate which one is the most efficient:


#include <iostream>
#include <string>

//just an example class
class MyClass
{
public:
    std::string toString()
    {
        return "just an example";
    }
};

static MyClass myObj;


bool example1()
{
    std::string str = myObj.toString();//executing copy constructor of std::string
    
    //... some code
    std::cout << str << std::endl;
    
    //.. some other code
    std::cout << str << std::endl;
    
    return true;
}

bool example2()
{
    const std::string& str = myObj.toString();//avoid copy constructor of std::string, but more verbose.
    
    //... some code
    std::cout << str << std::endl;
    
    //.. some other code
    std::cout << str << std::endl;
    
    return true;
}

bool example3()
{
    //avoid copy constructor of std::string, but we get two call stacks .
    
    //... some code
    std::cout << myObj.toString() << std::endl;
    
    //.. some other code
    std::cout << myObj.toString() << std::endl;
    
    return true;
}


int main()
{
    for (size_t i = 0; i < 1000000; i++)
    {
        //example1();
        example2();
        //example3();
    }
}



In my opinion, the least efficient one is 'example1' because it implies executing the copy constructor which, in turn, implies a system call to allocate memory and another call to free that memory. But my workmate says 'example1' is the most efficient one, because the compiler will perform an optimization to avoid the copy constructor.

What is your opinion about that?

My opinion is that efficiency issues should not be left to the compiler, they should always be fixed in the C++ code. The code is meant to run on all platforms (Windows, Linux and Mac) and different compilers may behave differently. Even different versions of the same compiler may or may not perform a given optimization.


I would choose 'example2' or even 'example3' because a call stack will always be faster than a system call to allocate memory.

Here I used std::string as an example, but it could be any user-defined complex type.

Thanks a lot!
GCC and Clang produce the exact same result for both example1() and example2() with optimizations enabled.

In C++17 you're guaranteed that the returned string in example1() will be elided so that no copy or move takes place. In C++11 and C++14 the object could get moved, but any compiler worth its salt would elide the copy in this situation.

example3() is the least efficient because two strings need to be constructed, one for each call to toString().
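
One way to check this without reading assembly is to count constructor calls. The sketch below is only an illustration: Traced, Counts, make, use, measure and the three style functions are hypothetical names, and Traced is a stand-in for std::string that does nothing except record which constructors run.

```cpp
// Counters for the hypothetical Traced type below.
struct Counts { int constructed = 0; int copied = 0; int moved = 0; };
static Counts counts;

// Illustrative stand-in for std::string: it only records which
// constructors run.
struct Traced
{
    Traced() { ++counts.constructed; }
    Traced(const Traced&) { ++counts.copied; }
    Traced(Traced&&) noexcept { ++counts.moved; }
};

// Returns by value, like MyClass::toString() in the original post.
Traced make() { return Traced(); }

// Stands in for "std::cout << str".
void use(const Traced&) {}

// Same shapes as example1, example2 and example3.
void style1() { Traced t = make(); use(t); use(t); }
void style2() { const Traced& t = make(); use(t); use(t); }
void style3() { use(make()); use(make()); }

// Runs f with fresh counters and returns what it recorded.
Counts measure(void (*f)())
{
    counts = Counts();
    f();
    return counts;
}
```

Under C++17, or with an older standard on a compiler that elides the copy (GCC and Clang do so by default), measure(style1) and measure(style2) should both report one construction with no copies or moves, while measure(style3) should report two constructions.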
My opinion is that efficiency issues should not be left to the compiler, they should always be fixed in the C++ code.

If you did something wrong that makes a program slower, fix it for sure. That includes messing up memory allocations and unnecessary copies.

Assuming the code is correct, though, optimization should be left alone until you find a code block that underperforms during your stress testing. I wrote real-time code for a long, long time and the worst thing you can do to yourself is optimize where it isn't needed. The code tends to get weird and full of 'do it this way because it ran faster' commentary, and the programmer falls behind refactoring working code that was fine instead of writing new code. When you need speed, there is a time and place to mess with it; just choose your battles carefully. Take this example: odds are you have already spent time on it, but was this *really* slowing down the time-critical section of some high-performance application?




Programmer efficiency (writing clean, readable, maintainable code) is always the responsibility of the programmer.

High level run-time efficiency (design, choice of data structures, containers, algorithms etc.) is also the responsibility of the programmer.

Low level run-time efficiency (e.g. strength reduction, loop unrolling, copy-elision) is usually best left to the language and the compiler. Compiler writers typically know an order of magnitude more about what is efficient at the machine level than run-of-the-mill programmers.
Hi all, and thanks for your answers!

What I wanted to say is that we cannot be sure which optimizations a compiler will apply. There is no 'standard way for optimizations'.

In my example I used std::string, which very likely implements a move constructor. But what if, instead of std::string, we are using another complex type, let's say 'MyType', which does not implement a move constructor, and we are using a compiler with no optimizations?


I would like to add that if the compiler has optimized the returned value of 'toString', I bet it has also inlined the call to 'toString', so I still think 'example3' is more efficient than 'example1'.
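
The 'MyType' case can be tried out directly. The following is a rough sketch, not anyone's real type: MyType here is a hypothetical class with a user-provided copy constructor and no move constructor, and the helper names are made up for the illustration. It counts copy-constructor calls when a local is initialized from a returned value, and compares that with copying a named object.

```cpp
// Hypothetical "complex type" with a copy constructor and no move
// constructor, as described in the question.
struct MyType
{
    // Number of copy-constructor calls so far.
    static int& copies() { static int n = 0; return n; }

    MyType() = default;
    MyType(const MyType& other) : value(other.value) { ++copies(); }

    int value = 0;
};

// Returns a temporary by value, like toString() does.
MyType makeMyType() { return MyType(); }

// Initializes a local from the returned value and reports how many
// copy-constructor calls that took.
int copiesForInitFromReturn()
{
    MyType::copies() = 0;
    MyType local = makeMyType();
    (void)local;
    return MyType::copies();
}

// Copies a named object; this copy has a counted side effect and
// cannot be skipped.
int copiesForExplicitCopy()
{
    MyType::copies() = 0;
    MyType original;
    MyType duplicate = original;
    (void)duplicate;
    return MyType::copies();
}
```

With C++17 the first function should return 0 regardless of optimization level, because no temporary is materialized in that initialization. With older standards the result depends on the compiler performing the elision, which GCC and Clang do by default even without optimization flags. The second function should return 1.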
Make your example a bit more exciting:
Assume that it is part of a multi-threaded program and that some other thread might modify myObj between the two couts. What is the logically correct approach then?
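
A possible answer, sketched under assumptions: SharedObj below is a hypothetical thread-safe variant of MyClass (the text member, the setter and the mutex are not in the original code), and the "other thread" is simulated by a plain call to set() between the two reads so that the outcome is deterministic.

```cpp
#include <mutex>
#include <string>

// Hypothetical thread-safe variant of MyClass.
class SharedObj
{
public:
    void set(const std::string& s)
    {
        std::lock_guard<std::mutex> lock(m_);
        text_ = s;
    }

    // Returns a copy made while the lock is held.
    std::string toString() const
    {
        std::lock_guard<std::mutex> lock(m_);
        return text_;
    }

private:
    mutable std::mutex m_;
    std::string text_ = "just an example";
};

// example1 style: take one snapshot and use it twice.
bool snapshotAgrees(SharedObj& obj)
{
    const std::string snapshot = obj.toString();
    const std::string first = snapshot;   // stands in for the first cout
    obj.set("changed meanwhile");         // stands in for a writer thread
    const std::string second = snapshot;  // stands in for the second cout
    return first == second;
}

// example3 style: call toString() once per use.
bool separateCallsAgree(SharedObj& obj)
{
    const std::string first = obj.toString();
    obj.set("changed meanwhile");         // stands in for a writer thread
    const std::string second = obj.toString();
    return first == second;
}
```

If both outputs are supposed to describe the same state of the object, the single snapshot (example1 or example2 style) is the one that guarantees it; calling toString() twice can observe two different states.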
What I wanted to say is that we cannot be sure which optimizations a compiler will apply. There is no 'standard way for optimizations'.


Then I guess you have a responsibility to know your tools.
Then I guess you have a responsibility to know your tools.

... responsible ... until you publish your open source code and anyone can pull it to their platform where they compile it with their toolset.


Doesn't everyone have responsibility to know their tools?
So which approach is preferred in general?

Personally, if I see a function that returns by value I have normally used approach #1 because I know the return value will probably get elided, but otherwise it will get moved which is usually pretty fast too.

But what if the function return type is changed to a const reference? Code written with approach #1 would still work correctly but might be slower than code written with approach #2 because a copy needs to be made. On the other hand, code written with approach #2 might run into trouble if the object being referenced is destroyed or modified unexpectedly while still being used.
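
The trade-off can be shown with a small sketch. Named, name(), rename() and the two approach functions are hypothetical names for the illustration; the point is only that a stored copy is isolated from later changes while a bound reference is not.

```cpp
#include <string>

// Hypothetical class whose accessor returns a const reference
// instead of a value.
class Named
{
public:
    const std::string& name() const { return name_; }
    void rename(const std::string& n) { name_ = n; }

private:
    std::string name_ = "first";
};

// Approach #1: store a copy. Costs one copy, but the stored value
// is unaffected by later changes to the object.
std::string approach1(Named& obj)
{
    std::string s = obj.name();
    obj.rename("second");
    return s;
}

// Approach #2: bind a reference. No copy at the call site, but the
// reference observes later changes to the object.
std::string approach2(Named& obj)
{
    const std::string& s = obj.name();
    obj.rename("second");
    return s;  // copied here only to hand back what was observed
}
```

Here approach1 returns "first" and approach2 returns "second". If the object were destroyed instead of renamed, the reference in approach #2 would dangle.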
jessCPP wrote:
But what if [...] we are using a compiler with no optimizations?

Then any discussion of efficiency is meaningless. C++, the language and the library, are designed for an optimizing compiler.

As for the initial example, I would say the best return type would be std::string_view (or its older equivalent). In the Linux C++ ABI, it is returned in two CPU registers:
toString(): # @toString()
  movl $15, %eax
  movl $.L.str, %edx
  retq
.L.str:
  .asciz "just an example"
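
The source for that listing was not posted. A guess at what it might have looked like, assuming C++17 and a free function named toString as the assembly label suggests:

```cpp
#include <string_view>

// Returns a view of a string literal. The literal has static storage
// duration, so the view stays valid for the life of the program.
// No std::string is constructed and nothing is allocated.
std::string_view toString() noexcept
{
    return "just an example";
}
```

This only works safely because the characters outlive every caller. A string_view onto a temporary std::string or onto a member that can change would bring back the lifetime problems of returning a reference.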

> But what if the function return type is changed to a const reference?

Copy elision can still be performed as long as the observable behaviour of the program is not affected.

1. There may be no observable behaviour resulting from copying an object:
#include <cstdio>

struct A
{
    A() = default ;
    A( const A& that ) : data(that.data) { maybe_observable_behaviour() ; }
    // ...

    int data {} ;
    char filler[12] {} ;
    
    // no observable behaviour
    void maybe_observable_behaviour() {}
};

const A& foo() { static A a ;  a.data = 7 ; return a ; }

int bar()
{
    A a = foo() ;
    return a.data + 6 ;
    // 1. set foo::a.data = 7
    // 2. return 13 ;
}

int baz()
{
    A a = foo() ;
    A* a2 = new A(a) ;
    const int result =  a.data + 6 ;
    delete a2 ;
    return result ;
    // 1. set foo::a.data = 7
    // 2. return 13 ;
}

int foobar() 
{ 
    return bar() + baz() ; 
    // 1. set foo::a.data = 7
    // 2. return 26 ;
}

https://godbolt.org/g/hshW1b


2. Copy-elision can still be performed, as long as the observable behaviour when the objects are actually copied or moved is reproduced:
#include <cstdio>

volatile int g = 0 ;

struct A
{
    A() = default ;
    A( const A& that ) : data(that.data) { maybe_observable_behaviour() ; }
    // ...

    int data {} ;
    char filler[12] {} ;
    
    // known observable behaviour
    void maybe_observable_behaviour() { ++g ; }
};

const A& foo() { static A a ;  a.data = 7 ; return a ; }

int bar()
{
    A a = foo() ;
    return a.data + 6 ;
    // 1. set foo::a.data = 7
    // 2. increment ::g
    // 3. return 13 ;
}

int baz()
{
    A a = foo() ;
    A* a2 = new A(a) ;
    const int result =  a.data + 6 ;
    delete a2 ;
    return result ;
    // 1. set foo::a.data = 7
    // 2. increment ::g
    // 3. increment ::g
    // 4. return 13 ;
}

int foobar() 
{ 
    return bar() + baz() ; 
    // 1. set foo::a.data = 7
    // 2. increment ::g
    // 3. increment ::g
    // 4. increment ::g
    // 5. return 26 ;
}

https://godbolt.org/g/EnLT9r

(In the two snippets, the calls to the allocation/de-allocation functions (new/delete) are also elided: this is permitted by the standard, even if those calls may have had observable side effects. This is relevant when copy-elision of objects like std::string is considered.)
Let's have a look. Here's a modified version that lets you choose a function on the command line:
#include <iostream>
#include <string>

//just an example class
class MyClass
{
public:
    std::string toString()
    {
        return "just an example";
    }
};

static MyClass myObj;


bool example1()
{
    std::string str = myObj.toString();//executing copy constructor of std::string
    
    //... some code
    std::cout << str << std::endl;
    
    //.. some other code
    std::cout << str << std::endl;
    
    return true;
}

bool example2()
{
    const std::string& str = myObj.toString();//avoid copy constructor of std::string, but more verbose.
    
    //... some code
    std::cout << str << std::endl;
    
    //.. some other code
    std::cout << str << std::endl;
    
    return true;
}

bool example3()
{
    //avoid copy constructor of std::string, but we get two call stacks .
    
    //... some code
    std::cout << myObj.toString() << std::endl;
    
    //.. some other code
    std::cout << myObj.toString() << std::endl;
    
    return true;
}


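int main(int argc, char **argv)
{
    bool (*func[3])() = { example1, example2, example3 };

    // Require exactly one argument consisting of a single digit 0, 1 or 2.
    if (argc != 2 || argv[1][0] < '0' || argv[1][0] > '2' || argv[1][1] != '\0') {
        std::cout << "usage: " << argv[0] << " n\n"
                  << "where n is 0, 1 or 2\n";
        return 1;
    }
    const unsigned idx = argv[1][0] - '0';

    for (size_t i = 0; i < 1000000; i++)
    {
        func[idx]();
    }
}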


And here are the results:

dhayden@DHAYDEN4WLGPF2 ~/tmp
$ time ./foo 0 > /dev/null

real    0m1.199s
user    0m0.904s
sys     0m0.280s

dhayden@DHAYDEN4WLGPF2 ~/tmp
$ time ./foo 1 > /dev/null

real    0m1.204s
user    0m0.919s
sys     0m0.280s

dhayden@DHAYDEN4WLGPF2 ~/tmp
$ time ./foo 2 > /dev/null

real    0m1.379s
user    0m1.075s
sys     0m0.296s

So it's example1 by a nose.

On the other hand, if you use '\n' instead of std::endl, the programs all run about twice as fast:
$ time ./foo 0 > /dev/null

real    0m0.693s
user    0m0.670s
sys     0m0.031s

dhayden@DHAYDEN4WLGPF2 ~/tmp
$ time ./foo 1 > /dev/null

real    0m0.701s
user    0m0.670s
sys     0m0.030s

dhayden@DHAYDEN4WLGPF2 ~/tmp
$ time ./foo 2 > /dev/null

real    0m0.889s
user    0m0.858s
sys     0m0.030s

The lesson here: the performance problem might not be where you think it is. Always measure. Never optimize until you're certain where the problem is.
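
For measuring inside the program instead of with the shell's time command, something like the following could be used. It is a minimal sketch: timeIt is a hypothetical helper name, and it relies only on std::chrono::steady_clock from the standard library.

```cpp
#include <chrono>
#include <cstddef>

// Runs f() `iterations` times and returns the elapsed wall-clock time.
template <class F>
std::chrono::nanoseconds timeIt(F f, std::size_t iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

For the program above, timeIt(example1, 1000000).count() would give the elapsed nanoseconds for one million calls. The usual caveats apply: build with the same optimization flags as the real program, repeat the run several times, and keep in mind that an optimizer may remove work whose result is never used.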

It's good to know that the compiler is able to optimize many of these situations, but do you suggest we should rely on this and simply store the return value as a new object even if the function returns a reference? Personally, I don't feel comfortable with this. If you pass the object by reference to an external function whose implementation the compiler doesn't know, the compiler would have to create a copy just so that the object can be passed to that function.
> do you suggest we should rely on this and simply store the return value
> as a new object even if the function returns a reference?

In general, I strongly favour value semantics. If possible, a function should return a value instead of a reference, and for a function which returns a reference, the result should be stored as a value.

There are many (typical) exceptions to this approach; for instance a library written with high performance as an overriding design goal would leave the copying, if any, to be done by the client of the library.


> If you pass the object by reference to an external function whose implementation the compiler doesn't know,
> the compiler would have to create a copy just so that the object can be passed to that function.

Yes. Also, if the implementation of the function returning the reference is unknown.

For instance, in the earlier example, if we change the class to this:
struct A
{
    A() = default ;
    A( const A& that ) : data(that.data) { maybe_observable_behaviour() ; }
    // ...

    int data {} ;
    char filler[12] {} ;
    
    // function with unknown implementation: it may have observable side effects
    void maybe_observable_behaviour() ; // note: not const-qualified
};

https://godbolt.org/g/Tng8k6

Unless link-time optimisation/code generation is enabled.