Why Pass String By Reference

Hello. I learned that when you pass something by reference instead of by value, the program doesn't make an extra copy, which makes the program more efficient, and whatever changes I make are applied directly to the original.

But my question is: why do people usually pass std::string by reference while passing other types of data by value? If the reason is efficiency, why not pass all types of data by reference?

Thank you
Simple types such as int are cheap to copy. Passing an int by reference could be less efficient because, instead of accessing the value directly, the function has to follow the reference to find the value.

std::string can be costly to copy, especially long strings, because it has to copy all the characters.

Sometimes it's hard to know what is optimal, but as a rule of thumb I would say: pass primitive types (e.g. integers, floats, pointers, enums) by value, and pass class types by const reference.
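A minimal sketch of that rule of thumb. The function names `doubled` and `count_spaces` are made up for illustration and are not from the thread.

```cpp
#include <cstddef>
#include <string>

// An int is cheap to copy, so it is taken by value.
int doubled(int n)
{
    return n * 2;
}

// A std::string may own a long character buffer, so it is taken by
// const reference: no characters are copied, and the function cannot
// modify the caller's string.
std::size_t count_spaces(const std::string& text)
{
    std::size_t count = 0;
    for (char c : text)
    {
        if (c == ' ')
            ++count;
    }
    return count;
}
```

Calling `count_spaces(some_long_string)` reads the caller's string in place, whereas a by-value parameter would first copy it.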
Hello pizza,

I may not be the best person to answer this, but I will try.

I tend to agree with what Peter87 has said. Simple variables are only a couple of bytes long and are easy to copy.

A "std::string" is not a simple variable; it is a class that manages a block of characters whose length can be arbitrary. Copying it means copying all of those characters, which is much more involved, so accessing the string by reference is cheaper than copying everything.

Many times I find that when a variable needs to be changed in a function, it is easier to pass it by reference and make the changes there than to figure out how to return more than one item.

Just a thought.

Hope that helps,

Andy
Peter87 wrote:
Passing an int by reference could be less efficient because, instead of accessing the value directly, the function has to follow the reference to find the value.
Actually, I would have thought that compilers would be able to automatically change const int & to int (since there's practically no difference), but apparently that's not the case:
https://godbolt.org/g/EYnUYb
helios, yes that would have been great. Unfortunately it seems like compilers treat functions as optimization boundaries. They do a lot of optimizations inside functions but not so much between functions (unless the function is inlined).

I have thought a little about this "problem" before, and the "solution" that I came up with was to wrap the value type of each in-parameter and use templates to try to decide the best type to use.

This is what a function would look like.

void foo(in<int> x)
{
	std::cout << x << std::endl;
}

The way it works is that I have a function that tells me whether a type is efficient to copy or not. The criterion that I used was that the type must be small and trivially copyable.

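#include <cstddef>      // std::size_t
#include <type_traits>  // std::is_trivially_copyable_v, std::enable_if_t

template<typename T>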
constexpr bool is_efficient_to_copy()
{
	return std::is_trivially_copyable_v<T> && 
	       sizeof(T) <= 2 * sizeof(std::size_t);
}

Then I just define a template struct to hold the optimal in-parameter type for each type. The default is pass-by-reference. Types that are considered more "efficient to copy" are passed by value.

template<typename T, typename Enable = void> 
struct In
{
	using type = const T&;
};

template<typename T>
struct In<T, std::enable_if_t<is_efficient_to_copy<T>()>>
{
	using type = const T;
};

Note that I used const even for pass-by-value because I didn't want code to break if, for some reason, a type changes from pass-by-value to pass-by-reference.

In order to make it less verbose and easier to use I make use of an alias template.

template<typename T> 
using in = typename In<T>::type;

This seems to work fine but I haven't done any benchmarks yet to see if it's worth it. I'm not at all sure about the size limit in is_efficient_to_copy() but at least it is easy to change. When a program has been written this way it could be benchmarked with different values to see what performs best.
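To check what the alias actually resolves to, the pieces above can be put together with a few compile-time assertions. This is only a sketch: it assumes C++17, and `Point` is a hypothetical type added for illustration.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

template<typename T>
constexpr bool is_efficient_to_copy()
{
    return std::is_trivially_copyable_v<T> &&
           sizeof(T) <= 2 * sizeof(std::size_t);
}

// Default: pass by reference to const.
template<typename T, typename Enable = void>
struct In { using type = const T&; };

// Small, trivially copyable types: pass by (const) value.
template<typename T>
struct In<T, std::enable_if_t<is_efficient_to_copy<T>()>> { using type = const T; };

template<typename T>
using in = typename In<T>::type;

// Hypothetical small aggregate, not part of the original post.
struct Point { float x; float y; };

static_assert(std::is_same_v<in<int>, const int>);
static_assert(std::is_same_v<in<Point>, const Point>);
// std::string is not trivially copyable, so it falls back to a reference.
static_assert(std::is_same_v<in<std::string>, const std::string&>);
```

If the size limit in is_efficient_to_copy() is changed, assertions like these show immediately which types switch category.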

It is also quite easy to specialize for individual types. Maybe this could be automated using a tool that runs a benchmark for each type T that it finds as "in<T>" in the code and then writes all the optimized specializations to a code file that is included in the header file.
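As a sketch of what such a per-type specialization might look like (assuming C++17; `Pair` is a hypothetical type, and the choice to force it to pass-by-reference is arbitrary and only for illustration):

```cpp
#include <cstddef>
#include <type_traits>

template<typename T>
constexpr bool is_efficient_to_copy()
{
    return std::is_trivially_copyable_v<T> &&
           sizeof(T) <= 2 * sizeof(std::size_t);
}

template<typename T, typename Enable = void>
struct In { using type = const T&; };

template<typename T>
struct In<T, std::enable_if_t<is_efficient_to_copy<T>()>> { using type = const T; };

template<typename T>
using in = typename In<T>::type;

// Hypothetical type: small and trivially copyable, so the general rule
// would pass it by value.
struct Pair { int first; int second; };

// Explicit specialization overriding the general rule for Pair only.
// It must appear before the first use of in<Pair>.
template<>
struct In<Pair> { using type = const Pair&; };

static_assert(std::is_same_v<in<Pair>, const Pair&>);
static_assert(std::is_same_v<in<int>, const int>);  // other types are unaffected
```

Functions declared with `in<Pair>` pick up the change without being edited.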

For consistency, my plan was to have similar syntax for out parameters and in-out parameters as well, and perhaps also something for detecting the optimal type when you want to move from the parameter. This would also work as documentation of the purpose of the parameters. Maybe I will try something like this in a small toy project in the future. I don't think it will make a huge noticeable difference, but it could be fun and I might learn something from it. In the meantime I'm just hoping compilers get better at optimizing these things.
Doesn't seem like there would be much use for that unfortunately, except for parameters of unknown type or of variable memory layout (i.e. dependent on compile-time configuration).

As for the size threshold, I think anything larger than a size_t will probably be passed by memcpy()[1]. In the particular case of x86-64 calling conventions, I can't remember at the moment if structs can be decomposed into multiple registers. Wikipedia just says
The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, R9
Even if we assume that they can be, whether the passing will be cheap depends on what other parameters the function is accepting.


[1] Obviously there are no guarantees of this. It's just an approximation.
The advantage as I see it is that the methods used to pass parameters are not hardcoded in the functions and if we want to change the way a certain type is passed we don't need to change existing code. In theory it should allow us to use different methods on different compilers and platforms depending on what is most optimal.

CppCoreGuidelines wrote:
What is "cheap to copy" depends on the machine architecture, but two or three words (doubles, pointers, references) are usually best passed by value.
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#f16-for-in-parameters-pass-cheaply-copied-types-by-value-and-others-by-reference-to-const
Unless the overhead of copying is known to be unacceptable, favour pass by value.

In particular, be wary of things like this: void foo( const std::string& a, std::string& b ) ;
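One plausible reason for that warning is aliasing: if the caller passes the same string for both parameters, writing to the second also changes the first, even though the first is a reference to const. A small sketch, with hypothetical function names:

```cpp
#include <string>

// `name` and `out` may refer to the same object.
void greet_by_ref(const std::string& name, std::string& out)
{
    out = "Hello, ";
    out += name;  // if name aliases out, name is already "Hello, " here
}

// Taking `name` by value gives the function its own copy,
// so writing to `out` cannot affect it.
void greet_by_value(std::string name, std::string& out)
{
    out = "Hello, ";
    out += name;
}
```

With `std::string s = "Bob";`, calling `greet_by_ref(s, s)` leaves `s` as "Hello, Hello, ", while `greet_by_value(s, s)` leaves it as "Hello, Bob".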
Sorry, let me interrupt you guys, I have a quick question.
I know it must depend on the platform, but by convention, which is the better choice for char in terms of efficiency: by reference or by value?
Generally speaking all built-in types should be passed by value if possible. That is, all integer types (char is an integer type), float, double, all pointers, and all enums.
Let's say that I have a function that takes a const reference. I call that function with one of my objects. At the same time (while the function is doing its thing), another thread is modifying the very same object.

The function obviously makes a solid promise not to change the referenced object, but does it also assume that the object remains the same for the duration of the function?
No, it does not assume another function can't modify it. It's still a race condition. Does this illustrate what you're saying, or do I misunderstand?

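// Example program
#include <chrono>
#include <functional>  // std::cref
#include <iostream>
#include <mutex>
#include <thread>

struct Apple {
    // Explicit constructor: since C++20, a class with user-declared
    // (even deleted) constructors is no longer an aggregate.
    explicit Apple(double w) : weight(w) {}
    Apple(Apple const &) = delete;           // delete the copy constructor
    void operator=(Apple const &) = delete;  // delete the copy-assignment operator

    double weight;
};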

std::mutex mtx;

void task(const Apple& apple)
{
    using namespace std::chrono_literals;
    
    for (int i = 0; i < 10; i++)
    {
        std::this_thread::sleep_for(1s);
        mtx.lock();
        std::cout << apple.weight << std::endl;
        mtx.unlock();
    }
}

int main()
{
    Apple apple {12345.0};
    
    std::thread t(task, std::cref(apple));

    for (int i = 0; i < 1000; i++)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        mtx.lock();
        apple.weight -= 1.0;
        mtx.unlock();
        
    }
    
    t.join();
}
Even without threads, the function may not be able to assume that the object pointed to by a reference-to-constant will not change:
#include <iostream>

int global = 0;

void bar(){
    global++;
}

void foo(const int &a){
    std::cout << a << std::endl;
    bar();
    std::cout << a << std::endl;
}

int main(){
    foo(global);
}