Return By Reference


I know it's not a good idea to return by reference unless you're sure the reference you're returning will still be pointing at something valid. But look at this simple program:

#include <iostream>
using namespace std;

double &GetSomeData()
{
double h = 46.50;
double &hRef = h;
return hRef;
}

int main()
{
double nonRef = GetSomeData();
double &ref = GetSomeData();
cout << "nonRef: " << nonRef << endl;
cout << "ref: " << ref << endl;
return 0;
}

which prints out:

nonRef: 46.5
ref: 2.12217e-313

So, I'm wondering if I'm getting lucky getting good data for the non-reference variable, or if this is indeed OK to do (on any compiler). I'm thinking this is what is going on in the (a) and (b) calls to GetSomeData() returning to a non-reference and reference, respectively:

1. The GetSomeData() function is called
2. h is created on the stack and assigned the value 46.50
3. hRef is created and set to refer to h
4a. For the nonRef case, the thing referred to by hRef (i.e., h) in GetSomeData() is copied to nonRef in main(); in other words, an implicit conversion from double& to double takes place from hRef in the function to nonRef in main() -AND-
4b. For the ref case, the address (basically) of h, stored in hRef, is copied to ref in main(), in other words, no conversion takes place, and the value (an address) of hRef is copied to ref
5. The GetSomeData() function terminates, h and hRef both go out of scope

Since main()'s nonRef contains the actual data, a copy of h, it is OK. Since main()'s ref "points" ("refers") to the now defunct h, it is not OK.

Is this what is going on? Can I always depend on it, or is this compiler-specific behavior, and therefore indeterminate or undefined behavior?

Also, I'm not really clear on the order of steps 4a/b and 5. Are values copied to the receiving variable before the function terminates and the variables go out of scope? What are the exact mechanics of this?

helios (17607)

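You're getting lucky. GetSomeData() returns a reference to a local variable that no longer exists once the function has returned, so reading through that reference, even just to copy the value into nonRef, is undefined behavior.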

Jacko (12)

Can you tell me the mechanics of how a function returns? When you're passing in a variable into a function like this:

int SomeFunc(int a)
{
int b;
b = a * 2;
return b;
}

int main()
{
int x = 3;
int y;
y = SomeFunc(x);
return 0;
}

Is this the order:

1. main() starts
2. x is created on the stack and is set to "3"
3. y is created on the stack
4. SomeFunc() is called: the value of x, 3, is copied onto the stack, used as "a" in the function
5. b is created on the stack
6. b is set to a * 2 = 6
7. y is set to 6
8. the function terminates and a and b go out of scope
9. main() terminates

In particular, are steps (7) and (8) correct? Or does the function terminate with a and b going out of scope, and b is returned some other way? Not 100% sure on the return mechanics.

helios (17607)

IIRC, first b goes out of scope by being popped from the stack, and then its value is pushed onto the stack. Then the control flow goes back to main() and the stack is popped and the value assigned to y. I think that's how it worked, but I'm not entirely sure.

EDIT: Even if this is true, though, it doesn't really help you. The above description doesn't apply, for instance, if there's a copy constructor call involved. You should only imagine values being copied around, not the stack being pushed or popped.

Jacko (12)

Is it possible that when main() calls SomeFunc(), it passes -- along with the address of where to return and the passed-in parameters -- the address of y to SomeFunc()'s stack frame? So then, the return statement in SomeFunc() stores the value of b in y, using the pointer, at the "return" statement right before it terminates and the stack frame becomes toast? In other words, passing IN to the function is a no-brainer: it creates a stack frame and copies all the pertinent data to it. Passing data BACK is a bit more problematic, as somehow it has to get back to the caller before the callee's stack frame and all its data, including the return value, vanishes. So I'm guessing indirection is used, i.e., the address of the receiving variable in the caller function is passed to the callee function. That's all I can guess. Anyone? I got my degree in EE, not CS! I think that's part of my C++ difficulties...

helios (17607)

I'm more or less certain no pointer-passing takes place unless explicitly stated in the function signature.

Jacko (12)

I'm thinking of things going on behind the scenes, like the creation of the stack frame and so forth, when a function is called. The executable has to copy things when making a function call that are transparent to us, as C++ developers, such as the address of where the call was made so control can return to the caller once the callee is done, and copies of function arguments. That I can picture. An area of memory is set aside for the stack frame, and all this information is pushed onto it by the caller. Once all the data necessary for the callee to operate is there, control is transferred to the callee and it operates on the contents of the stack frame, manipulating its contents, and also creating its own local data (and the return value, presumably) on that frame.

But somehow, the return data has to work its way back from the callee to the caller. And at some point the stack frame ceases to exist. So that data needs to be transferred back somehow before it ceases to exist.

The caller sending the address of the variable that will receive the return value, and the callee using this address to stuff the return value into that variable (again, behind the scenes, by the stack frame mechanics) is the only thing I can think of as a way to get the data back to the caller. It's just my speculation, theory, thoughts...

I'd like to better understand it, as I get questions on C++ interviews that have a lot to do with the inner workings, the mechanics of things. It's hard to find these answers!

guestgulkan (2942)

If you are really that keen:
http://www.agner.org/optimize/calling_conventions.pdf
http://www.codeproject.com/KB/cpp/calling_conventions_demystified.aspx

try googling for:

C/C++ calling convention
C/C++ naming convention

Microsoft Visual C++ is brilliant for this type of investigation/learning.
You can put breakpoints in your code in debug mode and open the assembly window.
This will show you the source code and for each statement it will show the assembly code.
It will show the function prologues and epilogues etc...

helios (17607)

It's hard to find these answers!

No, they're not. Just take a look at your compiler's output.

I'd like to better understand it

Fine, here you go. The calling procedure generated by VC++ without optimizations:

int f(int a){
00411260 push ebp
00411261 mov ebp,esp
00411263 sub esp,40h
00411266 push ebx
00411267 push esi
00411268 push edi
	return a*2;
00411269 mov eax,dword ptr [a]
0041126C shl eax,1
}
0041126E pop edi
0041126F pop esi
00411270 pop ebx
00411271 mov esp,ebp
00411273 pop ebp
00411274 ret

int main(){
00411280 push ebp
00411281 mov ebp,esp
00411283 sub esp,44h
00411286 push ebx
00411287 push esi
00411288 push edi
	int b=f(2);
00411289 push 2
0041128B call f (411096h)
00411290 add esp,4
00411293 mov dword ptr [b],eax
	return 0;
00411296 xor eax,eax
}
00411298 pop edi
00411299 pop esi
0041129A pop ebx
0041129B mov esp,ebp
0041129D pop ebp
0041129E ret

In this case, the compiler chose to return the value through eax.
Let's try with something a bit bigger:

struct A{
	int a[50];
};

A f(){
	return A();
}

int main(){
	A a=f();
	return 0;
}

A f(){
00411660 push ebp
00411661 mov ebp,esp
00411663 sub esp,108h
00411669 push ebx
0041166A push esi
0041166B push edi
	return A();
0041166C push 0C8h //<-- 0xC8 is 50*sizeof(int)
00411671 push 0 //What to set the bytes to.
00411673 lea eax,[ebp-108h] //<-|
00411679 push eax //<--- address of the buffer
0041167A call @ILT+295(_memset) (41112Ch) //memset(ebp-108,0,50*sizeof(int));
0041167F add esp,0Ch
00411682 mov ecx,32h
00411687 lea esi,[ebp-108h]
0041168D mov edi,dword ptr [ebp+8]
00411690 rep movs dword ptr es:[edi],dword ptr [esi] //This is a buffer copy, IINM
00411692 mov eax,dword ptr [ebp+8]
}
00411695 pop edi
00411696 pop esi
00411697 pop ebx
00411698 mov esp,ebp
0041169A pop ebp
0041169B ret

int main(){
004116A0 push ebp
004116A1 mov ebp,esp
004116A3 sub esp,298h
004116A9 push ebx
004116AA push esi
004116AB push edi
	A a=f();
004116AC lea eax,[ebp-1D0h]
004116B2 push eax
004116B3 call f (411131h)
004116B8 add esp,4
004116BB mov ecx,32h
004116C0 mov esi,eax
004116C2 lea edi,[ebp-298h]
004116C8 rep movs dword ptr es:[edi],dword ptr [esi]
004116CA mov ecx,32h
004116CF lea esi,[ebp-298h]
004116D5 lea edi,[a]
004116DB rep movs dword ptr es:[edi],dword ptr [esi]
	return 0;
004116DD xor eax,eax

If I'm getting it right, the compiler first copies the structure to an intermediate location, it returns the address of this location through eax, then copies it back to the destination structure. I'm not sure why it has to perform the copy three times.
