Tutorials on Memory are Lacking

But how does the calling convention affect whether the value in the inner scope will maintain its value after returning?

Computergeek01 (5613)

OP is objecting to "undefined" behavior. The behavior is defined it's just not defined in the C++ standard. The tutorial isn't lacking, OP is just looking in the wrong place for the reason this behavior occurs. That's basically the point I was trying to make here (in fact I was wrong about this not being an issue in those two instances since EAX would still be overwritten).

tl;dr: My stance is that the inclusion of behavior not defined in the C++ standard in a tutorial about C++ should be considered discretionary.

htirwin (1208)

I have to admit, after looking over the tutorial, I do agree it is lacking. Even though the term local variable is used a bit earlier, it's not until here, http://www.cplusplus.com/doc/tutorial/namespaces/, that the reader is given any information about what local variables are, and that section only talks about name visibility. The only mention they seem to get of object lifetime is here, http://www.cplusplus.com/doc/tutorial/classes2/, and the only hint they get is in this line,

The destructor for an object is called at the end of its lifetime; in the case of foo and bar this happens at the end of function main.

LB (13399)

Computergeek01 wrote:
The behavior is defined it's just not defined in the C++ standard.

When we say "undefined behavior" we are explicitly using the term as described in the C/C++ standard.

MiiNiPaa (8886)

The behavior is defined it's just not defined in the C++ standard.

I am actually lost here. What is this defined behavior? I believe the compiler can return a null pointer in this case and still be compliant with the standard. (We cannot dereference the received pointer anyway, as that is UB, so no matter what we return it will not change observable behavior, and therefore we are allowed to return anything.)

htirwin (1208)

in those two instances since EAX would still be overwritten

It wouldn't matter because the value of the pointer (memory address) is what is returned via putting it in EAX for the caller. The problem is that the variable that's associated with that address no longer exists.

Also you can't have a pointer to a register.

Computergeek01 (5613)

@ LB: I agree with you up to this point at least. I just take it a step further and say that behavior that is not defined as either allowed or forbidden in the C++ standard, should not be expected to have a mention in the tutorial.

@ MiiNiPaa: The calling convention for the function that is being invoked defines what variables go where, and where the returned value is stored. As for the second part of your post, I can't account for the "As-If" rule if that is what you mean. I don't think that it would apply because not having a variable where you expect to find one certainly would change the observable behavior.

MiiNiPaa (8886)

You can legally do two things with a pointer to an object: copy it and dereference it. No matter what its value is, it can be copied without problems. Trying to dereference an invalid pointer is UB and anything could happen. Returning a pointer to a local variable is obviously returning an invalid pointer, so we cannot really dereference it. Therefore returning any pointer (okay, maybe not a null one, because we can expect the address of a variable to be non-null) would be valid, because you cannot check whether it really points to our variable.

If there are two situations, one normal and one that leads to UB, the compiler can assume that the second situation will never happen. So it can assume that you will not dereference that pointer, so why would it need to bother giving you a real pointer at all?
So the code could be optimised this way:

int* foo()
{
    int a = 42;
    return &a;
}

int* foo()
{
    return (int*)1;
}

Some articles on UB dependent optimisations:
http://blogs.msdn.com/b/oldnewthing/archive/2014/06/27/10537746.aspx
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

Computergeek01 (5613)

I'll look over the articles, thank you.

I am not debating what happens when this particular code is compiled in C++. I can see what happens, and I can tell you that it will happen every single time you run it. I am telling you guys that what happens is defined, it is simply not defined by the C++ standard. Every time you run OP's code in C++, the result will be the same because unless otherwise specified it will use the stdcall convention. Take that same code, swap out std::cout with printf() and run it in C or as cdecl and you will get a completely different result, but that result will occur the same way every time you run it. Neither instance will ever crash or fail to run. That is why we have calling conventions in the first place. You could sit there and say "Well we are only concerned with what's defined in C++", and if you want to hamstring yourself like that then I can't stop you but you won't get very far with this language. Its ability to incorporate code from other languages like this is a pretty large part of its utility. This is not the same as using an uninitialized variable, this is simply you over-writing something if you're not paying close attention.

Maybe I'm going about this the wrong way. Maybe instead of fighting you on this I should instead suggest that we start explicitly defining the calling conventions of the functions that we use. Why not? Most of us do it with the std namespace, or we initialize variables even when we know that we have no intention of using them before they are assigned a value. It would just be one more point of clarity. That way the previously "Undefined Behavior" will become clearly and explicitly defined.

MiiNiPaa (8886)

Every time you run OP's code in C++, the result will be the same

Yes. Because it is too simple and is basically a constant example.
But you would not get the same results every time you compile. I got 0, got 8, got 3 (with some cheating).
If the function got inlined you can forget about calling conventions. If the function has internal linkage, you can forget about calling conventions. Optimisations would take place.

Computergeek01 (5613)

We've established that you can force 3 as a result; making it static like you did before is one way to do this because it changes how the pointer gets returned. You received 0 by either compiling your program with a debugger that has some weird settings or by changing the variable type that was allocated in "bar()" to something that was not trivially destructible like an integer would be. How did I do?

EDIT: Your point about the optimizations is a valid one and it is one more reason for me to shift the tone of my posts to encouraging people to explicitly state the linkage spec.

helios (17506)

Every time you run OP's code in C++, the result will be the same because unless otherwise specified it will use the stdcall convention.

The calling convention has nothing to do with this. Like MiiNiPaa said, a perfectly compliant compiler can generate a function that, following any convention you care to mention, just returns a null pointer. The behavior is "defined" only in the loosest sense of the word. You need to disassemble the generated executable and confirm that yes, the compiler did indeed generate a function that returns a pointer to somewhere higher on the call stack. Only then can you say anything about what the program will do.
This isn't even taking into account that the compiler might inline the function and not use any calling convention whatsoever.

As an aside, AFAIK x86 compilers generate cdecl functions for intra-module calls.

That is why we have calling conventions in the first place.

No. Calling conventions exist for interoperability.

You could sit there and say "Well we are only concerned with what's defined in C++", and if you want to hamstring yourself like that then I can't stop you but you won't get very far with this language. Its ability to incorporate code from other languages like this is a pretty large part of its utility.

A language that generates code that requires callers to access values higher on the stack is... well, broken. Once the statement finishes the stack pointer will get decremented, and once that happens there's no guarantee that that portion of the stack hasn't been overwritten by an interrupt or something.

MiiNiPaa (8886)

8 - Release mode, no optimisations

0 - -O3 -fexpensive-optimizations. After studying the assembly I found out that the compiler played the UB card and decided that today the value of *i would be 0, eliminating all function calls and variables.

3 - I made k in bar() a long double (and changed its value to make sure that there would not be a 3 anywhere, as 8 is 1*2^3). No optimisations. I got lucky and probably aliasing took place, so the original value was still there.

How did I do?

F

I think this establishes that you cannot hope to rely on code which has UB when optimisations are enabled. The compiler is a cold blooded bastard which will crush your hopes and will probably do it in the most inconvenient way.

EDIT: Even with external linkage, link time optimisation could inline the function and mess up all code relying on calling conventions in that specific place. The only things that are safe are functions from dynamic libraries.

Computergeek01 (5613)

8: Expected behavior.

0 - -O3: The optimizer inlines the function due to the settings you provided. I don't know where you're going with this "Studying Assembly" stuff, you should be able to predict this behavior by reading the documentation on your compiler.

3: Long doubles (i.e. floating points) are allocated to completely different registers, in fact they are referred to as floating point registers. Why did you even bother testing this one? Or did you just throw it in there because you didn't think that I would understand why this happened?

Every. Result. Predictable.* If given the correct set of evidence.

EDIT: *Except for the optimizer. I am still trying to see if specifying the linkage disqualifies a function from the "As-If" rule.

EDIT 2: I'm not trying to encourage this behavior or say that this is the correct way to code anything. Every post I've made has been trying to explain that this behavior that the OP showed us is documented elsewhere, outside of C++ and dependent on the specific platform. He wanted to know why the behavior is not documented in the tutorial, my answer is because this behavior is not part of the language. I only got aggressive past that point because the responses after that were akin to: "Well if you completely change the parameters of the situation then your prediction is wrong!". Which happens to be a pet peeve of mine.

EDIT 3: @ htirwin: I didn't respond at first to you because it would have derailed me. But since that has already happened and you don't accept PMs, here it is:

The problem is that the variable that's associated with that address no longer exists.

This is how you should always think about it, but this is an over-simplification. In this specific case an integer is what is referred to as "trivially destructible" i.e. no destructor is called for it. I was trying to explain why the results are what we see.

Also you can't have a pointer to a register.

Where do you see anyone taking the address of a register? The equal sign is called the assignment operator, not the assign by address operator. What it does here is copy the data in EAX to the variable 'i'.

MiiNiPaa (8886)

Optimizer inlines the function due to settings you provided

Ok take this: http://ideone.com/Jo3IUE
I did not provide any settings here.

It does not inline functions; it throws them away because they are essentially no-ops.
OP's code and

#include <iostream>

int main()
{
    std::cout << 0;
}

resulted in identical executables.

I tried to add __attribute__((noinline)) to each function. Results: bar() was thrown out completely: no call, no mention of 8 anywhere. In foo(), int i = 3; got thrown out, the function returns a pointer to the current stack head, and the program began to reliably output -1.

As I added to my previous post, even functions with external linkage could be inlined.

Long doubles (i.e. floating points) are allocated to completely different registers

Registers do not matter here. It got allocated on the stack. The pointer was referring to the stack. No registers here. The compiler just included several bytes of padding for aliasing purposes.

Every. Result. Predictable.* If given the correct set of evidence.

If you agree to be bound to the specific platform, specific compiler, specific compiler version and specific compiler parameters.

The only place where that is viable is embedded programming.

Yes, there are calling conventions. Yes, you can expect some things, like how arguments are placed on the stack or in registers. However, you can do that only with code which adheres to them. And the only code which is required to do so is code which expects to interact with outside programs.

EDIT: I read your point. I understand now. Yes, the answer to the OP would be that in his case the compiler adheres to specific conventions and does not do any eliminations, which is why this result could be expected or could happen.
I was saying that you should not rely on such assumptions, because they are so weak that any change to the parameters could break them. Slight misunderstanding here.

EDIT 2: I cannot resist mentioning that OP's code could potentially start NetHack. http://feross.org/gcc-ownage/

msknapp84 (15)

In order to explain why I'm wrong, you all had to discuss a million things that were not in the tutorial.

I will admit I'm a beginner here, and I can't really argue what will happen with different compiler settings, different architectures, and slightly modified versions of my code, but what I do know is this:

1. I read all of this site's tutorials on three separate occasions.
2. I'm not an idiot.
3. After reading the tutorial and trying everything, I struggled more than with any other computer language. I don't think it's just because C++ is more complex, I think it's because a lot of things were not explained to me.

I think there is a dramatic difference between our goals. My goal in reading this tutorial is to be able to USE the language later with a reasonable amount of effort. It seems to me that this site is overly concerned with not being wrong.

behavior that is not defined as either allowed or forbidden in the C++ standard, should not be expected to have a mention in the tutorial.

I completely disagree. If behavior is undefined for a particular pattern, then using that pattern could produce code that is not portable. The fact that it's not portable means it should be avoided. Hence, it deserves mentioning.

I'm not saying that the tutorial needs to say "it works this way" or "that way", but I do think you need to mention undefined behavior if it is likely to impact the developer. If it's undefined behaviour, FINE, just say in the tutorial that it is undefined. Then give us a recommended way to do things.

The bottom line is that most people reading a tutorial want to be able to use the language when they are done, and if the tutorial is not helping them they will switch to something else. If you care about your business then that should concern you.

msknapp84 (15)

If only there was like, I don't know, some sort of contact form you could use to send messages directly to the admin instead of starting a thread for every problem you spot.

@Helios: I am not psychic, how should I know how you want to be contacted? This is a popular website, for all I know you get millions of emails every day and won't see mine.

helios (17506)

The link is at the bottom of every page on this site. I'm not the admin, by the way.

cire (8284)

Computergeek01 wrote:
EDIT 2: I'm not trying to encourage this behavior or say that this is the correct way to code anything. Every post I've made has been trying to explain that this behavior that the OP showed us is documented elsewhere, outside of C++ and dependent on the specific platform.

Any chance you can point to that documentation?

The behavior is defined by the C++ standard. It is defined as undefined which means that any behavior observed is not meaningful. One should not depend on it or attempt to document it. It is not meaningful.

dhayden (5795)

The behavior is defined it's just not defined in the C++ standard.

Nope.

Every time you run OP's code in C++, the result will be the same

Nope again. Usually it's the same, but every once in a great while it is not.... at least on some systems.

You have to consider interrupts. The interrupt service routine (ISR) uses the stack also. So if an interrupt occurs between the time that the first function returns and when the second function is called, that old local variable might get blown away by data from the ISR. This is why you can never rely on a value on the "wrong" side of the stack pointer.
