stl vector iterator windows linux

Hi,

I encountered a piece of code that, to my surprise, behaves differently in Windows and in Linux, and I would like to know your opinion on it.

Thanks!

See the very short program I have used.

#include <iostream>
#include <vector>

using namespace std;

int main()
{
    vector<unsigned char> myVector;
    vector<unsigned char>::iterator myIter;
    unsigned char firstByte = 0x01;
    unsigned char secondByte = 0x02;

    myVector.resize(3);

    myIter = myVector.begin();

    *myIter = firstByte << 4;

    *myIter++ = *myIter | secondByte;

    cout << "Result 0x" << hex << (unsigned int) myVector[0] << endl;

    /**
    The result I get in Linux g++ is 0x2
    If I remove the ++ from *myIter then I get: 0x12

    In Windows Visual Studio I get 0x12

    What should be the correct result?
    **/

}
IMO any multi-operator expression which involves the post-increment or post-decrement operators should be treated with suspicion and considered undefined.

What version of g++? On g++ 4.0.1 on mac, I get 0x12, which is what I would expect. Some expressions involving pre- or post-increment and decrement are undefined, so obviously, you should avoid using variables with ++/-- operators in compound expressions. In this case, it seems like it should be defined but is still obviously confusing. The result of myIter++ should be myVector[0]. It should only be myVector[1] if it is evaluated a second time.
In a function call like
 
f(g(..), h(..))
the call sequence can be expanded to
.. temp1 = g(..);
.. temp2 = h(..);
f(temp1, temp2);
or to
.. temp2 = h(..);
.. temp1 = g(..);
f(temp1, temp2);
This even applies to more deeply nested cases, such as
 
f1(f2(f3(..)), f4(..));
, where f3 may be evaluated first, f4 second, f2 third. Operator expressions, built-in or user defined, have the same non-deterministic order of evaluation. So, in your expression
 
*myIter++ = *myIter | secondByte;
myIter++ can be evaluated before evaluating the right side of the assignment or after it.

In many cases the order is unimportant, because you can be certain that the side effects do not alter the result of other sub-expressions. There is one particularly error-prone case that my teachers used for illustration: a long chain of stream output with << or input with >>. There is something deceptive in their syntax.

Hummm, thanks for the post.
I haven't tested this stuff recently, but I think most current compilers follow precedence and associativity rules very closely. This is why I wanted to know what version of g++ the OP was using. Such order-dependent expressions should still be avoided, since you may run into a compiler that handles them differently. IMO, the comma operator should not imply an undefined order since it should follow the associativity rules in the order of evaluation. I would avoid code that depends on that order though.
I think that precedence and associativity only dictate how the infix expressions will be expanded into functional notation.
That is
 
a op1 b op2 c op3 d
may be interpreted as
 
op2(op1(a,b),op3(c,d))
according to the precedence and associativity rules, but this does not mean that op1 is evaluated first and op3 second. The only certain thing is that op2 will be evaluated last in this case.

Regards

EDIT: For example
int i = 1;
cout << i << ++i;
may print 12 or 22 depending on the order of evaluation, which is unspecified.
The version of the g++
g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5

I often use this kind of expression:

unsigned char *pointer;
unsigned char  myByte;

*pointer++ = *pointer | myByte;


And I thought I could just replace the pointer by an iterator and the result would be the same. Now I see I was wrong.

Thanks for all the answers.

For pre- and post-increment operators I always make it a point to put them in a statement on its own line, not inside a complex expression, to avoid such hard-to-understand behavior.
For absolute portability, I believe that you can not depend on the sequencing of the side-effects for the built-in operators. The only guarantee is that the side-effects will be committed before the next "sequence point" (like semicolon, comma, logical connectives). The side-effects may be committed immediately after the operation that introduces them, somewhere in-between the other operations, or right before the following sequence point. Even when the side effects are committed at the following sequence point, the order is still arbitrary.

Probably this allows the compilers to use better optimizations. I think it is not just compiler dependent issue. I think it may also be settings dependent, and in the highest optimization setting it could also be context sensitive and expression specific.

Regards

I think that precedence and associativity only dictate how the infix expressions will be expanded into functional notation.
That is

a op1 b op2 c op3 d
may be interpreted as

op2(op1(a,b),op3(c,d))
according to the precedence and associativity rules, but this does not mean that op1 is evaluated first and op3 second. The only certain thing is that op2 will be evaluated last in this case.


A function argument list is different from the actual comma operator, which has defined precedence and associativity. EDIT: thought you were using comma as one of the operators.
For a function argument list, I don't think the order of evaluation or associativity is defined or meaningful; comma is not an operator in an argument list. Calling a function with expressions that modify values can result in undefined behavior. Examples:
int x = 1;
some_function(x+1, x/2);  // fine, doesn't modify x anywhere
some_function(++x, x/2);  // bad; you don't know whether (x/2) will get 1 or 2 for x 

Even if you run it and check for your compiler, it may change with different compiler options and versions.


int i = 1;
cout << i << ++i;
may print 12 or 22 depending on the order of evaluation, which is unspecified.


In the above case, I would think that the order of evaluation would be specified since each operator<< is defined as a separate function call.
 
cout.operator<<(i).operator<<(++i);

cout.operator<<(i) returns an ostream reference, which is then used to call operator<<(++i). Therefore, the original value of "i" should always be output first before the second function call has its arguments evaluated; it is dependent on the return value of the first as an argument. It is still probably a bad idea to depend on that though; while they are separate function calls, they are still part of the same "expression" and the compiler may resolve/evaluate the function arguments independent of precedence and associativity rules.




The version of the g++
g++ (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5

I often use this kind of expression:
unsigned char *pointer;
unsigned char  myByte;

*pointer++ = *pointer | myByte;

And I thought I could just replace the pointer by an iterator and the result would be the same. Now I see I was wrong.


This type of code is unsafe regardless of whether it is an iterator or pointer. If you want "pointer" incremented after the assignment, you should just use pre-increment on a separate line (separate semi-colon delimited expression). Also, there is some possibility that it will work in some compiler modes, but not all. The optimizer could decide to resolve one part of the expression early, which seems to be what happens with your version of g++ with iterators. Vector iterators generally are just basic pointers though, so it is an interesting case.

Sometimes I think the post-increment/decrement operators just should not exist. The post-increment operator is particularly problematic for iterators (and other classes) that are not simple pointers. If you tried to assign to the iterator itself, rather than what the iterator points to, you could assign to a temporary object. Post-increment operators on iterators have to make a copy of themselves (since it needs to return the un-incremented value), increment themselves, then return the copy. If you assign to the result of such an operator, you are assigning to the copy that goes out of scope at the end of the expression. For the above code, you are dereferencing it, so you don't care if you assign through a copy of the pointer or the original pointer.


I think that precedence and associativity only dictate how the infix expressions will be expanded into functional notation.
That is

a op1 b op2 c op3 d
may be interpreted as

op2(op1(a,b),op3(c,d))
according to the precedence and associativity rules, but this does not mean that op1 is evaluated first and op3 second. The only certain thing is that op2 will be evaluated last in this case.


I am not really disagreeing with your post, I am just trying to clarify (hopefully). For the example you have with a, b, c, d, op1, op2, and op3, it would be nice if the compiler would take the associativity into account when deciding to evaluate op1 or op3 first. It doesn't necessarily do so though, and it usually isn't important. I generally do not embed ++/-- in other expressions, so this does not come up. If you do, even though it is not a good idea, it is definitely a bad idea to use the same variable more than once.

Referring to my post above, it looks like the compiler will not let you do a++ = b; anyway; invalid lvalue. I don't know if that extends to user defined types without testing though. I have had people do things like const char* p = obj.getString().c_str() where getString() returns a string by value. The pointer returned from c_str() is owned by the string, and it is freed as soon as the string goes out of scope, which in this case, is at the end of the expression. "p" then points to already freed memory. Compiler might be able to check with operator++(int) though, since it returns a temporary directly. Anyway, you should almost always be using pre-increment or decrement operators. They are often used interchangeably, often where the extra temporary introduced by post operators is not required. The compiler can often eliminate temporaries, but it is better not to introduce them in the first place.

I am suffering from some insomnia again, so hopefully I am making sense...
A function argument list is different from the actual comma operator, which has defined precedence and associativity. For a function argument list, I don't think the order of evaluation or associativity is defined or meaningful; comma is not an operator in an argument list. Calling a function with expressions that modify values can result in undefined behavior.

Probably I am missing the point, but I seem to agree completely on all of this (including the part regarding the comma operator). Indeed, the order of evaluation of the arguments to a function call is not defined. And the comma in the argument list is not an operator.

What associativity demands is that the expression should be evaluated like this:
 
( cout << i ) << (++i);
, to which we are accustomed, and not like this:
 
cout << ( i << (++i) );
, to which we don't have clear interpretation.
EDIT: actually, we do have a clear interpretation :)

How we evaluate the operands is still flexible. It can be done like this:
temp1 = i;
temp2 = ++i;
( cout << temp1 ) << temp2;
or like this:
temp2 = ++i;
temp1 = i;
( cout << temp1 ) << temp2;
, which arguably produces different result. The associativity rule is the same for both cases.

The arguments for each operation are not evaluated on-demand, which I suspect is the point of confusion. The argument values can be computed way ahead of time, much before the sub-expression that uses them has to be evaluated. The result can be preserved at temporary location and extracted from there, when it is needed. The flexibility allows the compiler to rearrange calculations between two sequence points (roughly in the confines of a single statement, but not exactly).

Also from this, it can be concluded that associativity rules are pointless for built-in associative operators like addition, which also have no side effects. It is my impression that the rule is useful only for their user overloads. For non-associative operators the situation is different, because the rules affect the result.

For built-in operators with side effects, I believe there is an additional catch. I am not completely sure on this point though. I think that the code
int j = (i++) + (i++);
can assign either 0 or 1 to j. It would have been more intuitive if j were always assigned 1. After all, one of the increments is executed first and the other second. That is true, but the side effects can be deferred to an arbitrary moment before the next sequence point. For user overloads, this weirdness would be avoided, because the side effects are committed before the operator returns from the call and cannot be deferred (so easily).

Regards
Probably I am missing the point, but I seem to agree completely on all of this (including the part regarding the comma operator). Indeed, the order of evaluation of the arguments to a function call is not defined. And the comma in the argument list is not an operator.


That is why I posted twice in a row there.


What associativity demands is that the expression should be evaluated like this:

( cout << i ) << (++i);
, to which we are accustomed, and not like this:

cout << ( i << (++i) );
, to which we don't have clear interpretation.
EDIT: actually, we do have a clear interpretation :)

How we evaluate the operands is still flexible. It can be done like this:

temp1 = i;
temp2 = ++i;
( cout << temp1 ) << temp2;

or like this:

temp2 = ++i;
temp1 = i;
( cout << temp1 ) << temp2;

, which arguably produces different result. The associativity rule is the same for both cases.


I understand this, and was trying to use this as an example also. I would prefer that the arguments/operands be resolved/evaluated in line with precedence and associativity rules, but I know that this is not the case.

The kind of code we are talking about is generally irrelevant to me, since I would never write such code in the first place. This specific example is interesting though because operator<< is a binary operator, therefore one of the arguments to cout << ++i, (the ostream&) can not be resolved until it is returned from the first function. That is, I would expect it to execute something like this:

tempOstream& = cout << i;
i = i + 1;
tempOstream << i;


So the problem here is similar to what you were pointing out in that lines 1 and 2 can be interchanged, but it isn't quite the same. I may have to do some testing at some point; too tired right now. I would expect the order of evaluation to be arbitrary for general function argument list, but for operators, they already have to support very specific ordering for logical operators (a||b is expected to not evaluate b if a is true). Does this stricter ordering carry over to general binary operators I wonder.
I believe jimc is correct here:

std::cout<<i<<++i;

You can't evaluate the ++i first because it depends on the result of the first function (std::cout.operator<<(i)).
I don't understand.

std::cout<<i<<++i depends on the value of std::cout<<i and of ++i. The expression ++i certainly does not require you to evaluate the result from std::cout.operator<<(i). The result of some expression is only necessary for evaluating those that contain it as sub-expression. But the order of evaluating the sub-expressions in an expression is unspecified.

This post seems to answer the question:
http://stackoverflow.com/questions/996844/c-shift-operator-precedence-weirdness

I couldn't find where in the standard my claim is supported. On the other hand, according to the above post, Stroustrup has been explicit on the issue in his similarly normative book. And the standard can be a bit vague at times.

Regards

I believe jimc is correct here:

std::cout<<i<<++i;

You can't evaluate the ++i first because it depends on the result of the first function (std::cout.operator<<(i)).


No, I wasn't disagreeing with the idea that the evaluation of the arguments can occur out of order. I was only disagreeing with the statements involved, which has more to do with the functional decomposition. With bop = "binary operator" and uop = "unary operator":
"a bop b bop (uop c)" resolves to "(a bop b) bop (uop c)" in this case. I agree with Simeonz that it is arbitrary whether "a bop b" or "uop c" gets evaluated first. IMO, it should choose to go left to right, as this is what seems natural to humans with binary operators, and agrees with the associativity of most binary operators also. This is generally an unnecessary restriction though, and it may be difficult to implement in a real compiler. It is a necessary restriction where logical operators are concerned due to short-circuit evaluation: "(a || b)" must evaluate "a" first. It is common in code to have such dependencies for logical operators.

What I was disagreeing with:

temp1 = i;
temp2 = ++i;
( cout << temp1 ) << temp2;


The above is not a proper functional decomposition. "cout << i << ++i;" resolves to:

tempOstream& = cout << i;
i = i + 1;
tempOstream << i;


The arguments to the first operator<< is "cout" and "i"; the arguments to the second call to operator<< is "tempOstream& " and "i". As I stated, in agreement with Simeonz, lines 1 and 2 can be swapped, resulting in:

i = i + 1;
tempOstream& = cout << i;
tempOstream << i;


if I may inject:
*pointer++ = *pointer | myByte;
For a pointer, p++ always == p. Therefore this will work as expected.
For an iterator, ++ is a function, which can return whatever it wants to return. I don't have a C or C++ standards document, but I believe that in *it++ =*it | mybyte;, the sides of the equals can be evaluated in either order. In my opinion, this represents a massive, gaping flaw in C++ operator overloading. The standard should have guaranteed that the postincrement function was called after the statement was evaluated.
*p++=*p|c;
Should always be equivalent to:
{*p=*p|c;p++;}
But this isn't how it is, so the C++ standard completely destroys a basic rule of the original C semantics.
Oh, how unfortunate.
jimc, I apologize if I sounded worked up. I guess I was rambling. This is a language that offers flexibility more than any language I know, but suffers from its sheer complexity. Lucky for it, I have a greedy mindset.

Regards

if I may inject:
*pointer++ = *pointer | myByte;
For a pointer, p++ always == p. Therefore this will work as expected.


Okay, so either you didn't understand what has already been posted, or you didn't read it (I know my posts are long-winded). The value of the right side of the equal sign is not in question; it is the value of pointer on the left side that is in question.

 
*a(1st)++ = *a(2nd) | something;

For the above, if you evaluate the left side first, the 2nd use of "a" gets a different value compared to when you evaluate the right side first. It is okay, but IMO, bad style to embed ++/-- operators in a more complicated expression. It is undefined if you use the same variable with an operator with side effects more than once in the same expression. It may work on your version of the compiler, with your compile options, but it is not guaranteed to work the same all of the time. The optimizer could evaluate the right hand side first, since it is an optimization to resolve where the result should go before you calculate the result. This is often done with return by value, where the function is automatically modified to construct the actual object (passed a pointer of where to put it) rather than make multiple temporaries. This is why it is unsafe to have constructors with side effects. You don't know how many temporaries will be created.



For an iterator, ++ is a function, which can return whatever it wants to return. I don't have a C or C++ standards document, but I believe that in *it++ =*it | mybyte;, the sides of the equals can be evaluated in either order. In my opinion, this represents a massive, gaping flaw in C++ operator overloading. The standard should have guaranteed that the postincrement function was called after the statement was evaluated.
*p++=*p|c;
Should always be equivalent to:
{*p=*p|c;p++;}
But this isn't how it is, so the C++ standard completely destroys a basic rule of the original C semantics.
Oh, how unfortunate.


It is unfortunate that you do not want to understand this. This has nothing to do with whether it is C++ or C, or what type of variable it is, pointer, iterator, int, or other. It has always been like this in both C and C++. Also, a vector iterator is just a pointer. Iterators are used to mimic pointers. They don't have to be pointers, and you should not depend on them being pointers, but they often are, so you are often just doing pointer arithmetic.

Bottom line, if I found any code using embedded operators with side effects, I would change the code. If I found the use of an operator with side effects on a variable used more than once in the expression, as the original example code does, I would consider it a bug that needs to be fixed.
#include "stdio.h"
int main(){
	int a[3]= {1,2,3};
	int *p=a;
	*p++ = *p + 1;
	*p = *p + 1;p++;
	return 0;
}

The compiled code is this, where my comments, with bolding show what is happening in each instruction:
 1  main:
 2  	pushl	%ebp
 3  	movl	%esp, %ebp	;int main(){
 4  	subl	$16, %esp	;int a[3]={1,2,3};int *p=a;
 5  	movl	$1, -16(%ebp)	;int a[3]={1,2,3};
 6  	movl	$2, -12(%ebp)	;int a[3]={1,2,3};
 7  	movl	$3, -8(%ebp)	;int a[3]={1,2,3};
 8  	leal	-16(%ebp), %eax	;int *p=a;
 9  	movl	%eax, -4(%ebp)	;int *p=a;
10  	movl	-4(%ebp), %eax	;*p++ = *p + 1;
11  	movl	(%eax), %eax	;*p++ = *p + 1;
12  	leal	1(%eax), %edx	;*p++ = *p + 1; (not sure why it does it this way)
13  	movl	-4(%ebp), %eax	;*p++ = *p + 1;
14  	movl	%edx, (%eax)	;*p++ = *p + 1;
15  	addl	$4, -4(%ebp)	;*p++ = *p + 1; (happens after the rest of the statement)
16  	movl	-4(%ebp), %eax	;*p = *p + 1;
17  	movl	(%eax), %eax	;*p = *p + 1;
18  	leal	1(%eax), %edx	;*p = *p + 1;
19  	movl	-4(%ebp), %eax	;*p = *p + 1;
20  	movl	%edx, (%eax)	;*p = *p + 1;
21  	addl	$4, -4(%ebp)	;p++;
22  	movl	$0, %eax	;return 0;
23  	leave			;return 0;
24  	ret

The code does exactly the same thing. Lines 10-15 are the same as lines 16-21.
EDIT: apparently bold doesn't work on symbols.