Disch (8615)
I was going to argue this and claim it was a classic example of undefined behavior (modifying the same variable twice between sequence points). However, assuming the iterator is a class type and not a typedef of a built-in type, you're right: this is legal and "safe". Though I still cringe.
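For context, the line under discussion was presumably of this form (a reconstruction, since the snippet didn't survive the quote; container and value types are assumed):

```cpp
#include <map>

// The contested idiom: the increment happens first, so erase() receives
// a copy of the old iterator while 'it' already points at the next node.
void remove_current(std::map<int, int>& m, std::map<int, int>::iterator& it) {
    m.erase(it++);
}
```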
L B (3806)
So this is undefined?
Disch (8615)
No, that is perfectly fine. As cire said... it's probably fine with map (I think -- at least I can't see a reason why it wouldn't be, now that I've thought about it more). But it might not work on all containers, so I still don't recommend doing it.
cire (2347)
I'm not sure why it wouldn't also be safe if the assumption is false. Side effects for arguments are guaranteed to be sequenced before the function body is entered, and in any case the function would still be operating on a copy: erase takes iterators by value.
Disch (8615)
Yeah, you're right. I can see it failing to work as expected for a container like deque or vector, but not because of the undefined behavior I was thinking of. I stand corrected.
L B (3806)
Still, I'll get out of that habit, because it is inconsistent among containers that use iterators.
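For reference, a sketch of the pattern that behaves consistently across containers (assuming C++11, where every container's erase() returns an iterator to the element after the erased one):

```cpp
#include <vector>

// Works the same way for vector, deque, map, etc. in C++11:
// erase() hands back a valid iterator to the next element.
void erase_evens(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0)
            it = v.erase(it);  // re-seat the iterator; never reuse the old one
        else
            ++it;
    }
}
```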
BlackSheep (420)
It's undefined because an erase() on a vector moves everything that comes after the erased element, invalidating iterators from that point on. Erasing a map's element doesn't have this effect, because a map uses a binary tree as storage instead of an underlying array. So (Disch, you'll probably hate me) something like this should still be "valid":
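The snippet itself is missing, but from the context it was presumably an erase-in-a-loop on a map, along these lines:

```cpp
#include <map>

void erase_negatives(std::map<int, int>& m) {
    for (auto it = m.begin(); it != m.end(); ) {
        if (it->second < 0)
            m.erase(it++);  // OK for map: only the erased node is invalidated
        else
            ++it;
    }
}
```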
rapidcoder (736)
Yes, but it is easy to reorder code in such a way that you change prefix to postfix without changing semantics. This can often be achieved by, e.g., adding 1 somewhere or changing a condition from > to >=, etc. But compilers aren't *that* smart yet. E.g., you often see code like this:
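The snippet is missing; judging from Cubbi's and L B's later posts, it was presumably along these lines (some_const is an assumed name and value):

```cpp
const int some_const = 1000;  // value assumed for illustration

// prefix form as posted (see L B's objection below about < vs <=)
bool foo(int& a) { return ++a < some_const; }
```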
It can simply be changed into postfix:
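Again a reconstruction:

```cpp
// postfix form as posted, with the comparison loosened to <=
// (some_const as defined in the previous snippet)
bool bar(int& a) { return a++ <= some_const; }
```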
which avoids a pipeline stall and *is* faster.
L B (3806)
@rapidcoder shouldn't your < and <= be swapped?
BHXSpecter (982)
L B, yeah, it seems that way, but you are assuming they are initializing x and y to 0, while they could be doing a mathematical figure or test where they could be negative (which would make both true). I guess it should be noted that his example is case-based.
L B (3806)
It is completely not case-based. He used 'a' in both cases, saying they were equivalent result-wise. The fact that x and y are 0 does not matter; I just made them the same value, because otherwise the example would make no sense.
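A minimal sketch of the point being made (the original example is missing; values are illustrative):

```cpp
#include <iostream>

int main() {
    int x = 0, y = 0;
    // With the operators as rapidcoder posted them, the two forms disagree:
    std::cout << (++x < 1)  << '\n';  // prints 0: compares 1 < 1
    std::cout << (y++ <= 1) << '\n';  // prints 1: compares 0 <= 1
    // Swapping < and <= makes them agree: ++a <= N  <=>  a++ < N.
}
```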
JLBorges (1754)
> which avoids pipeline stall and *is* faster.

Don't spout rubbish about things that you don't understand. Pipeline stall, indeed. With the logical error corrected (not that that makes a difference as far as performance is concerned):
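The corrected code is gone from the quote, but Cubbi's post further down spells out the two bodies; presumably:

```cpp
const int some_const = 1000;  // value assumed

// corrected prefix form
bool foo(int& a) { return ++a <= some_const; }

// corrected postfix form -- now genuinely equivalent: both increment a
// and return whether its *old* value was below some_const
bool bar(int& a) { return a++ < some_const; }
```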
rapidcoder (736)
In the second code, cmpl and leal can be executed in parallel by the CPU on different pipelines. In the first code, you've got a read (cmpl) immediately after a write (addl), and cmpl will have to wait for the addl to finish, which is... a pipeline stall. Which means that just those two instructions, addl and cmpl, in the first example can take 10-20 CPU cycles. Judging performance by counting opcodes is indeed rubbish, on your side.

BTW, a little off-topic. Which is faster, and why:
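The two programs are missing from the quote; given the Stack Overflow link L B posts below, this was presumably the classic sorted-vs-unsorted branch-prediction puzzle, roughly:

```cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>

int main() {
    const int n = 32768;
    static int data[n];
    for (int i = 0; i < n; ++i)
        data[i] = std::rand() % 256;

    // Version A: leave data unsorted.
    // Version B: uncomment the next line -- the loop below is unchanged.
    // std::sort(data, data + n);

    long long sum = 0;
    for (int pass = 0; pass < 1000; ++pass)   // repeat to make timing visible
        for (int i = 0; i < n; ++i)
            if (data[i] >= 128)               // the branch in question
                sum += data[i];

    std::cout << sum << '\n';
}
```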
Which one is faster, excluding array initialization? I'm asking only about the last for loop. Both loops are *identical* and yield *identical machine code*.
Cubbi (1925)
Just for fun, I compiled a couple of programs: one calls foo() (that is, return ++a <= some_const;), the other calls bar() (that is, return a++ < some_const). One billion times each. After dumbing down the compiler options I usually use, so that they don't do whole-program optimizations, the results were (average of 5 runs):
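A sketch of the setup described (the foo()/bar() translation unit compiled separately, so the call can't be optimized away; names and counts as given above, some_const's value assumed):

```cpp
// foo.cpp -- compiled separately, no whole-program optimization
const int some_const = 1000;                     // value assumed
bool foo(int& a) { return ++a <= some_const; }   // swap in bar() for run 2

// main.cpp
bool foo(int& a);

int main() {
    long long hits = 0;
    for (long long i = 0; i < 1000000000LL; ++i) {  // one billion calls
        int a = 0;
        if (foo(a)) ++hits;
    }
    return static_cast<int>(hits & 0xff);  // keep the result observable
}
```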
Assembly on gcc/x86 looked just like in the post above (Intel's compiler used more CPU registers). I'd say practice shows that there is no difference.
JLBorges (1754)
> In the second code cmpl and leal can be executed in parallel by the CPU on different pipelines.

Right. Because the second code requires the use of two registers instead of one.

Even assuming that these functions are not inlined: with link-time optimization (which, among other things, performs interprocedural register allocation), this would need two extra instructions to save and restore the register around the call. If a++ < some_const were inlined, LTO is not involved, but the register would still have to be released. Push edx would stall waiting for push eax to finish, even before this section of code was entered. Unless the program is embarrassingly tiny, and one is running on a machine with oodles of registers, the code requiring the use of an extra register would be slower.

That is theory; in reality, since int is a basic type, every self-respecting compiler would know how to generate code that runs with the same efficiency for both versions.

EDIT: Thanks, Cubbi. I hadn't seen your post when I wrote the above, but it proves the last point. Though I suppose it is something that a C++ programmer would already know; in fact, almost every C++ programmer who posted in this thread mentioned something to that effect.

EDIT2: Hadn't seen this addendum either.

> Judging performance by counting opcodes is indeed, rubbish, on your side

I was judging performance by taking register usage into account; no count of opcodes was ever mentioned. If my memory serves me right, with an option like -march=corei7 the count of opcodes would be the same for both versions, but the code generated for the postfix increment would still use that crucial extra register.
L B (3806)
http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array
Cubbi (1925)
PS: the assembly, because I was curious; others might be as well:
rapidcoder (736)
@JLBorges Judging performance by the number of registers used is also wrong. As you can see, both codes execute equally fast. If the machine code uses only one register, it doesn't mean the CPU will use only one register: CPUs do internal register renaming in order to avoid pipeline stalls. Internally, x86 has 76 or more registers, even in 32-bit mode. You have no control over it at all, even when coding in assembly (funny times, when even assembly coders don't have full control over the hardware). So for simple code I'm not very surprised they executed equally fast, because CPUs are getting smarter and smarter. There are also some bypasses, so the preceding instruction need not execute fully in order to start executing the next dependent instruction.

I recall that on a different forum they did an experiment with complex expressions where lots of post/pre increments were used, and preincrements turned out to be slower. But that was a few years ago, on different hardware. The example with branch prediction shows that the times when you could safely guess "instruction A is always slower than instruction B" are over. Now, everything depends on the context.
JLBorges (1754)
> http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array

Thanks, LB, for that link. This was illuminating, particularly the bit about ICC performing a loop interchange.
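For anyone who hasn't followed the link: the loop interchange refers to ICC swapping the benchmark's repeat loop with the data loop, with an effect roughly like this (an illustration, not actual compiler output):

```cpp
// Original shape: the data-dependent branch runs repeats * n times.
//   for (int r = 0; r < repeats; ++r)
//       for (int i = 0; i < n; ++i)
//           if (data[i] >= 128) sum += data[i];

// After interchange, each element's branch is decided once, so the
// misprediction cost no longer scales with the repeat count:
long long interchanged(const int* data, int n, int repeats) {
    long long sum = 0;
    for (int i = 0; i < n; ++i)
        if (data[i] >= 128)
            sum += static_cast<long long>(repeats) * data[i];
    return sum;
}
```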
JLBorges (1754)
> Judging performance by number of used registers is also wrong. As you can see, both codes execute equally fast. ...

Yes, you are absolutely right on that count. That for native types both would execute equally fast was the commonly held view among the C++ programmers who posted in this thread. And then you came up with the categorical assertion that postfix increment "avoids pipeline stall and *is* faster", which is what started this debate.

> The example with branch prediction shows that the times when you could safely guess "instruction A is always slower than instruction B" are over. Now, everything depends on the context.

AFAIK, those days were over a very long time ago, ever since pipelining, CPU caches, translation lookaside buffers, etc. made their appearance. For example, John Lakos, in his 1996 book, talks about it at some length while discussing the performance implications of inline functions.