### Efficiency

The following code snippets should be a good representation of my situation.
Which of the following would be more efficient:

```cpp
int sum = 0;
for( int i = 0; i < large_num; i++ ) {
    sum += int_from_calculation;
    sum %= 1007000000;
}
```

versus:

```cpp
int sum = 0;
for( int i = 0; i < large_num; i++ ) {
    sum += int_from_calculation;
    if( sum >= 1007000000 ) {
        sum -= 1007000000; // LB's Optimisation
    }
}
```

The 1007000000 is usually just some large number.
I don't actually think there's a difference, unless the compiler isn't very good at optimizing. A better solution would be to use the second with -= instead of %= but again I don't know whether this is better than just letting the compiler optimize it. Have you tried some benchmarks?
I think that the first one would be more efficient. Modulus is not an expensive operation. Why not see what the assembly looks like, and do a test using gprof?
 Why not see what the assembly looks like

Well I would imagine that the jump would cause a loss of cached instructions; however, I think that a modulus is quite an expensive operation (from what I remember from my assembly book, it does a divide and leaves the modulus in edx, with the divide being much slower than a conditional branch).
Seeing the assembler doesn't always help. Why not time them? That's the only way to know.
 The 1007000000 is usually just some large number.

Then the two are not necessarily equivalent. For example, assuming 32-bit integers, suppose during one of the iterations when you reach line 4 that sum is 2^31. Then we have line 5 yielding

sum = 2147483648 - 1007000000 = 1140483648

which is bigger than 1007000000, whereas the modulus will always be less than 1007000000.
Good point, I didn't think of that. It can be fixed by changing the if to a while loop but at this point you seriously have to consider which is more efficient.
@Alrededor int_from_calculation is always less than the modulo number
If you do time the alternatives, then you could time a ternary version too:
```cpp
const Cap = 1007000... ;
...
add = int_from_calc(); // assert( add < Cap )
room = Cap - sum;
sum = ( add < room ) ? add + sum : add - room;
```
How would I time the programs?
It depends on the OS.
Windows
http://msdn.microsoft.com/en-us/library/windows/desktop/ms644904%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/ms644905%28v=vs.85%29.aspx

Call QueryPerformanceCounter at the start and stop positions. Divide the difference by the value obtained from QueryPerformanceFrequency.

I've written some detailed stuff with code, but I can't find it.
> Which of the following would be more efficient:

Depends on the values of `large_num`, the minimum, maximum and expected average of the value of `int_from_calculation`, and the value of the `divisor` (1007000000 in the example). This obviously assumes that the time taken to evaluate `int_from_calculation` does not dominate the time taken by the loop; if it does, it doesn't matter how you write the loop.

Profile your code if you are not sure; don't try guesswork. Performing tail-splitting based on the actual run-time value of the variables (to get the most optimal version executed) may be worthwhile if `large_num` is very large, the operation is time-critical and the code is executed extremely frequently.

For instance (assuming positive values for the sake of brevity):

```cpp
#include <limits>
#include <cstdlib>

// non-const with external linkage, to disable constant folding
extern int large_num ; // invariant: large_num > 0
extern int divisor ;   // 1007000000 // invariant: divisor > 0
extern int ubound ;    // upper bound on value of int_from_calculation
                       // invariant: ubound > 0
                       // invariant: ubound < ( std::numeric_limits<int>::max() - divisor )
// invariant: lower bound on value of int_from_calculation > 0

// optimal if expected average value of int_from_calculation is
// comparable to std::numeric_limits<int>::max() / 2
int foo()
{
    int sum = 0 ;
    for( int i = 0 ; i < large_num ; ++i )
    {
        sum += std::rand() % ubound ; // int_from_calculation
        sum %= divisor ;
    }
    return sum ;
}

// optimal if expected average value of int_from_calculation is
// much smaller than std::numeric_limits<int>::max() / 2
int bar() // ( a mod c + b mod c ) == (a+b) mod c, verifies that a+b does not result in UB
{
    int sum = 0 ;
    const int maxv = std::numeric_limits<int>::max() - ubound ;
    for( int i = 0 ; i < large_num ; ++i )
    {
        sum += std::rand() % ubound ; // int_from_calculation
        if( sum > maxv ) sum %= divisor ;
        // or: if( sum > maxv ) sum -= divisor ;
        // %= is more expensive, -= will execute more often
    }
    return sum ;

    /* GCC 4.9 i386 with -std=c++11 -pedantic-errors -Wall -Wextra -fomit-frame-pointer -c -S
    __Z3barv:
        subl    $28, %esp
        movl    $0, 12(%esp)
        call    __ZNSt14numeric_limitsIiE3maxEv
        movl    _ubound, %edx
        subl    %edx, %eax
        movl    %eax, 4(%esp)
        movl    $0, 8(%esp)
        jmp     L8              ; *** loop starts here ***
    L10:
        call    _rand
        movl    _ubound, %ecx
        cltd
        idivl   %ecx
        movl    %edx, %eax
        addl    %eax, 12(%esp)
        movl    12(%esp), %eax
        cmpl    4(%esp), %eax   ; *** comparison each time through the loop
        jle     L9              ; *** division only if unavoidable
        movl    _divisor, %ecx
        movl    12(%esp), %eax
        cltd
        idivl   %ecx
        movl    %edx, 12(%esp)
    L9:
        addl    $1, 8(%esp)
    L8:
        movl    _large_num, %eax
        cmpl    %eax, 8(%esp)
        jl      L10             ; *** and ends here ***
        movl    12(%esp), %eax
        addl    $28, %esp
        ret
    */
}

// optimal if the invariant:
// std::numeric_limits<int>::max() > (large_num*ubound) holds
int baz() // ( a mod c + b mod c ) == (a+b) mod c, a+b does not result in UB
{
    int sum = 0 ;
    for( int i = 0 ; i < large_num ; ++i ) sum += std::rand() % ubound ; // int_from_calculation
    return sum % divisor ;
}
```
@kbw
Thank you for the links, and I have found a thread that explains the use of it sufficiently, so if you cannot find the code it is okay.
http://stackoverflow.com/questions/1739259/how-to-use-queryperformancecounter

@JLBorges
 This obviously assumes that the time taken to evaluate int_from_calculation does not dominate the time taken by the loop

For argument's sake, assume that the time is negligible.

 Profile your code if you are not sure; don't try guesswork

Would that be timing it and looking at the resulting asm code?

 tail-splitting

What is this? Google only gave me results of "split-tail" which I highly doubt is what you mean.

 assuming positive values for the sake of brevity

The values will always be positive.

Also thank you for the code and asm. I will definitely need to brush up on a few asm instructions and then I will spend some time reading through it. Thank you again.

> Profile your code if you are not sure; don't try guesswork
>> Would that be timing it and looking at the resulting asm code?

Timing it.

> tail-splitting
>> What is this?

Split the tail of the function (containing the loop). Something like

```cpp
int foobar()
{
    // examine the run-time values of the variables involved
    if( baz() would be correct and optimal )
        return baz() ;
    else if( bar() would be optimal )
        return bar() ;
    else
        return foo() ;
}
```

> I will definitely need to brush up on a few asm instructions
> and then I will spend some time reading through it.

It is a good idea to gain some understanding of assembly code. But you don't really need to do that in this particular case. The assembly code was there just to illustrate that at compile-time, the compiler does not have all the information that is required to completely optimize certain kinds of code. Of course, profile-guided optimization would help, and you should use it (along with representative profiler data) when appropriate.

For the above code, it is adequate if you understand that:

a. You have to ensure that `sum += int_from_calculation ;` does not result in integer overflow
b. In `foo()`, the modulo operation is performed `large_num` times.
c. In `bar()`, a check (comparison of integers) is performed `large_num` times, but the expensive modulo operation is performed much less often.
d. In `baz()`, no check is made in the loop and the modulo operation is performed just once, at the end.
Subtraction is almost always faster than modulus. Modulus needs division, division *is* slow on most processors, much slower than addition/subtraction. Bah, I guess it's even slower than multiplication.

The interesting part of the equation is that if condition. The branch is easy to predict and if int_from_calculation is small, most of the time it won't be taken. Therefore most of its invocations are almost free (within one cycle). Even in case of misprediction, the speculative execution would still hide most of latency... Comparisons and conditional jumps are very cheap on modern processors, which can prefetch and execute *both* branches simultaneously on two parallel pipelines and just cut one off after the result of condition is known.

Therefore I guess the version with subtraction would be faster in almost all situations.
> Subtraction is almost always faster than modulus. Modulus needs division,
> division *is* slow on most processors, much slower than addition/subtraction.

Repeat: it depends on the run-time values of the variables involved.

For instance, if `divisor` is 1000000, `sum` is `std::numeric_limits<int>::max() - 2000000` and the upper bound of `int_from_calculation` is `std::numeric_limits<int>::max() - 1000000`, modulo division would be much faster than brute repeated subtraction in a loop.

Repeat: Don't rely on guesswork. Get an estimate of the typical run-time values, profile to determine the value of n at which n comparisons and subtractions are more expensive than a modulus operation, and write the code accordingly.