Is this a legit optimization?


In the out() function below, which sums elements, you can see I have a switch case to do the summation in a single line, and in the default case (for any more than 4 elements) it uses a loop.

My question is: does omitting the loop for known sizes actually optimize the code?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
    inline double out(){
        
        switch(index){
            case 0:
            //    
            break;
            
            case 1:
            output = source[0].output;      
            break;
            
            case 2:
            output = source[0].output + source[1].output;
            break;
            
            case 3:
            output = source[0].output + source[1].output + source[2].output; 
            break;
            
            case 4:
            output = source[0].output + source[1].output + source[2].output + source[3].output; 
            break;
            
            default: //are the other cases really an optimization or is this dumb??
                output = 0;
                for (int i = 0; i < index; i++) {
                    output += source[i].output;
                }
            break;
        }
        return output;
    }

I suppose I should actually look at some generated assembly, but I thought I would ask
thanks!
Last edited on
Maybe, may not. Don't be the compiler; just code a loop. If you find that your program seems to have some speed issues, use a profiler, and fix the code that actually needs optimizing.
The optimizer will unroll loops to the extent that it is advantageous to do so (if we make the function available for inline substitution, at the call-site).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#include <numeric>

// uncomment the static and in the code generated (last part) 
// we can see that the loop is unrolled for the first eight
// possible iterations of the loop  
// (the optimiser has added cases 5, 6, 7, 8)
static int foo( const int array[], std::size_t index )
{
    int sum = 0 ;
    switch(index)
    {
        case 4 : sum += array[3] ; [[fallthrough]]; 
        case 3 : sum += array[2] ; [[fallthrough]];
        case 2 : sum += array[1] ; [[fallthrough]];
        case 1 : sum += array[0] ; [[fallthrough]];
        case 0 : return sum ;
        
        default : return std::accumulate( array, array+index, 0 ) ;
    }
}

// uncomment the static and in the code generated (last part) 
// we can see that the loop is unrolled for the first eight
// possible iterations of the loop
static int bar( const int array[], std::size_t index )
{ return std::accumulate( array, array+index, 0 ) ; }

// generates identical (well, almost identical; the order of summation is different)
// code for these two functions
int foobaz( const int array[] ) { return foo( array, 4 ) ; }
int barbaz( const int array[] ) { return bar( array, 4 ) ; }


https://godbolt.org/g/28Mfx4
JLBorges: That's extremely helpful!

It looks like you got the assembly for me, and I agree it looks about the same.

I had kept the switch statement just in case, since I was a bit paranoid about performance in an audio application.

I remember something about loop unrolling from class but forgot all of it, but this seems like a good example. The function i'm using will be an inline member of a struct and I am using GCC, so I feel more confident after looking at the generated code in the demo that those optimizations will take place for me

Topic archived. No new replies allowed.