std::function vs function pointer

I'm storing a short function for later use. I'm curious: can I expect better performance from a lambda stored in a std::function or from a C function called through a raw pointer?
You can expect performance to be comparable unless you're calling the function tens of thousands of times per second, in which case the raw pointer might be a millisecond or two faster. =P


Don't sweat the small stuff. My general philosophy is that speed isn't really an issue until it's an issue.


Or if it really does matter (like if you're doing something where performance is critical), do some benchmarks to see for yourself -- then you can appropriately weigh the tradeoffs between implementation ease and speed.
I actually would call these functions VERY often. I'm doing some audio processing stuff. The function would be called sampleRate/bufferSize times per second * 2 (because it's 2 functions). And literally every millisecond counts in real time dsp.
I don't know... unless you're shooting for very low latency, real-time DSP isn't as strenuous as you might think. As long as your code can run faster than real time, it's "fast enough".

But like I say, if it's really a concern, the only real answer you'll get is "benchmark it and see for yourself". I don't predict much of a performance difference (if any), but you never know until you try it.
If the function is reasonably small, one would expect the lambda expression to clearly out-perform the raw pointer.

However, this is a quality-of-implementation issue: it depends on how well the compiler and library handle the closure and std::function<>.
As always, measure it on the specific implementation that is to be used.
One measurement is worth more than a thousand opinions.

Typical:

foo.cpp
#include <functional>

int foo( int a ) { return a/2 + a%3 ; }

std::function< int(int) > wrapped_closure() { return []( int a ) { return a/2 + a%3 ; } ; }

http://coliru.stacked-crooked.com/a/e36a848c9464e0a6

main.cpp
#include <iostream>
#include <ctime>
#include <functional>
#include <algorithm>
#include <numeric>

int foo( int a ) ;
std::function< int(int) > wrapped_closure() ;

template < std::size_t N > long long bar( const int (&srce)[N], int (&dest)[N], const std::function< int(int) >& fn )
{
    const auto start = std::clock() ;
    std::transform( srce, srce+N, dest, fn ) ;
    return std::clock() - start ;
}

int main()
{
    constexpr std::size_t N = 1024*1024*64 ;
    static int srce[N] {};
    static int dest[N] {};
    std::iota( srce, srce+N, 0 ) ;

    std::cout << "function pointer: " <<  bar( srce, dest, &foo ) << " processor clock ticks\n" ;
    std::cout << "  lambda (local): " <<  bar( srce, dest, []( int a ) { return a/2 + a%3 ; } ) << " processor clock ticks\n" ;
    std::cout << " lambda (extern): " <<  bar( srce, dest, wrapped_closure() ) << " processor clock ticks\n" ;
}

http://coliru.stacked-crooked.com/a/1ce54afb138c7663

ln -s /Archive2/e3/6a848c9464e0a6/main.cpp foo.cpp
clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp foo.cpp -lsupc++ && ./a.out 
echo ============== && g++ -std=c++14  -O3 -Wall -Wextra -pedantic-errors main.cpp foo.cpp && ./a.out
function pointer: 890000 processor clock ticks
  lambda (local): 430000 processor clock ticks
 lambda (extern): 430000 processor clock ticks
==============
function pointer: 890000 processor clock ticks
  lambda (local): 420000 processor clock ticks
 lambda (extern): 410000 processor clock ticks
I suspected some compiler shenanigans (had to look that word up), so I ran JLBorges's code with the order of the operations reversed, and... well, take a look:

ln -s /Archive2/e3/6a848c9464e0a6/main.cpp foo.cpp
clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp foo.cpp -lsupc++ && ./a.out 
echo ============== && g++ -std=c++14  -O3 -Wall -Wextra -pedantic-errors main.cpp foo.cpp && ./a.out
  lambda (local): 880000 processor clock ticks
 lambda (extern): 440000 processor clock ticks
function pointer: 460000 processor clock ticks
==============
  lambda (local): 890000 processor clock ticks
 lambda (extern): 420000 processor clock ticks
function pointer: 440000 processor clock ticks


http://coliru.stacked-crooked.com/a/3337713edd2afe68

One more time!
ln -s /Archive2/e3/6a848c9464e0a6/main.cpp foo.cpp
clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp foo.cpp -lsupc++ && ./a.out 
echo ============== && g++ -std=c++14  -O3 -Wall -Wextra -pedantic-errors main.cpp foo.cpp && ./a.out
 lambda (extern): 910000 processor clock ticks
  lambda (local): 430000 processor clock ticks
function pointer: 490000 processor clock ticks
==============
 lambda (extern): 890000 processor clock ticks
  lambda (local): 420000 processor clock ticks
function pointer: 430000 processor clock ticks


http://coliru.stacked-crooked.com/a/998e2be51c943928

In conclusion, I feel the outputs show that it doesn't really matter which one you choose. It all comes down to how well the compiler can optimize the call and to cache performance.
That is a great example JLBorges.


To help process what this information actually means:

I ran this on my machine as well but changed it to use chrono and print as microseconds (since that is IMO more useful to visualize):


function pointer: 527030 microseconds
  lambda (local): 318018 microseconds
 lambda (extern): 315018 microseconds


To clarify, this is calling the functions 67108864 times, and has a speed difference of about 212 milliseconds.


So if you call this function once for every sample (i.e. 44100 times per second), it scales down to a speed difference of approx. 140 microseconds. That is... 0.14 milliseconds.

And you won't even be calling it every sample -- you'll be calling it every few hundred samples. So you can divide that even more.






These speeds are comparable, and the differences shown here are largely academic. You will not notice any performance difference in your program based on which method you use.




EDIT: Didn't see Smac's reply until now.

So do you suspect the compiler is optimizing out the call?
Thanks guys. I actually just timed some code, and I'm getting the raw function pointer as 1-2% faster than a std::function, while the local lambda is about 5-7% faster for me. But I can't use that method because I need to store the function. Anyway, I already wrote the code using std::function, and function pointers are only 1-2% faster, so I think I'm going to keep using std::function.
> So do you suspect the compiler is optimizing out the call?

The compiler was optimising away the zero initialisation of dest (it is already pre-initialised to all zeroes).
So the first time around, the writes to dest got cache misses.

With line 23 added (force dest into the processor cache):
function pointer: 500 millisecs
lambda (local): 420 millisecs
lambda (extern): 430 millisecs
lambda (extern): 410 millisecs
lambda (local): 410 millisecs
function pointer: 470 millisecs
==============
function pointer: 450 millisecs
lambda (local): 430 millisecs
lambda (extern): 430 millisecs
lambda (extern): 420 millisecs
lambda (local): 430 millisecs
function pointer: 450 millisecs

http://coliru.stacked-crooked.com/a/ab2f7e381a204f1f


> but changed it to use chrono and print as microseconds (since that is IMO more useful to visualize)

Using a wall-clock to measure processor time is a bad idea.
Reading this thread from here
http://www.cplusplus.com/forum/beginner/146620/#msg771119
onwards would elaborate on why this is so.

EDIT: Forgot to thank Smac89. Thank you, Smac89!