Passing method pointer or function pointer as a template argument.

Hello,

I have a class template:

template<typename Callable>
class MyClass
{
	void CallFunction()
	{
		Callable();
	}
};


This works fine when passing a function pointer or a lambda. However, sometimes I would like to pass a method pointer instead. Is it possible to change this class template so that MyClass accepts these different callable types, and can this be resolved at compile time?

I would like to avoid using std::function because Callable() will be called very, very often; it must be as fast as possible.

Assuming this can be resolved via templates, how do I declare an instance of MyClass later? I mean in a case like this:

class DifferentClass
{
private:
	void FunctionToCall(){}
	MyClass<DifferentClass::FunctionToCall()> my_class; //How to declare my_class properly?
};


Thank you for your help.
> it must be as fast as possible.


That seems unlikely. Very unlikely. It is very rare that software must run as fast as possible; it is very common that people spend a lot of time writing complicated code because they think they need to shave off a millionth of the run-time.

How do you know it needs to run as fast as possible? How do you know that the easiest thing you can do to make it faster is this kind of low-level function pointer fiddling?

> How do you know it needs to run as fast as possible?


Because this function will run 24/7 in a tight loop, handling frames from a camera feed. I have only 66 milliseconds to process each frame. I am doing image recognition on each frame, which is very time-consuming, and I have limited resources on my embedded system.

> How do you know that the easiest thing you can do to make it faster is this kind of low-level function pointer fiddling?


Because everything else in my image processing function has already been optimized.

Thanks for your input.

> I would like to avoid using std::function, because Callable() will be called very very often,
> it must be as fast as possible.

Measure it. It may not be as slow as you expect it to be.

#include <iostream>
#include <functional>
#include <ctime>
#include <cassert>

template < typename CALLABLE > struct A
{
    explicit A( CALLABLE callable ) : fn(callable) {}

    void call_it() const { fn() ; }

    CALLABLE fn ;
};

struct B
{
    long long foo( int a ) { return x += a ; }
    volatile long long x = 0 ;
};

template < typename T > struct C
{
    explicit C( T& obj, long long (T::*f)(int), int a )
        : object(obj), fn(f), arg(a) {}

    void call_it() const { (object.*fn)(arg) ; }

    T& object ;
    long long (T::*fn)(int) ;
    int arg ;
};


int main()
{
    B b ;
    const int INCR = 3 ;
    const std::function< void() > fn = std::bind( &B::foo, std::addressof(b), INCR ) ;

    const int N = 100'000'000 ;

    {
        A a(fn) ;
        const auto start = std::clock() ;
        for( int i = 0 ; i < N ; ++i ) a.call_it() ;
        const auto finish = std::clock() ;

        std::cout << "call through call wrapper: "
                  << N << " calls in " << (finish-start) * 1000.0 / CLOCKS_PER_SEC
                  << " millisecs.\n" ;

        assert( b.x == (long long)N*INCR ) ;
    }

    b.x = 0 ;

    {
        C c( b, &B::foo, INCR ) ;
        const auto start = std::clock() ;
        for( int i = 0 ; i < N ; ++i ) c.call_it() ;
        const auto finish = std::clock() ;

        std::cout << "              direct call: "
                  << N << " calls in " << (finish-start) * 1000.0 / CLOCKS_PER_SEC
                  << " millisecs.\n" ;

        assert( b.x == (long long)N*INCR ) ;
    }
}

http://coliru.stacked-crooked.com/a/ad5de77e878c578a
https://rextester.com/ANX61384
Thank you for your example.
@unspoken,

@JLBorges knows the stuff here. Engineering is all about setting up experiments and collecting data to prove what's best.

Consider what your code proposes:

class DifferentClass
{
private:
	void FunctionToCall(){}
	MyClass<DifferentClass::FunctionToCall()> my_class; //How to declare my_class properly?
};


It's not just about declaring a function, because to call a member function (what you're calling a method) requires an instance of DifferentClass.

By the time you solve that problem you end up with a start on what std::bind is.

In order to be fully generic you have to deal with what bind does for parameters, for the instance, for the pointer to the member function....aaaaannnnndd you're almost done rebuilding bind.
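For the narrow case in the question (one member function, no arguments), here is a minimal sketch of how the declaration could look without std::bind or std::function. It assumes C++17 for `template <auto>`. `BoundMember`, `Run` and the `calls` counter are hypothetical names added for illustration; they are not from the original post, and MyClass is extended here to store the callable it is given.

```cpp
#include <cassert>
#include <utility>

// Callable can be a lambda, a functor, or a plain function pointer.
template <typename Callable>
class MyClass
{
public:
    explicit MyClass(Callable c) : callable_(std::move(c)) {}
    void CallFunction() { callable_(); }  // direct call, no type erasure
private:
    Callable callable_;
};

// Hypothetical helper: the member function pointer is a template argument,
// so the call target is fixed at compile time; only the instance is stored.
template <auto MemFn, typename T>
class BoundMember
{
public:
    explicit BoundMember(T& obj) : obj_(&obj) { assert(obj_ != nullptr); }
    void operator()() const { (obj_->*MemFn)(); }
private:
    T* obj_;
};

class DifferentClass
{
public:
    DifferentClass() : my_class(Binder(*this)) {}

    // my_class keeps a pointer to *this, so a copy would still call the original.
    DifferentClass(const DifferentClass&) = delete;
    DifferentClass& operator=(const DifferentClass&) = delete;

    void Run() { my_class.CallFunction(); }
    int calls = 0;  // illustration only: counts invocations

private:
    void FunctionToCall() { ++calls; }

    using Binder = BoundMember<&DifferentClass::FunctionToCall, DifferentClass>;
    MyClass<Binder> my_class;
};
```

Calling `Run()` twice on a `DifferentClass` object should leave `calls` at 2. Note that the instance still has to be supplied at run time, which is exactly the part that cannot be removed; whether this is measurably faster than std::function on a given platform is something only a benchmark can tell.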

If performance is an absolute priority, the best you're going to manage is to figure out how to remove the obstacles, like the instance (not likely), or the pointer to member function (which can invoke a double de-reference kind of behavior, something akin to a virtual function call).

Consider how std::sort does this. It takes a function object. That can expand into inline code. If performance is absolute, that may be the kind of direction you'd have to take to really have an impact, and it might be considerable depending on how many nested calls can be removed.

This means moving away from that generic notion of a container which holds a collection of callable objects (all of which involve something of a pointer to a function paradigm, or a call to a member function through an instance) into a composition of code through the same idea as std::sort's comparison function object.

Eigen (the linear algebra library) does something like this from a unique perspective. For Eigen, a statement like:

a = b * c;

Here c is a matrix, b is a vector and a is the vector result. This doesn't actually perform the multiplication of a vector by a matrix at "*", as we'd expect from a common C++ pattern. Instead, the "*" operator returns a lightweight expression object describing the operation, not a computed result, and hands that to "=", which lets the library choose an optimized evaluation. The actual execution of the multiplication isn't started until execution enters the "=" operator function.
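To make the idea concrete, here is a toy sketch of deferred evaluation. It is not Eigen's implementation, and it swaps the matrix product for plain vector addition to stay short. `Vec`, `Sum` and `kSize` are hypothetical names invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSize = 3;

// Expression node: remembers its operands and computes nothing until indexed.
template <typename L, typename R>
struct Sum
{
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

struct Vec
{
    std::array<double, kSize> data{};

    Vec() = default;
    Vec(double a, double b, double c) : data{a, b, c} {}

    double operator[](std::size_t i) const
    {
        assert(i < kSize);
        return data[i];
    }

    // Assignment from an expression: the one loop where the work happens.
    template <typename L, typename R>
    Vec& operator=(const Sum<L, R>& expr)
    {
        for (std::size_t i = 0; i < kSize; ++i) data[i] = expr[i];
        return *this;
    }
};

// operator+ builds a Sum node instead of a temporary Vec.
inline Sum<Vec, Vec> operator+(const Vec& l, const Vec& r) { return {l, r}; }

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& l, const Vec& r) { return {l, r}; }
```

With this in place, `a = b + c + d;` creates no intermediate vectors: the additions are carried out element by element inside `operator=`, and because every type is visible to the compiler the whole expression can be inlined into a single loop.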

Depending on how your application code calls these callable objects, you may need to shift perspective in a similar (very loosely related) way, such that you can arrange to couple function objects instead of callable objects, so the code can be constructed inline.

Perhaps policy based template design...I'm guessing here, because I only see the callable object problem, not the overall application level code you're working on, and therefore have no idea why a callable object collection would be the design choice.
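As a rough illustration of what a policy based design might look like, here is a sketch under the assumption that the per-frame work can be chosen at compile time. `FrameLoop`, `CountingPolicy` and `NoOpPolicy` are hypothetical names, since the real application code isn't shown in this thread.

```cpp
#include <cassert>

// Two hypothetical policies; each provides the same operator() interface.
struct CountingPolicy
{
    int frames = 0;
    void operator()() { ++frames; }
};

struct NoOpPolicy
{
    void operator()() {}
};

// The host takes the policy as a template parameter, so the call in Run()
// is an ordinary member call that the optimizer can inline.
template <typename FramePolicy>
class FrameLoop
{
public:
    explicit FrameLoop(FramePolicy p = FramePolicy{}) : policy_(p) {}

    void Run(int n)
    {
        assert(n >= 0);
        for (int i = 0; i < n; ++i) policy_();
    }

    const FramePolicy& policy() const { return policy_; }

private:
    FramePolicy policy_;
};
```

A `FrameLoop<CountingPolicy>` that runs five iterations ends with `policy().frames == 5`, while `FrameLoop<NoOpPolicy>` compiles down to an empty loop body. The trade-off is that the behaviour is fixed per type, so this only fits if the callable does not need to change at run time.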

But back to @JLBorges's point, which I'm echoing: quite often such "early optimization" thoughts are off course and matter far less than expected.

Taking the point from another perspective (on the subject of performance), I had a long running exchange (almost an argument) with someone who believed that assembler had to be faster than C++ because it was lower level. Yet the level isn't what makes a difference, at least not in the 21st century with the current (and likely up and coming) optimizing compilers. Sometimes a higher level expression can lead to even better performance, which I admit sounds counterintuitive at first. std::sort is Stroustrup's favorite (or once favorite) demonstration of how the opposite can be true, and for a fundamental reason (its inlining opportunity from the function object). This happens because the design gives the optimizer more information than does C's qsort function, which requires a function pointer to a comparison function.
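The two calling styles look like this side by side. This sketch only shows the difference in what the optimizer gets to see; it makes no claim about timings, which depend on compiler and platform. The function names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort style: the comparison is reached through a function pointer and
// the elements arrive as untyped void pointers.
inline int compare_ints(const void* a, const void* b)
{
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

inline std::vector<int> sort_with_qsort(std::vector<int> v)
{
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    return v;
}

// std::sort style: the lambda's type is a template argument, so the
// comparison body is visible at the call site and can be inlined.
inline std::vector<int> sort_with_std_sort(std::vector<int> v)
{
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    return v;
}
```

Both functions return the same sorted vector; the difference is only in how much of the comparison the compiler can see while generating the sorting loop.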

The same basic principle is under what I'm trying to convey relative to your inquiry. We all know that early optimization is generally recognized as an evil, derailing productivity. However, as Stroustrup points out in one of his older works, sometimes it isn't an evil, because sometimes it's about design.

We also know that minor optimization adjustments produce minor results (with occasional bumps that thrill), whereas algorithmic changes usually produce much more dramatic results.

This becomes my point here. If you can refashion the way calling code uses these callable objects, maybe you can fashion a different design which can more frequently take advantage of function objects instead of callable objects, thereby giving the optimizer more information from an expression resulting in better optimized performance.

Sometimes such ideas require major refactoring, dropping previously held conventions and ultimately leading to a dead end, where we accept that something like std::bind and std::function have been engineered to within a few % of an "in house" solution, with less work on our part.








> Because this function will run 24/7 in a tight loop and it will be handling frames from camera feed. I have only 66 milliseconds to process each frame.


So NOT as fast as possible. It only needs to be fast enough to complete each frame in 66 milliseconds. That is a lot, lot slower than as fast as possible.

> Because every other thing had been already optimized in my image processing function.

If you are literally at the point where the tiny, tiny indirection in this function call is the only thing remaining, and you had also already measured it accurately enough so that you know that's enough to make the 66 millisecond limit... well, I just plain don't believe you. If you were that good, you wouldn't be here asking this question.

Here are someone's notes from a comparison: https://stackoverflow.com/questions/14306497/performance-of-stdfunction-compared-to-raw-function-pointer-and-void-this

They found the difference to be 2 instructions per call. That's very, very, very few instructions. I would expect those two instructions to be churned through in nanoseconds.

Someone else on that page found a lambda to be faster than a function pointer, so maybe you're now actually making your code slower.

Here's Boost talking about the efficiency of their implementation of something similar to std::function; "With a properly inlining compiler, an invocation of a function object requires one call through a function pointer."

So just how many times per frame are you calling this function? If it's only 2 extra instructions, then you must be calling it billions of times per frame for it to make a difference.
@Niccolo thank you for your valuable input and your time. I might refactor my code to use a functor, or use std::function after all.

@Repeater After reading around on the Internet I saw that the general consensus is that std::function is slower, and I thought I could avoid using it without compromising my design. I asked my question more as an exercise than out of necessity, as I do not have performance problems right now.

> So just how many times per frame are you calling this function?
Once :)
As far as performance for your requirements go, do not rely on opinions expressed by other people.
Measure it for your specific use case on your specific platform and tool chain.