Hiding Implementations: A look at performance (PIMPL vs Abstract Interface)

Abstract:
This is an 'article' proposal that I wish to discuss first (if anyone is interested). I investigated the performance difference between a Pimpl and an abstract interface. I found that when n is small, the difference is negligible (less than 2%). I also wanted to consider the "normal" class interface, so I implemented a normal class as a "control". Overall, there isn't much of a difference. The considerations for choosing an approach when hiding the implementation of an exported class are discussed.


Problem:
My current project got me wanting to find out about this particular issue. I tried finding an answer on the internet and no one provided one. I found a lot of conjecture, some saying the PIMPL implementation was more efficient, more suggesting the Interface approach was probably faster; but no facts. Most didn't suggest they knew. Some people also felt that this was a stupid question because they didn't feel it was relevant since the purposes for each were completely different and that if someone really wanted to know the answer they could write code to find out.

Well, first of all, this is not a stupid question. There are circumstances such as mine where this is a valid question. In my situation I really don't care about the reasons one would choose a Pimpl approach over an Interface. I have only one purpose in using either: hiding the implementation of the exported class (in an external, dynamically linked library) so that I don't have to export all the symbols involved in the implementation. Beyond that, all that matters to me is performance. Speed is important because the whole point is to provide a GUI wrapper for .NET that can access tens of thousands of data points while updating a view.

So I wrote a test program to find the answer. If anyone would like to expand on this (including myself) feel free to do so. I have included the code below for anyone's comments, curiosities, questions, and convenience.


Experiment:
Here are the classes I used. I used Qt for the GUI, and I will not include that code since the choice of GUI is up to the tester.

I initially wanted the calls to take a substantial amount of time, so I wrote a recursive function to use up some time. I originally set it up so that there were 10 times as many non-test calls (to methods not involved in the test) as test calls. Afterwards I changed my mind and made them equal. Each test call calls the "interface" function (whether pimpl or a true interface), which calls the implementation function, which calls the recursive function only once. I ran 1000 tests for each n (the number of calls per test) and averaged the results.

Here is the "recursive" method called:
#include <QMainWindow> // arbitrary imported class so that we "do something"

__int64 TestCall(QMainWindow* window, int n)
{
	if (n == 0)
		return window->depth();
	else
		return n * TestCall(window, n - 1);
}


Here are all class definitions:
class RegularClass
{
public:
	RegularClass();
	~RegularClass();
	void testRegular(QMainWindow* mainWindow);
};

class PimplPrivate; // forward declaration; the full definition lives in the .cpp file

class Pimpl
{
private:
	PimplPrivate* d;

public:
	Pimpl();
	~Pimpl();
	void testPimpl(QMainWindow* mainWindow);
};

class IAbstractClass
{
protected:
	IAbstractClass();
	virtual ~IAbstractClass();

public:
	virtual void testInterface(QMainWindow* mainWindow) = 0;
};

class VirtualClass : protected IAbstractClass
{
public:
	VirtualClass();
	~VirtualClass();

	void testInterface(QMainWindow* mainWindow) override;
};


Here is the regular (control) class implementation:
RegularClass::RegularClass() { }
RegularClass::~RegularClass() { }
void RegularClass::testRegular(QMainWindow* mainWindow) { TestCall(mainWindow, 1); }


Here is the Pimpl implementation:
class PimplPrivate
{
public:
	PimplPrivate() { }
	~PimplPrivate() { }

	void testPimpl(QMainWindow* window) { TestCall(window, 1); }
};

Pimpl::Pimpl() :
	d(new PimplPrivate())
{
}

Pimpl::~Pimpl()
{
	delete d;
}

void Pimpl::testPimpl(QMainWindow* mainWindow) { d->testPimpl(mainWindow); }
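With the raw new/delete pair above, Pimpl is still copyable by default, so copying a Pimpl object would end up deleting the same PimplPrivate twice. A minimal sketch of the same idiom with std::unique_ptr follows. The names Widget, Widget::Impl and value() are hypothetical stand-ins, not the classes that were timed, and header and source are shown in one block only for brevity.

```cpp
#include <memory>
#include <utility>

// --- what would go in the public header ---
class Widget                          // hypothetical exported class
{
public:
    Widget();
    ~Widget();                        // defined later, where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;

    int value() const;                // forwards to the hidden implementation

private:
    class Impl;                       // forward declaration only
    std::unique_ptr<Impl> d;          // owning pointer; Widget becomes move-only
};

// --- what would go in the .cpp file ---
class Widget::Impl
{
public:
    int value() const { return 42; }  // placeholder work for the sketch
};

Widget::Widget() : d(std::make_unique<Impl>()) { }
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

int Widget::value() const { return d->value(); }
```

The call path is the same as in the timed version (one extra pointer dereference per call), so this variant should not change the measurements in any meaningful way; it only removes the manual delete and the accidental copy.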


Here is the abstract/virtual implementation:
IAbstractClass::IAbstractClass() { }
IAbstractClass::~IAbstractClass()  { }
VirtualClass::VirtualClass() : IAbstractClass() { }
VirtualClass::~VirtualClass() { }
void VirtualClass::testInterface(QMainWindow* mainWindow) { TestCall(mainWindow, 1); }


Here is the code that ran the test and reported the result:
PimplVirtTest::PimplVirtTest(QWidget *parent)
	: QMainWindow(parent)
{
	ui.setupUi(this);
	float avgtime ;
	clock_t t = clock();
	clock_t sum = 0;

	// cache init/buffer
	for (int i = 0; i < 50; ++i)
	{
		t = clock();
		RegularClass* c = new RegularClass();
		for (__int64 i = 0; i < 10000; ++i)
		{
			c->testRegular(this);
		}
		delete c;
		sum += clock() - t;
	}
	sum = 0;
	
	for (int i = 0; i < 1000; ++i)
	{
		t = clock();
		RegularClass* c = new RegularClass();
		for (__int64 i = 0; i < 1000000; ++i)
		{
			c->testRegular(this);
		}
		delete c;
		sum += clock() - t;
	}

	avgtime = (float)sum / (float)CLOCKS_PER_SEC / 1000.0f;
	ui.regularLabel->setText(QString::number(avgtime));
	sum = 0;

	for (int i = 0; i < 1000; ++i)
	{
		t = clock();
		Pimpl* p = new Pimpl();
		for (__int64 i = 0; i < 1000000; ++i)
		{
			p->testPimpl(this);
		}
		delete p;
		sum += clock() - t;
	}

	avgtime = (float)sum / (float)CLOCKS_PER_SEC / 1000.0f;
	ui.pimplLabel->setText(QString::number(avgtime));
	sum = 0;

	for (int i = 0; i < 1000; ++i)
	{
		t = clock();
		VirtualClass* v = new VirtualClass();
		for (__int64 i = 0; i < 1000000; ++i)
		{
			v->testInterface(this);
		}
		delete v;
		sum += clock() - t;
	}

	avgtime = (float)sum / (float)CLOCKS_PER_SEC / 1000.0f;
	ui.virtualLabel->setText(QString::number(avgtime));
}



Results (in seconds):
n            No Interface    PIMPL       Abstract
   10,000    0.001340        0.001355    0.001342
  100,000    0.013313        0.013493    0.013349
1,000,000    0.132686        0.134420    0.132807


Discussion:
As one can see, the increase in time from a regular class to a PIMPL class is about 1.3%, whereas with the abstract/virtual class it is about 0.17% (both averaged over the three values of n). So the overhead of the abstract implementation is not quite 10 times smaller than that of PIMPL (around 7.5 times smaller); this does not mean the abstract version runs 7.5 times faster. Compared to a normal class, the difference for both is negligible (less than 2%).
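For anyone who wants to check the percentages, here is a small sketch that recomputes them from the results table. The function and type names (overhead_percent, Row, kRows, mean_overhead_percent) are made up for this illustration; only the timing values come from the table.

```cpp
#include <array>

// Relative slowdown of `variant` versus `baseline`, in percent.
double overhead_percent(double baseline, double variant)
{
    return (variant - baseline) / baseline * 100.0;
}

// One row of the results table (times in seconds).
struct Row { double regular, pimpl, iface; };

// Values copied from the results table.
constexpr std::array<Row, 3> kRows{{
    {0.001340, 0.001355, 0.001342},   // n = 10,000
    {0.013313, 0.013493, 0.013349},   // n = 100,000
    {0.132686, 0.134420, 0.132807},   // n = 1,000,000
}};

// Mean overhead of one column relative to the regular class,
// averaged over the three rows.
double mean_overhead_percent(double Row::*column)
{
    double sum = 0.0;
    for (const Row& r : kRows)
        sum += overhead_percent(r.regular, r.*column);
    return sum / kRows.size();
}
```

Calling mean_overhead_percent(&Row::pimpl) gives roughly 1.26 and mean_overhead_percent(&Row::iface) gives roughly 0.17, so the ratio between the two overheads is about 7.4.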


Conclusion:
The implications of this finding for people who simply want to hide their implementation from export are straightforward. Either one is an acceptable solution from a performance perspective. The increase in time from hiding behind a vtable or a pointer is at most 1.3%. Therefore, which implementation you use should be based on other considerations.

For simple exportation, I would prefer to use the abstract implementation because it is simpler from a coding AND conceptual standpoint.
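To make the "abstract implementation for export" idea concrete, here is a rough sketch of how such a library boundary could look: the client sees only the interface and a factory function, and the concrete class never leaves the .cpp file. IDataSource, DataSourceImpl, createDataSource and MYLIB_API are hypothetical names invented for this example, not part of the test code.

```cpp
#include <memory>

// MYLIB_API is a hypothetical export macro. In a real DLL build it would
// expand to __declspec(dllexport) / __declspec(dllimport) or a visibility
// attribute; it is left empty here so the sketch compiles anywhere.
#define MYLIB_API

// --- public header: the only declarations a client sees ---
class IDataSource
{
public:
    virtual ~IDataSource() = default;
    virtual int pointCount() const = 0;
};

MYLIB_API std::unique_ptr<IDataSource> createDataSource(int points);

// --- library .cpp: the implementation, never exported ---
namespace {

class DataSourceImpl final : public IDataSource
{
public:
    explicit DataSourceImpl(int points) : points_(points) { }
    int pointCount() const override { return points_; }

private:
    int points_;
};

} // namespace

std::unique_ptr<IDataSource> createDataSource(int points)
{
    return std::make_unique<DataSourceImpl>(points);
}
```

Only one symbol (the factory) has to be exported. Returning std::unique_ptr across a DLL boundary assumes that both sides use the same runtime library; a plain pointer plus an explicit release function is a common alternative when that cannot be guaranteed.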
It depends on purpose. You're not necessarily talking about abstraction here either. For instance, you can have a non-abstract virtual or pimpl interface. They're generally just used in conjunction with abstraction because it's good practice to keep things abstract.

Pimpl and virtual interfaces are useful only at runtime, where you can swap the implementation out on demand. If you don't need code that is interchangeable at runtime, you can avoid the pointer completely and use some basic macro checks to include a header based on what you're building for.
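A minimal sketch of that macro approach, assuming a hypothetical build flag USE_FAST_BACKEND and hypothetical header names. The two branches are written inline so the example fits in one file; in a real project each branch would be an #include of a different header.

```cpp
#include <string>

// USE_FAST_BACKEND is a hypothetical flag, e.g. passed as -DUSE_FAST_BACKEND.
#if defined(USE_FAST_BACKEND)
// In a real project: #include "fast_backend.h" (hypothetical file name)
struct Backend
{
    static std::string name() { return "fast"; }
};
#else
// In a real project: #include "portable_backend.h" (hypothetical file name)
struct Backend
{
    static std::string name() { return "portable"; }
};
#endif

// Client code uses Backend directly: no pointer and no vtable are involved,
// and the choice is fixed when the program is compiled.
std::string describeBackend()
{
    return "built with the " + Backend::name() + " backend";
}
```

The trade-off is that the selected implementation is visible to every file that includes the header, so this removes the indirection but does not hide the implementation the way a pimpl or an interface does.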
Your timing of VirtualClass::testInterface() is probably not measuring a virtual call. Line 61 of the test code (the v->testInterface(this); call) can be compiled as a direct call: v is a VirtualClass * initialized from new VirtualClass() right there in the loop, so the compiler can see the dynamic type of the object and knows that the only function that call could possibly reach is VirtualClass::testInterface().

Repeat the measurement with this code:
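// Note: VirtualClass must derive publicly from IAbstractClass for this
// conversion to compile; the definitions above use protected inheritance.
IAbstractClass* v = new VirtualClass();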
for (__int64 i = 0; i < 1000000; ++i)
{
	v->testInterface(this);
}
Hopefully the compiler will not be too smart and that will generate a virtual call.
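One way to make it harder for the compiler to resolve the call statically is to let the concrete type depend on a value that is only known at run time. The sketch below illustrates this with hypothetical names (ICounter, UpCounter, DownCounter, makeCounter, runLoop); it is not the original test code, and an aggressive optimizer may still devirtualize it if it can see everything in one translation unit.

```cpp
#include <memory>

struct ICounter                          // hypothetical interface
{
    virtual ~ICounter() = default;
    virtual void step() = 0;
    virtual long long total() const = 0;
};

struct UpCounter : ICounter
{
    void step() override { ++n; }
    long long total() const override { return n; }
    long long n = 0;
};

struct DownCounter : ICounter
{
    void step() override { --n; }
    long long total() const override { return n; }
    long long n = 0;
};

// The concrete type is chosen from a run-time value, so the caller
// only ever holds a pointer to the interface.
std::unique_ptr<ICounter> makeCounter(bool up)
{
    if (up)
        return std::make_unique<UpCounter>();
    return std::make_unique<DownCounter>();
}

// The loop only knows ICounter; each step() goes through the vtable
// unless the compiler manages to prove the dynamic type.
long long runLoop(ICounter& counter, int iterations)
{
    for (int i = 0; i < iterations; ++i)
        counter.step();
    return counter.total();
}
```

In a timing test the flag passed to makeCounter could come from a command-line argument, and placing makeCounter and runLoop in separate source files makes static resolution less likely still.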
There would be a significant difference in performance only if the function can be inlined when the call is bound at compile time.

a.cpp
struct A
{
    virtual ~A() = default ;

    virtual void foo() = 0 ;
    virtual void bar() = 0 ;
    virtual int baz() const = 0 ;
};

int test_a( A& a ) { for( int i = 0 ; i < 1000 ; ++i ) { a.foo() ; a.bar() ; } return a.baz() ; }

http://coliru.stacked-crooked.com/a/e12a162ac8c6488d

b.cpp
struct A
{
    virtual ~A() = default ;

    virtual void foo() = 0 ;
    virtual void bar() = 0 ;
    virtual int baz() const = 0 ;
};

struct B final : A
{
    virtual void foo() override final { ++i ; --j ; }
    virtual void bar() override final { --i ; ++j ; }
    virtual int baz() const override final { return i+j ; }

    int i = 0 ;
    int j = 0 ;
};

int test_b( B& b ) { for( int i = 0 ; i < 1000 ; ++i ) { b.foo() ; b.bar() ; } return b.baz() ; }

http://coliru.stacked-crooked.com/a/ab1b79281f75bf3d

main.cpp
#include <iostream>
#include <ctime>

struct A
{
    virtual ~A() = default ;

    virtual void foo() = 0 ;
    virtual void bar() = 0 ;
    virtual int baz() const = 0 ;
};

struct B final : A
{
    virtual void foo() override final { ++i ; --j ; }
    virtual void bar() override final { --i ; ++j ; }
    virtual int baz() const override final { return i+j ; }

    int i = 0 ;
    int j = 0 ;
};


struct timer
{
    const std::clock_t start = std::clock() ;
    ~timer()
    {
        const auto end = std::clock() ;
        const auto ms = (end-start) * 1000.0 / CLOCKS_PER_SEC ;
        std::cout << ms << " msecs.\n" << std::flush ;
    }
};

int main()
{
    int test_a( A& a ) ;
    int test_b( B& b ) ;

    B b ;
    A& a = b ;

    int x = 0 ;
    int y = 0 ;

    const int N = 200'000 ;

    {
        std::cout << "    virtual: " ;
        timer t ;
        for( int i = 0 ; i < N ; ++i ) { x += test_a(a) ;}
    }

    {
        std::cout << "not virtual: " ;
        timer t ;
        for( int i = 0 ; i < N ; ++i ) { y += test_b(b) ;}
    }

    return x - y ;
} 

ln -s /Archive2/e1/2a162ac8c6488d/main.cpp a.cpp
ln -s /Archive2/ab/1b79281f75bf3d/main.cpp b.cpp

echo -e '\n------- clang++ -O3 -----------\n' && clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors  main.cpp a.cpp b.cpp && ./a.out
echo -e '\n--------- g++ -O3 -------------\n' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors  main.cpp a.cpp b.cpp && ./a.out

------- clang++ -O3 -----------

    virtual: 3910 msecs.
not virtual: 2950 msecs.

--------- g++ -O3 -------------

    virtual: 1220 msecs.
not virtual: 0 msecs.

http://coliru.stacked-crooked.com/a/2ce7a1f902db3fc4