Allocating memory for a large number of objects at the same time where each object hold maps and vectors

Hello,

I am writing some code which is time critical and I've noticed that memory allocation takes up a lot of time because I have to create an object up to 1,000,000 times. Each object is created individually throughout the course of the program. The objects don't need to be deleted before the end of the program. The objects contain several maps and vectors that hold pointers to other objects. I know rough upper limits on the number of elements in the maps and vectors.

In C one would use arrays instead of maps and vectors and the objects would be

1
2
3
struct A{
B* arr[50];
};


Then I would allocate memory for all objects in the beginning by calling
A* memA = (A*) malloc( 10000000 * sizeof(A));

and then increment the memA pointer every time I use a chunk of that memory.

How would I do something like that in C++?
For now lets ignore that we have maps and vectors in the object.
calling
A* memA = new A() [10000000];
will allocate memory for 1 million objects first (ok sort of what I want) and then call the constructor for each of them (don't want this as I want to call them throughout the algorithm - I don't actually know how many of those objects I really need)

Lets introduce the maps and vectors again. This is actually what hurts the most: allocating memory for each individual map-element or vector for each object.
What happens right now is the objects constructor gets called and I reserve memory for the vectors but can't do so for the maps as that functionality is not provided. But even reserving memory for the vectors is not good enough as this is done when the object is created (throughout the algorithm).

Ultimately what I need is allocating memory for 1 million objects at once where each object at that point gets allocated enough memory for each of it's maps and vectors without calling the object's constructor.

Note: memory is of course limited but not as critical as time.

Thanks for your help!
Last edited on
If you need something to be fast, why are you using std::maps? For the constructors, have the default constructor be empty and then have a constructor with arguments, that would be explicitly called later, that handles all the initialization stuff.
This wont be my last comment on the subject but I figured I'd give you something to go with while I'm looking into some other things. First, you may be able to create a custom allocator for the std::map, however, I don't know that that is actually a bottleneck. You could use the maps allocator like you would reserve, however, this would be implemented by you.

There is also boost.pool, which I would suggest using more than my above advice...
Here is the read.
http://www.boost.org/doc/libs/1_49_0/libs/pool/doc/html/index.html

Also, you can still use malloc in c++, if you need to (this way you wont fire off those constructors).

This is actually what hurts the most: allocating memory for each individual map-element or vector for each object.
The actual pushing/inserting of the elements (and their corresponding allocations?) or are you talking about just the construction of the empty containers?

Also, how are these containers being filled, element by element? Is it possible that you can make insertion use range operations, this would be significantly faster than element by element.

Also, is it possible if you have an idea of how many objects you will have, that you create x vectors who's capacities have been set, change your objects to store pointers to vectors, so that instead of reserving inside the algorithm, you are only assigning pointers to vectors that have already been reserved (this may be catastrophic to change, but just giving some other considerations, the pointer assignments are atomic.)
Last edited on
Also, you can still use malloc in c++, if you need to (this way you wont fire off those constructors).

So would I use and increment the memA pointer as in the above example and then call just the constructor without new? As in
memA = A();

Yes, I had read about the boost::pool but it seems like there is no way of telling how large the pool actually is. Please correct me if I'm wrong.


The actual pushing/inserting of the elements (and their corresponding allocations?) or are you talking about just the construction of the empty containers?

The insertion of the elements to the map because it has to allocate the memory for each element/key every time.
And yes they are filled element by element. Unfortunately range operation is not possible.
> There is also boost.pool, which I would suggest using more than my above advice...

+1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#include <vector>
#include <map>
#include <boost/pool/pool_alloc.hpp>
#include <ctime>
#include <iostream>

struct A
{
    static inline void* operator new( std::size_t sz )
    { return pool.allocate() ; }

    static inline void operator delete( void* p )
    { pool.deallocate( static_cast<A*>(p) ) ; }

    A()
    {
        vector.reserve(1000) ;
        for( int i=0 ; i<100 ; ++i ) map.insert( std::make_pair( i, this ) ) ;
    }

    std::vector< A*, boost::pool_allocator<A*> > vector ;
    std::map< int, A*, std::less<int>,
              boost::fast_pool_allocator< std::pair<const int, A* > > > map ;
    char other_data[50] ;
    static boost::fast_pool_allocator<A> pool ;
};

boost::fast_pool_allocator<A> A::pool ;

int main()
{
    std::clock_t start = std::clock() ;
    for( int i = 0 ; i<10000 ; ++i ) new A ;
    std::cout << ( std::clock() - start ) / double(CLOCKS_PER_SEC) << " secs.\n" ;
    // on my machine (laptop), took 0.253 secs
    // for 10K object allocs + 10K vector resizes + 1M map inserts
}


Apologies for not getting back to this earlier but I got side tracked by other parts of the code.

This looks good.

But I'm afraid I haven't fully understood the pool objects yet. Here is how I think it works. Please correct me if I'm wrong:

A first call to new A will allocate a pool object dedicated to struct A. This is a continuous chunk of memory with size X.

Then, for whatever you do to an object of struct A (add to vector (maybe even exceed your reserved vector size) or add to the map) it uses memory of that pool. I assume it does this by moving a pointer along in the array of memory?
But what happens when it hits the limit of the pool? I assume a new one is allocated?

Also, how is the size (X) of the pool determined?


But what happens when it hits the limit of the pool? I assume a new one is allocated?
I'm not sure about this one I'd have to do some research.

Also, how is the size (X) of the pool determined?
If you are referring to the size type of the chunks the pool allocates, this is determined by the type you are using for the allocator.

If you used integers, it would allocate chunks the size of integers and only integers. If you use class A, it will allocate chunks of size class A... etc. See JLBorges great example above for the answer to this question.
Last edited on
allocate chunks of size class A..

Though not exactly what I meant with that question: How does it know the size of struct A? In a sense it's not clear how large struct A is. The size of the 'body' of A is clear but you don't know how many elements get added to the map. And that was the issue here.

Also, how is the size (X) of the pool determined?

I mean how many elements of struct A (neglecting the issue about the elements in the map for now) fit into the pool? 10? 1000000? Big difference as it will have to allocate a block of memory for the pools much more often.
> How does it know the size of struct A?

The template is instantiated with A as the template parameter.
1
2
3
4
5
template< typename T > struct some_class
{
    enum { SIZE_OF_EACH_OBJECT = sizeof(T) } ;
    // ...
};



> I mean how many elements of struct A (neglecting the issue about the elements in the map for now) fit into the pool?

A pool contains a linked list of 'blocks'. Each 'block' contains many fixed size 'chunks' (of size sizeof(A) in this case). The pool allocator allocates a new 'block' as and when required.
See 'Simple Segregated Storage' in
http://www.boost.org/doc/libs/1_46_1/libs/pool/doc/concepts.html

Yes, I understand all that but the question is what happens if I call the constructor of A but don't add any elements to the map yet. Later in the program I then add 10000 elements to the map of each object A. Where are these elements allocated? In the memory held by the pool? Or are those elements still allocated one by one?

At the time when you call A() the number of elements in the map is not known. So a call to sizeof(A) can't return all the memory necessary for A because only the body of A (which only includes the overhead of the map and the vector of size 1000) is known.
> Yes, I understand all that

No, you haven't.


> what happens if I call the constructor of A but don't add any elements to the map yet.
> Later in the program I then add 10000 elements to the map of each object A.
> Where are these elements allocated?

In that small program, there are three different (standard library compatible) pool objects:

1. The pool from which objects of type A are allocated - boost::fast_pool_allocator<A> A::pool. This is a fast_pool_allocator<>, which is optimized for allocation and deallocation of one object at a time.

2. The pool from which memory for the vector is allocated - boost::pool_allocator<A*>. This is a pool_allocator<>, which is optimized for allocation and deallocation of arrays of objects.

3. The pool from which memory for the nodes containing the key-data pairs of the map is allocated - boost::fast_pool_allocator< std::pair<const int, A* > >. This is a fast_pool_allocator<>, which is optimized for allocation and deallocation of one object at a time.

Whenever an object of type A is created with a dynamic storage duration, the memory comes from pool 1.

Whenever a vector needs more memory, the memory comes from pool 2.

Whenever a map needs some memory, the memory comes from pool 3.
Ah! ok that makes a lot more sense now.

Since the insertion to the map is the most crucial part I separated this from struct A and checked its performance.


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
struct B{};

foo() {
   map<int, B*, less<int>, boost::fast_pool_allocator< std::pair< const int, B* > > > m1;
   map<int, B*> m2;
   int Large = 1000000;
   B* pB = new B();


   std::clock_t start = std::clock() ;

   for(int i = 0; i < Large; ++i)
	m1.insert(make_pair(i , pB));
			
   std::cout << "poolmap " << ( std::clock() - start ) / double(CLOCKS_PER_SEC) << " secs.\n" ;



   start = std::clock() ;

   for(int i = 0; i < Large; ++i)
	m2.insert(make_pair(i , pB));
	
   std::cout << "regular map " << ( std::clock() - start ) / double(CLOCKS_PER_SEC) << " secs.\n" ;
}


unfortunately performance wasn't all that much better for the map using a pool:
poolmap 3.438 secs.
regular map 4.546 secs.
Am I missing something?
> Am I missing something?

Enabling optimizations (in particular, inlining), perhaps?

For equivalent code, running on a slow first generation Atom N270 (1.6 GHz), I get (MinGW GCC 4.7 with -O3):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#include <ctime>
#include <iostream>
#include <map>
#include <boost/pool/pool_alloc.hpp>

struct B ;

template < typename MAP > void time_it( const char* tag )
{
    enum { LARGE = 1000000 } ;
    B* pB = nullptr;
    MAP map ;

    std::clock_t start = std::clock() ;
    for( int i = 0 ; i < LARGE ; ++i ) map.insert( std::make_pair( i , pB ) ) ;
    std::cout << tag << ": " << ( std::clock() - start ) / double(CLOCKS_PER_SEC) << " secs.\n" ;
}

#define TIME_IT( a ) time_it<a>( #a ) ;

int main()
{
    typedef std::map< int, B* > std_map ;

    typedef boost::fast_pool_allocator< std::pair< const int, B* > > fast_allocator_t ;
    typedef std::map< int, B*, std::less<int>, fast_allocator_t > pool_map ;

    TIME_IT( std_map ) ;
    TIME_IT( pool_map ) ;
}

Output:
std_map: 2.766 secs.
pool_map: 1.234 secs.

Topic archived. No new replies allowed.