Contiguous Polymorphic Memory Storage

Pages: 12
I am creating a program that needs extremely high performance as I need it to update a huge amount if data in an extremely short period of time.
I have a base virtual class [code]Game Object[\code] that has an extreme number of children with carrying sizes.

Currently they are stored in a 1 dimensional vector with a base pointer, and are allocated in order, however the addresses obviously aren't contiguous and thus it slows down due to cache misses.

Does anyone have a basic concept that I can use to contiguously allocate polymorphic classes?
extreme number of children

Can you give an estimate in quantity and size within an order of magnitude? What architecture is your program running on?
Last edited on
You may oveload the new operator to ensure continuous memory:

http://www.cplusplus.com/reference/new/operator%20new/
lordseanington wrote:
needs extremely high performance
lordseanington wrote:
I have a base virtual class

those two may be at odds with each other, and deriving from some kind of "Object" sounds like a major anti-pattern in any case. Of course, as with any potential performance issue, you have to measure first, but even from general guidelines point of view, a virtual function in C++ is something that requires serious justification. I rarely find one.
(reference: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Rh-virtual )

Does anyone have a basic concept that I can use to contiguously allocate polymorphic classes?

Not "polymorphic" specifically, but it sounds like you may wish to explore memory pools, IF system allocator turns out to be a bottleneck. A couple strategies that could be a fit:
1. monotonic pool, every allocation is just a pointer bump, deletion not supported. To clean up, you drop the entire pool (e.g. when loading a new game area/level)
2. multipool allocator, with a separate pool for objects of each specific size. (I think even the old "Modern C++ Design" mentioned that). Here you can easily support deletion: the pool marks the slot empty, and reuses it later.

if the program uses a lot of parallelism, you may have to consider some other approaches . after measuring.
Last edited on
I am creating a program that needs extremely high performance as I need it to update a huge amount if data in an extremely short period of time.

Approximately how much data? Approximately how much time? And can you describe the problem a little more? It's possible that you don't really need to update all that data all the time.
not sure I totally understood what you wanted, but you can make a vector, array, or just a byte-blob that has the data you want to iterate over inside, and take a pointer to that piece of memory in the class, via a microscopic hand-rolled memory manager type setup.

the crudest example would be, if you wanted to iterate over doubles,

double * dp = new double[hugevalue]; //yea, its global. You can wrap it up, this is an example.
static int dx = 0;

class()
{
..
ctor
{
member = &(dp[dx++]);
}
}

you can happily iterate over dp in your tight inner loop, and still get to the value associated with object instances via the pointer.

Wrapping all that up in OOP is a little work but not difficult, just a bit of typing. You might have to make a "deleted" element if your objects go out of scope, so you can recover / re-use the released pieces of dp. It just depends on what you need to do.

Ideallly you can allocate all the memory your program will need up front and never fool with memory again, if you really want to do that.



Last edited on
In modern C++ standards std::vector is guaranteed to store data contiguously (starting from C++-0x I think). To reduce reallocation there are options to allocate needed amount of data at creation.
Last edited on
raschupkin wrote:
In modern C++ standards std::vector is guaranteed to store data contiguously

Yes, but OP has std::vector<Foo*>, a vector of pointers. Those pointers are indeed stored contiguously. However, the objects that they point to are elsewhere and not a concern of the vector.
Last edited on
you can do it with vectors, its the same conceptually.

what needs to happen is

vector f<foo> //somewhere

and vector fp<foo*> //in the class?

where
fp[0] = & f[0];

so that f can access the items efficiently for iteration and fp can keep a handle to the individuals. You still want to bundle it up cleanly so you don't slowly eat the system out of memory over time by growing the vectors forever. You have to recycle locations that are no longer in use when you go to create a new one.




Last edited on
you can do it with vectors, its the same conceptually.

what needs to happen is

vector f<foo> //somewhere

and vector fp<foo*> //in the class?

Just fyi, you're failing to address the OP's need for "foo" to vary in type, which is the reason pointers were used in the first place.
Hmm. Not sure how to do that one, apart from a rather crude union approach to get them all the same size and "shape" again. I know you can do it with polymorphism if set up that way.



seems like abstracting the common stuff down to a base class that you can allocate this way would allow you to poly while managing the memory. But I will defer to the experts is there is a better way to do this...

Last edited on
The "easiest" way would be to have a custom allocator that allocates by incrementing a pointer, like Cubbi suggested. This would only work well if the set of live objects is more or less constant. If objects are constantly being created and destroyed the memory will become fragmented.
Whether keeping all the objects contiguously would improve cache efficiency remains to be seen. My guess would be not, or not much. Cache behavior is optimal when the data is contiguous and homogeneous.
I am glad to get such a turnout on responses. I am not sure if I should overload the standard new operator as I am already getting pretty good speed. I may want to make a test branch.

With my architecture, I have a system similar to this ...
Game Object
Children: Tile and DynamicE
Tile Children: WoodenDoor, Grass, ect...
DynamicE Children: Zombie, Monster, ect...

This is just an example layout but gives you the idea.
Game object has the following virtual functions
1
2
virtual void Update(unsigned int T)=0;
virtual bool RunEvent(Event& E)=0;


All of type Tile are stored into a Map object containing Chunk objects. DynamicE is stored in a separate unordered map of Id's.

Tile only adds a
1
2
3
4
5
6
virtual bool SaveToFile(std::ofstream*);
virtual bool ReadFromFile(std::ifstream*);

private:
unsigned int Id;
unsigned int AssignedImage;


All of my current children of Tile are ROUGHLY the same size, with a few containing a Boolean value to define toggle states or similar.

If they are all roughly the same size I can create a maximum size requirement and have each one use prealocated memory allocated in a block equal to the bytes taken up by (MaximumSize * ChunkSizeX * ChunkSizeY * ChunkSizeZ) and clean up the memory afterwords, however I am not necessarily sure how I would implement this.
Last edited on
I think some of my ideas still work...

"All of my current children of Tile are ROUGHLY the same size, with a few containing a Boolean value to define toggle states or similar."

^^^ Can you just add the Booleans to all the classes, and make 1 class that has a couple of extra members that may not always be used? Might simplify a lot of your code to "factor this out".

I think the union idea works too, but I am not sure if there is a memory gotcha in that when used with certain types of classes. You could try it.
Just how many tiles do you need to allocate that you need to optimize cache behavior?
I do not want to bring the bool up in the hierarchy, as I was namely using that as an example. Some tiles in the future may use other types that aren't provided, so I want to give some flexebility. Also with the tiles question, I am currently updating roughly 80,000 tiles in roughly 8-9ms according to my tests (most tile's update functions either don't do anything or are miniscule, as the event system handles a lot). The game I am creating is planned to be multiplayer and is currently split into 50x50x2 chunks (its 2d but still has layers). I can change chunk size with relative ease, but 80,000 is 16 chunks, and this would limit me to 4 players (imagine 4 players standing on the corners between chunks). I am aiming to have a 6 player maximum and my CPU budget needs all optimization it can get to fulfill this.

If I do get this system working I plan to make tiles much larger rather many small chunks. This will be hard on memory but I have plenty of memory to spare.

Edit: Note that the time I gave is allocating each block individually in the order I update them.
Last edited on
Something sounds wrong here.

At the size you're describing, your tiles should look something like
1
2
3
4
5
6
struct Tile{
    TileType type;
    std::uint32_t boolean_properties;
    int property1;
    //etc. All data. No functions.
};
and your update function should look something like
1
2
3
4
5
void update_tiles(){
    for (auto &tile : tiles){
        update_functions_array[(int)tile.type](tile /*other parameters*/);
    }
}
Is there anything that precludes this design?
I am currently loading the tiles dynamically from a DLL, it loads the given block into a global blocks list. I could probably implement a system where the DLL returns a list of pointers to functions and remove them from the list when the DLL is unloaded.

(I am using a system where a DLL is streamed into a buffer and then loaded using a special set of functions to load a windows DLL from memory).
I hadn't thought of this initially, I'll try implenting this and see if it conflicts with any of the systems I have (which might take a little while). I could also do this with the RunEvent function,

I have a person who is relatively new to C++ working on the project aswell, and will be working as a plugin programmer for the engine. If this system works well I can just have a simple list of tile objects.

I could probably add variables to specific update functions rather than by block type, and I'll need to make some modification to the gameobject class to support this.
Before I started I was thinking through it and found one issue. This is not quitrle flexible enough on variables. Imagine you have a tile that has a member that has a position value, while another may have an animation set. Another has an integer to define ....
And then another has all of them.

The best I could figure is to have everything you could ever possibly use (taking up a huge amount of space). I could also have normal inheritance where the tile contains a pointer to the update/event functions and have the constructor in the inherited class set it up. Sort of a combination of the two (which I believe would work better), but this still does not solve the issue of keeping it contiguous, does it? Due to each block having its own members.

Maybe create a pool of data and have it step at incriments of the maximum size of each tile, and just use simple pointer arithmetic to get a given tile. And when I want to replace it, I could just destroy the block at that position and put a different one at that position. Would that be possible?
Pages: 12