I've been writing a class that allocates memory in large blocks so I can avoid calling 'new' and 'delete' (they're slow). I wrote this up in some spare time, but I'm questioning its efficiency, especially in functions like 'Free', 'Allocate' and 'Combine'.
If there is a better way to do this, please tell me; I really need this code to be extremely fast.
I'm also looking for ways to avoid memory fragmentation, since that forces extra, unneeded memory to be allocated.
class CF::Management::Allocator
{
public:
    Allocator(CF::Uint64 size)
    {
        this->size = size;
    }

    void AllocateBlock()
    {
        CF::Uint8* block = nullptr;
        MemAddr address;
        try { block = new CF::Uint8[size]; }
        catch (const std::bad_alloc&)
        {
            throw CF::Error("CF::Management::Allocator::AllocateBlock() - Failure allocating new block.");
        }
        // Clear the allocated memory and the descriptor.
        memset(block, 0, size);
        memset(&address, 0, sizeof(address));
        // Fill out the descriptor: the whole block starts out free.
        address.addr = reinterpret_cast<CF::UintPtr>(block);
        address.size = size;
        address.block = block;
        // Push back the data.
        freeAddrs.push_back(address);
        blocks.push_back(block);
    }
    template<class T>
    void Allocate(T*& ptr) // Reference, so the caller actually receives the address.
    {
        size_t typeSize = sizeof(T); // Bytes needed for one T
        bool space = false;          // Whether an existing block had room
        // If the type is larger than a whole block, we can never satisfy the request.
        if (typeSize > size)
        {
            throw CF::Error("CF::Management::Allocator::Allocate() - the memory blocks aren't large enough.");
        }
        // Scan the free list for a region large enough for T.
        for (size_t i = 0; i < freeAddrs.size(); i++)
        {
            MemAddr& adr = freeAddrs[i]; // Reference to the current free region
            if (adr.size >= typeSize)
            {
                space = true;
                // Record the handed-out region.
                MemAddr tmp = adr;
                tmp.size = typeSize;
                usedAddrs.push_back(tmp);
                // Hand the caller the start of the region.
                ptr = reinterpret_cast<T*>(adr.addr);
                // Shrink the free region from the front.
                adr.addr += typeSize;
                adr.size -= typeSize;
                // If the free region is exhausted, drop it from the vector.
                if (adr.size == 0) { freeAddrs.erase(freeAddrs.begin() + i); }
                break;
            }
        }
        // No existing block had room: allocate a fresh block and retry.
        if (!space)
        {
            AllocateBlock();
            Allocate(ptr);
        }
    }
    template<class T>
    void Free(T* ptr)
    {
        CF::UintPtr p = reinterpret_cast<CF::UintPtr>(ptr);
        // Find the used region that starts at this address.
        for (size_t i = 0; i < usedAddrs.size(); i++)
        {
            if (p == usedAddrs[i].addr)
            {
                // Clear the region and move it back to the free list.
                memset(ptr, 0, (size_t)usedAddrs[i].size);
                freeAddrs.push_back(usedAddrs[i]);
                usedAddrs.erase(usedAddrs.begin() + i);
                // Merging adjacent free regions is left to Combine().
                return;
            }
        }
    }
    void Combine()
    {
        // Note: this assumes freeAddrs is kept sorted by address.
        for (size_t i = 0; i < freeAddrs.size(); i++)
        {
            size_t next = i + 1;
            if (next != freeAddrs.size())
            {
                // If the regions are contiguous in memory, merge them.
                if (freeAddrs[i].addr + freeAddrs[i].size == freeAddrs[next].addr)
                {
                    freeAddrs[i].size += freeAddrs[next].size;
                    freeAddrs.erase(freeAddrs.begin() + next);
                    i--; // Re-check the merged region against its new neighbour.
                }
            }
        }
    }
    ~Allocator()
    {
        for (size_t i = 0; i < blocks.size(); i++)
        {
            delete[] blocks[i];
        }
    }

private:
    struct MemAddr
    {
        CF::Uint8* block;  // Owning block this region lives in
        CF::UintPtr addr;  // Start address of the region
        CF::Uint64 size;   // Size of the region in bytes
        CF::Uint64 numElements;
    };
    CF::Uint64 size;
    std::vector<MemAddr> freeAddrs;
    std::vector<MemAddr> usedAddrs;
    std::vector<CF::Uint8*> blocks;
};
What makes you believe that this allocator is more efficient (either in terms of space or in terms of time) than the default allocation functions? Have you measured anything?
> really need this code to be extremely fast.
Write a pool allocator, perhaps. Or better, use a debugged, tested, profiled, pool allocator.
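For reference, here's a minimal sketch of what a fixed-size pool looks like (the Pool name and interface are made up for illustration): one slab, with the free slots chained through the slots themselves, so allocate and deallocate are each a couple of pointer swaps:

```cpp
#include <cstddef>
#include <new>

// Minimal fixed-size free-list pool. Free slots are threaded into an
// intrusive singly linked list inside the slab itself, so there is no
// per-allocation bookkeeping.
template <class T>
class Pool
{
public:
    explicit Pool(std::size_t capacity)
        : slab_(new Slot[capacity]), head_(nullptr)
    {
        // Thread every slot onto the free list.
        for (std::size_t i = capacity; i-- > 0; )
        {
            slab_[i].next = head_;
            head_ = &slab_[i];
        }
    }
    ~Pool() { delete[] slab_; }

    T* allocate()
    {
        if (!head_) throw std::bad_alloc(); // Pool exhausted
        Slot* s = head_;
        head_ = s->next;
        return reinterpret_cast<T*>(&s->storage);
    }

    void deallocate(T* p)
    {
        // The storage shares its address with the Slot, so cast back
        // and push the slot onto the front of the free list.
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = head_;
        head_ = s;
    }

private:
    union Slot
    {
        alignas(T) unsigned char storage[sizeof(T)];
        Slot* next; // Valid only while the slot is free
    };
    Slot* slab_;
    Slot* head_;
};
```

Objects are constructed with placement new into the returned slot and destroyed manually before handing the slot back: `T* p = pool.allocate(); new (p) T; p->~T(); pool.deallocate(p);`. Note this only works because every slot has the same size and alignment, which is exactly the single-type case being discussed.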
Nope, but Framework told me that at his job (or whatever) he allocated large blocks of memory and constructed objects from them because 'new/malloc' and 'delete/free' were slow.
I'm trying to reduce the number of dependencies my engine uses as much as possible, and I heard that Boost is pretty large, so I decided to just write my own.
The speed of memory allocation and/or the chance of fragmentation occurring depends on several factors, such as how large the chunks you're allocating are, the lifetime of the allocated memory, and how frequently you allocate.
Memory pools are good for when you will be doing rapid and frequent allocation/destruction of a single type.
That said... they're not really necessary for a lot of situations. Just because some other programmer said "we needed to implement memory pools because dynamic allocation was too slow".... you should not misinterpret that to mean "dynamic allocation is slow and you shouldn't do it".
By doing this you might be trying to solve a problem that doesn't exist. Normal allocation might be just fine for you.
Of course, exploring this area is great for academic purposes... but if your goal is the finished product, I would not do this until you know dynamic allocation is causing performance issues in your program. Don't try to fix what isn't broken.
> Memory pools are good for when you will be doing rapid and frequent
> allocation/destruction of a single type.
The most dramatic improvements in performance with memory pools are typically seen when:
a. Many different types are involved (with one pool catering to objects which could belong to different types, but all of them having the same size). Heap fragmentation becomes a significant issue when a large number of allocations/deallocations of varying sizes are intermingled.
b. The memory footprint of the objects is small (in comparison to the chunk size that the pool uses). The classic trade-off between space-efficiency and time-efficiency.
Note for Lumpkin: Before you actually start using Combine() (right now you are not), you would need to make sure that the pointer returned by Allocate() meets the alignment requirements of the type.
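Concretely, that means rounding the handed-out address up to alignof(T) before carving sizeof(T) out of the block; std::align from <memory> does the bookkeeping for you. A sketch (align_in_block is an illustrative helper, not part of the posted class):

```cpp
#include <cstddef>
#include <memory>

// Carve an aligned, T-sized region out of the raw bytes at `cursor`.
// On success, advances the caller's cursor past the region and returns
// the aligned pointer; returns nullptr if a T no longer fits.
template <class T>
T* align_in_block(unsigned char*& cursor, std::size_t& bytes_left)
{
    void* p = cursor;
    std::size_t space = bytes_left;
    // std::align bumps p up to the next alignof(T) boundary and shrinks
    // `space` by the padding; it returns nullptr if sizeof(T) won't fit.
    if (std::align(alignof(T), sizeof(T), p, space) == nullptr)
        return nullptr;
    cursor = static_cast<unsigned char*>(p) + sizeof(T);
    bytes_left = space - sizeof(T);
    return static_cast<T*>(p);
}
```

The padding this introduces between allocations of different types is part of the space-for-time trade-off mentioned above.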