Do I need to worry about allocating lots of tiny arrays?

I'm writing a program that needs to keep track of user-defined attributes. I have a class that describes the details of the attribute, along with a pointer to some dynamically allocated memory to hold the attribute's data. In practice, most of the attributes hold very small amounts of data, 4 bytes or less; however, they can also potentially be quite large, taking up several kilobytes.

The straightforward way to handle this is to just malloc() and free() these data arrays every time I create/delete an attribute; however, I've heard rumors about how memory allocation is slow and leads to memory fragmentation and other bad things. How worried do I need to be about this in C++? Can I just malloc() a lot of tiny arrays and let the memory manager handle it, or should I be trying to group these together into larger pages?


    #include <cstdlib> // for malloc() and free()

    enum AttribType
    {
        SHORT,
        INT,
        FLOAT
    };

    class Attribute {
        AttribType _type;
        int _length;
        void *_data;

    public:
        Attribute(AttribType type, int length = 1) :
            _type(type),
            _length(length)
        {
            _data = malloc(length * numBytes(type));
        }

        ~Attribute()
        {
            free(_data);
        }

        int numBytes(AttribType type)
        {
            switch(type)
            {
            case SHORT:
                return sizeof(short);
            case INT:
                return sizeof(int);
            case FLOAT:
                return sizeof(float);
            }
            return 0; // not reached for valid AttribType values
        }
    };

Well, it depends.
How many instances of that class do you expect you'll need? Thousands? Millions? Billions?
How long do you expect the program will need to stay running? Seconds? Days? Weeks?

That said, it's often helpful to keep in mind the KISS[1] and YAGNI[2] principles. Don't waste development effort on something when you don't yet know if it's even a problem.


[1] Keep It Simple, Stupid
[2] Ya Ain't Gonna Need It



EDIT: An alternative: Just keep the really small attributes inside the class:
class Attribute {
    AttribType _type;
    int _length;
    short short_data;
    int int_data;
    float float_data;
    void *_data;

public:
    Attribute(AttribType type, int length = 1) :
        _type(type),
        _length(length)
    {
        if (length == 1){
            switch (type){
                case SHORT:
                    _data = &this->short_data;
                    break;
                //...
            }
        }else
            _data = malloc(length * numBytes(type));
    }

    ~Attribute()
    {
        if (_length != 1)   // only free what was actually heap-allocated
            free(_data);
    }
    //...
};
Unfortunately a full, complete answer may be self-contradictory, and begins with "it depends...."

If you're working on an example/student application on a reasonably modern computer, you have no worries. Even when performance is a serious concern, there are issues you might never notice in the region of thousands or tens of thousands of allocations. The rumors you've heard are based in fact, but they are relevant to high-performance work outside the scope of student-level projects.

That said, malloc and free do have several issues we like to avoid where we can. On a typical desktop or laptop you can sustain a steady several hundred thousand mallocs and frees per second.

Fragmentation is not the problem it is typically said to be, but it is an issue. There are many caveats, and modern computers somewhat automatically give you ways to simply ignore the problem in student/experimental-level work. Here, again, we're talking about something which usually comes into play when you're pushing extremes. If your computer has 4 GBytes of RAM (considered small these days), but your total allocation usage at maximum is in the region of a few hundred MBytes, you'll likely never experience an issue (there are exceptions).

As you're a student, there may be reasons related to study that you're using malloc. If you're choosing malloc without good reason, and if you're using C++ instead of C, you're already in some trouble. Part of the point of C++, especially modern C++, is to avoid using malloc unless there's no other choice. 'new' and 'delete' are better choices for lots of little reasons, even though they are closely related, and there are even better choices in C++11 through C++17 with options like make_unique, vector, deque and others.
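For illustration, here is a minimal sketch of the same Attribute using std::vector<char> for the payload (the class and helper names come from the code above; the accessor names are just for this sketch). The vector releases its memory automatically, so there is no manual free at all:

#include <cstddef>
#include <vector>

enum AttribType { SHORT, INT, FLOAT };

// same idea as numBytes() in the original post
std::size_t numBytes(AttribType type)
{
    switch (type)
    {
    case SHORT: return sizeof(short);
    case INT:   return sizeof(int);
    case FLOAT: return sizeof(float);
    }
    return 0;
}

class Attribute {
    AttribType _type;
    std::vector<char> _data;   // owns the payload; no destructor needed

public:
    explicit Attribute(AttribType type, int length = 1) :
        _type(type),
        _data(static_cast<std::size_t>(length) * numBytes(type))
    {
    }

    void* data() { return _data.data(); }
    std::size_t size() const { return _data.size(); }
};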

I'll pause here, because without knowing those contexts I can't really offer more than these hints at better options. It does no good for me to elaborate on the virtues of deque (an underused container, BTW) if you're working in C, or if due to assignment constraints you have no choice.
An easy way to improve @helios's suggestion is to put all the small attributes into a union. After all, you would only store one of them at a time.
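For example (a rough sketch; the names are arbitrary, only the shape matters):

union SmallData {   // only one member is live at any time
    short s;
    int   i;
    float f;
};

class Attribute {
    AttribType _type;
    int        _length;
    SmallData  _small;   // used when the attribute fits in one element
    void*      _data;    // points at &_small or at heap memory
    // ...
};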

If you have many such small objects, it could be worthwhile to explore a small-buffer optimization.

If the pointed-to data is sufficiently small, you can re-purpose the storage ordinarily required for the size-and-pointer pair to hold the payload itself.

If you have C++17, std::any provides storage for any one object (and, in practice, the small-buffer optimization) for you already. If you do not, boost::any would be a suitable choice.
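A minimal C++17 example of that, in case it helps (whether a given type is stored in-place is up to the implementation, but small trivially movable types usually are):

#include <any>
#include <iostream>
#include <vector>

int main()
{
    std::any value = 42;                 // small payload: typically stored inside the any itself
    std::cout << std::any_cast<int>(value) << '\n';

    value = std::vector<float>(1024);    // large payload: heap-allocated internally
    std::cout << std::any_cast<std::vector<float>&>(value).size() << '\n';
}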
I wanted to avoid unions because I don't really like them, but since _data is a generic pointer anyway, an even better alternative could probably be:
void *_data;
char inline_data[32]; //or some other small value

Attribute(AttribType type, int length = 1) :
    _type(type),
    _length(length)
{
    auto size = length * numBytes(type);
    if (size <= sizeof(inline_data))
        _data = inline_data;      // small payload lives inside the object itself
    else
        _data = malloc(size);     // large payload still goes on the heap
}
Well, that buffer needs to satisfy alignment requirements, but otherwise, sure:
typename std::aligned_storage<
  32, alignof(std::max_align_t)
>::type inline_data;  
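Or, for C++11 and later, an alignas specifier on the plain char array does the same job:

// <cstddef> provides std::max_align_t
alignas(std::max_align_t) char inline_data[32];  // suitably aligned for any scalar type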
I've heard rumors about how memory allocation is slow and leads to memory fragmentation and other bad things.

Well, we can delve into the hardware side. When you allocate memory for a variable, you do so in your RAM. RAM doesn't have fragmentation issues or anything like that; it's very fast memory, and where the information is physically stored on it doesn't really affect how quickly you can get it back. It's written to constantly and isn't likely to fail from being written to "too much". If you don't have enough RAM to cover your memory allocation needs, then your operating system likely has virtual memory, which is space on your main disk drive that gets used as if it were RAM. This isn't very likely to happen though; even RAM-hungry video games rarely need over 8 GB of RAM (I have 16 GB just to be safe). Tomb Raider is an exception, that game eats RAM like I'd eat a cheesecake. Mmmm...

Are you using an SSD or HDD? I assume an HDD, which is often used as a scratch disk as needed. It uses magnetism to store information, and that can be rewritten a huge number of times. A hard drive is more likely to fail from physical damage than from being written to too much! Memory fragmentation occurs when the data for one thing is spread out over the hard drive, making the drive head seek all over to collect the needed data. Your operating system defragments the hard drive automatically. When used as RAM, the data likely won't be defragmented; it's simulating RAM and will be discarded when it's no longer needed.

An SSD, on the other hand, has a finite (though usually very large) number of times you can write to it. It's never recommended as a scratch disk; you want that expensive SSD to stay healthy until you upgrade. The more you write to an SSD, the more its cells will gradually slow and eventually fail - but it takes A LOT of writes before you see a noticeable difference in performance. And there's no fragmentation on SSDs; the concept doesn't matter. All data can be accessed equally fast no matter where it physically sits on the drive. In fact, the information is usually spread out across the cells deliberately, to keep any particular cell from being written too often and dying faster than the others.

I don't know how malloc and such work behind the scenes, but an array stores its elements contiguously, in order, in your RAM. It's why an array can be used like this:

#include <cstdlib>  // calloc, free
#include <iostream>

int main()
{
	int *arr;
	int num;
	std::cout << "HOW LONG DO YOU WANT UR INT ARRAY?: ";
	std::cin >> num;

	arr = (int*)calloc(num, sizeof(int));

	for (int i = 0; i < num; i++)
		std::cout << (arr + i) << '\n'; //Will Output The Memory Address Of Each Array Element

	free(arr);
	return 0;
}


Each int element typically takes 4 bytes in your RAM, so the output of the program will be the memory location of each element, spaced 4 bytes apart. It's all in order; there's no fragmentation (or rather, the term doesn't apply).

So overall, I wouldn't worry about it.

EDIT: Everyone else's answer seems to imply that there is some issue with malloc and such when dealing with memory; is there something about this I'm missing?

Can I just malloc() a lot of tiny arrays and let the memory manager handle it, or should I be trying to group these together into larger pages?

The only issue I can imagine with RAM fragmentation is having your RAM almost full, but still with enough total space for your program. That space, however, may end up not being physically contiguous, with free chunks here and there that were freed up and other programs using the blocks in between.

In that case there would still be no way around it yourself, since there's nothing you could do about the situation - the OS will handle it better than you could. I'm not sure how the operating system handles such a situation (would the code above still work? Would part of the array end up somewhere else in RAM? Or would the OS simply refuse to allocate any memory at all?), but I doubt anything will BREAK.

What you should know is that there are a million kilobytes in a gigabyte - a billion bytes (roughly the same amount as a gibibyte, if you want to get technical about storage units). You are very unlikely to run into any of these issues.
@zapshe

EDIT: Everyone else's answer seems to imply that there is some issue with malloc and such when dealing with memory; is there something about this I'm missing?


I was one of those, though with the qualification that it isn't a real problem except under certain conditions, which agrees with your summary point that it isn't really an issue on modern hardware.

However, there is fragmentation of RAM. Like you said, when you reach certain limits it can become an issue, but it is difficult to reach that point. It isn't even (exactly) a matter of the application consuming all the installed RAM. While, as you point out, VM can extend that, it isn't always enough to solve the problem. There was a time on 32-bit machines when applications like Photoshop could consume all available RAM, and fragmentation could then be a serious issue, in part because the entire potential address space of the memory model would be consumed.

In that situation under Windows, the standard maximum amount of memory that can be consumed by one application is 2 GBytes (with the 'big switch' one could extend that maximum to 3 GBytes). The maximum potential of the memory model, being 32 bits, was 4 GBytes, but under Windows addresses beyond 3 GBytes are not considered valid for an application (as they can be under Linux). The point is that even if the "holes" between allocated memory blocks add up to enough total free space to keep serving new allocations, none of them may be large enough for a single contiguous allocation of significant size (like the new bitmap layers Photoshop requires). In other words, it's not just that RAM might be full; the address space itself becomes exhausted - you simply can't form a contiguous range of that size within a 32-bit pointer.

Now, however, we're typically on 64-bit machines where the maximum address space is vastly larger (the 32-bit limit is squared), so those holes of fragmented storage no longer cause the same exhaustion. You could still fill all RAM and then all VM and end up running out of space, but in theory you still have plenty of room left to form new addresses within the 64-bit value of a pointer. That part of the fragmentation problem all but evaporates for practical purposes on 64-bit machines running today's typical applications. No doubt some high-end research simulations could still exceed these limits.

What most people also don't think about is that modern libraries take some care to arrange allocations sensibly for the platform relative to page sizes. This means the "holes" created by fragmentation are somewhat associated with page boundaries, which are the granularity of the memory-mapping system. That mapping is related to, but quite separate from, VM (though VM depends on it). The address one "sees" in a pointer returned from an allocation is not the physical address in the hardware (at least not on modern operating systems). Addresses are mapped, so the physical location in hardware may have little relation to the location the application sees. This has two implications. One, there is no guarantee that what appears to be a contiguous block of RAM from the application's point of view is physically contiguous in hardware; it may be mapped to non-contiguous locations, much like the parts of a file on disk, in blocks of the page size. Two, when "holes" do appear in the heap, this feature of the hardware (and some tricks of memory management in the library) can help minimize the impact of that fragmentation.


zapshe: Imagine that the address space had a size of 3 and was empty:

EEE
Now you allocate an object of size 1 and an object of size 2:
122
Now you release the object of size 2 and allocate one of size 1:
13E
Now you release object 1:
E3E
The address space is now fragmented. The free space is of size 2 but the largest object you can allocate is of size 1.

It doesn't matter whether the OS implements virtual memory or not; what's fragmented is the address space, not necessarily RAM. The OS can move bits around in RAM all day long, but your program still thinks object 3 is at position 1 in the address space. The only way out of this situation is for your program to be smart enough to know how to relocate objects. C and C++ programs generally do not implement this functionality and just crap themselves if memory happens to become fragmented; instead they opt to avoid the situation through various techniques, such as custom allocators. Languages with more advanced memory systems, such as Java and C#, actually implement object relocation in the runtime environment.
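For illustration, here's a toy version of one such technique (the name and sizes are made up): a fixed-size pool that hands out same-sized blocks from one big up-front allocation, so freeing and reusing blocks never fragments the surrounding heap.

#include <cstddef>
#include <vector>

class FixedPool {
    std::vector<char> storage_;     // one big allocation up front
    std::vector<void*> free_list_;  // blocks currently available
    std::size_t block_size_;

public:
    // block_size should be a multiple of alignof(std::max_align_t)
    // if the blocks will hold arbitrary types
    FixedPool(std::size_t block_size, std::size_t block_count)
        : storage_(block_size * block_count), block_size_(block_size)
    {
        for (std::size_t i = 0; i < block_count; ++i)
            free_list_.push_back(storage_.data() + i * block_size);
    }

    void* allocate()
    {
        if (free_list_.empty()) return nullptr;  // pool exhausted
        void* p = free_list_.back();
        free_list_.pop_back();
        return p;
    }

    void deallocate(void* p)
    {
        free_list_.push_back(p);  // the slot can be handed out again as-is
    }
};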

This problem was much more common when 31- and 32-bit per-process address spaces were common, and it's way more rare with current 48-bit address spaces, but it can still happen with very long-running processes that do lots of heterogeneous allocations.
What would be an example of a large (several kilobyte) attribute?

(Is it a homogeneous list of records?)
Thanks for the explanations Niccolo and Helios, that makes a lot more sense. With 2^64 possible memory addresses, there's a lot of wiggle room for fragmented memory addresses.

Thanks again, it was informative.

The address one "sees" in a pointer received from an allocation is not the physical address in the hardware (at least not in modern operating systems).

I wasn't sure myself, but I assumed so because of the dilemma arrays would face if their memory wasn't actually allocated contiguously (or was even on another drive!). Good to know, thanks!
I'll address the other comment...
Memory allocation and deallocation are slow when you do them nonstop in a tight loop. Try it! Stick a timer on a loop that does something simple like adding 2 numbers a few billion times. Then change one of the variables to be allocated and freed every iteration. How much slower was it? It depends on the OS and many factors, but it will be significant. The whole loop may still take only a fraction of a second, but the time taken may have doubled or even 10xed. Then you ask yourself whether you care that something that took a fraction of a second now takes 10 times longer. If you care, you look for a way to handle the memory better. If it's good enough, you leave it alone.
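If you want to try that experiment, something along these lines will do it (a rough sketch; the numbers vary wildly by OS, library, and compiler, and an aggressive optimizer may shrink either loop):

#include <chrono>
#include <cstdlib>
#include <iostream>

int main()
{
    using clock = std::chrono::steady_clock;
    const long N = 10000000;   // 10 million iterations keeps the run short

    auto t0 = clock::now();
    volatile long sum = 0;
    for (long i = 0; i < N; ++i)
        sum = sum + i;                                  // plain arithmetic
    auto t1 = clock::now();

    for (long i = 0; i < N; ++i) {
        long* p = (long*)std::malloc(sizeof(long));     // allocate...
        *p = i;
        sum = sum + *p;
        std::free(p);                                   // ...and free every iteration
    }
    auto t2 = clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "no allocation:     " << ms(t1 - t0).count() << " ms\n";
    std::cout << "malloc every pass: " << ms(t2 - t1).count() << " ms\n";
}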

'Slow' is relative, in other words. It's a LOT slower to go to the OS, ask for memory, have the OS find the next available block, mark it as owned by you, and hand it back... versus grabbing the next stack location, which you already own and already know, with no OS call involved. But 'a lot slower' is measured in CPU clock cycles, and there are over 3 billion of those per second on a modern machine. You can turn something that takes 200 clocks into something that takes 1000 clocks and barely be able to measure the time taken either way.
A gigabyte is really REALLY big. Heck, a megabyte is pretty big. As a beginner, it's unlikely that you're dealing with data that will fill even one gigabyte.

You're doing premature optimization. Unless you're sure that this issue will be a problem, don't bother worrying about it. Worry about making the code work.