Why I hate std::vector

Aside from the word vector being a misnomer, I have issues with both std::vector and std::string, especially in beginner programmers' uses.

Recently, I've spent almost three weeks of nothing but memory management hell basically because people misused this seemingly harmless container.

Let's take an example:
1
2
3
4
5
6
7
int main()
{
    std::vector<int> myVector;
    
    myVector.push_back(1);
    myVector.push_back(2);
}

Harmless right? Wrong. The user here knows they needed only two integers. Because of how a vector works, instead of making one large chunk of memory with enough space for two integers, it instead allocates a small block of memory big enough for only one, then reallocates a second chunk of memory that no longer fits in the first block of memory, potentially causing fragmentation. Let's look into how this works exactly by implementing a dummy allocator.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#include <vector>
#include <iostream>

template <typename T>
class DummyAllocator {
public:
    typedef T value_type;

    typedef T* pointer;
    typedef const T* const_pointer;

    typedef T& reference;
    typedef const T& const_reference;

    /* Because this makes sense... */
    template <class U>
    struct rebind {
        typedef DummyAllocator<U> other;
    };

    pointer allocate(std::size_t num, const void* = 0)
    {
        std::cout << "Allocating " << num << 
            " of type " << typeid(T).name() << "\n";
        return new T[num];
    }

    void deallocate(pointer p, std::size_t num)
    {
        std::cout << "Deallocating " << num << 
            " of type " << typeid(T).name() << "\n";
        delete [] p;
    }

    void construct(pointer p, const_reference value) {
        new ((void*)p) T(value);
    }

    void destroy(pointer p) {
        p->~T();
    }
};

int main()
{
    std::vector<int, DummyAllocator<int> > myVector;
    
    myVector.push_back(1);
    myVector.push_back(2);
}
Allocating 1 of type i
Allocating 2 of type i
Deallocating 1 of type i
Deallocating 2 of type i

So, why do people do this? Because it's convenient for the user, despite it being such a ridiculous thing to do. Picture this being done on a scale of kilobytes or megabytes... and you have tons of fragmentation that isn't particularly easy to fix.

I will at least say that std::vector gives you the power to prevent this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#include <vector>
#include <iostream>

template <typename T>
class DummyAllocator {
public:
    typedef T value_type;

    typedef T* pointer;
    typedef const T* const_pointer;

    typedef T& reference;
    typedef const T& const_reference;

    /* Because this makes sense... */
    template <class U>
    struct rebind {
        typedef DummyAllocator<U> other;
    };

    pointer allocate(std::size_t num, const void* = 0)
    {
        std::cout << "Allocating " << num << 
            " of type " << typeid(T).name() << "\n";
        return new T[num];
    }

    void deallocate(pointer p, std::size_t num)
    {
        std::cout << "Deallocating " << num << 
            " of type " << typeid(T).name() << "\n";
        delete [] p;
    }

    void construct(pointer p, const_reference value) {
        new ((void*)p) T(value);
    }

    void destroy(pointer p) {
        p->~T();
    }
};

int main()
{
    std::vector<int, DummyAllocator<int> > myVector;
    
    myVector.reserve(2);    

    myVector.push_back(1);
    myVector.push_back(2);
}
Allocating 2 of type i
Deallocating 2 of type i

I sadly do not see this being done... ever. Rather, in most cases, a linked list should have been where you see a lot of push_back calls which is more forgiving with fragmentation as it doesn't require a contiguous amount of memory and is generally faster with inserting objects.

std::string is the same way. If you have a string that simply keeps expanding, you *must* remember to reserve memory for it. You do not want to keep reallocating memory from smaller to bigger. That's a hack at best in desperate situations where you know you're taking a loss. If you do this in games or large server applications, you're going to be looking at several hundred megabytes (possibly gigabytes) of memory lost to fragmentation

Note that most of this is directed towards larger applications that allocate more than just a few megabytes on the heap and further discussion should probably be on how to avoid fragmentation. Still, it's generally good practice and there's little effort into doing it right *the first time*. Please... just use .reserve() or don't use a std::vector or std::string at all.
Last edited on
Don't drink and drive.

Don't operate complex machinery without training.

I wonder why many cars are manual whena a learner will be better with an automatic.

Learn data structures before coding.

If you can't be bothered there's Ruby.
Last edited on
NoX's original post can be summed up with "push_back is over-advertised and isn't really that useful". I agree.

There's nothing wrong with vector/string on their own. The problem is that they can be used sub-optimally. And yes that kind of sucks, but the consequences of using vector/string incorrectly are much less severe than using a basic structure or manual memory management incorrectly.

In the end, it's a tradeoff. And I think it's worth it. We're certainly better off having vector/string than not having them.


I wonder why many cars are manual whena a learner will be better with an automatic.


Manual transmissions get better mileage and give you more control over the car. I hate driving automatics... they always feel slippery. Like I let go of the gas and the car almost feels like it keeps accelerating.
Last edited on
"Let me magically allocate memory for you" is how I look at this feature. It's designed to make people forget to manage the underlying memory because it's an inconvenience. That's really the only point of a vector over an array. Okay for very small applications but even medium sized applications should not be using a vector. A std::string is slightly different as its purpose isn't just to constantly resize but people abuse the shit out of it by forgetting that it's internally allocating every time you change the string size. That's not something you can forget! This is definitely one of those features where people hate on it because it does something behind your back that isn't very clear, but even more so, it's a feature that can cause severe issues! This isn't even a bad mix of a low-level language with a high-level mechanic, even Java has to deal things like fragmentation.

There should never be a time where using an std::vector leads to allowing it to allocate memory automatically because the vector lacks in size to handle the new element you wish to add. Thus using a vector in general is bad practice in my eyes.

I will admit that std::vector does actually track the number of elements currently used rather than just size of the whole array which is why push_back is possible. If I trusted people to use vector correctly that didn't lead to complete freakin' mess, I wouldn't be so against its use as it can be useful. But nobody ever uses it for its useful purposes. It's only used for its ability to resize an array of memory which is inherently bad in general and something to be avoided in the first place.
"Let me magically allocate memory for you" is how I look at this feature. It's designed to make people forget to manage the underlying memory because it's an inconvenience.

That's how it is for people too lazy to read documentation.

There should never be a time where using an std::vector leads to allowing it to allocate memory automatically because the vector lacks in size to handle the new element you wish to add.

Go look at boost::static_vector, it's literially just std::vector with a custom allocator. Now if it ever needs to resize outside of the continous memory you'll just get an exception thrown.
closed account (z05DSL3A)
It's two in the morning here and I'm a bit tired.

I'm not sure that I have a point to make yet but I think the original code needs expanding for a bit more clarity.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
#include <vector>
#include <iostream>

template <typename T>
class DummyAllocator {
public:
    typedef T value_type;
    
    typedef T* pointer;
    typedef const T* const_pointer;
    
    typedef T& reference;
    typedef const T& const_reference;
    
    /* Because this makes sense... */
    template <class U>
    struct rebind {
        typedef DummyAllocator<U> other;
    };
    
    pointer allocate(std::size_t num, const void* = 0)
    {
        std::cout << "Allocating " << num <<
        " of type " << typeid(T).name() << "\n";
        return new T[num];
    }
    
    void deallocate(pointer p, std::size_t num)
    {
        std::cout << "Deallocating " << num <<
        " of type " << typeid(T).name() << "\n";
        delete [] p;
    }
    
    void construct(pointer p, const_reference value) {
        new ((void*)p) T(value);
    }
    
    void destroy(pointer p) {
        p->~T();
    }
};

int main()
{
    std::vector< int, DummyAllocator<int> > ivec;
    
    std::cout << "ivec: size: " << ivec.size()
    << " capacity: "  << ivec.capacity() << std::endl;
    
    for ( int ix = 0; ix < 24; ++ix )
    {
        ivec.push_back( ix );
        std::cout << "ivec: size: " << ivec.size()
        << " capacity: "  << ivec.capacity() << std::endl;
    }    
}
ivec: size: 0 capacity: 0
Allocating 1 of type i
ivec: size: 1 capacity: 1
Allocating 2 of type i
Deallocating 1 of type i
ivec: size: 2 capacity: 2
Allocating 4 of type i
Deallocating 2 of type i
ivec: size: 3 capacity: 4
ivec: size: 4 capacity: 4
Allocating 8 of type i
Deallocating 4 of type i
ivec: size: 5 capacity: 8
ivec: size: 6 capacity: 8
ivec: size: 7 capacity: 8
ivec: size: 8 capacity: 8
Allocating 16 of type i
Deallocating 8 of type i
ivec: size: 9 capacity: 16
ivec: size: 10 capacity: 16
ivec: size: 11 capacity: 16
ivec: size: 12 capacity: 16
ivec: size: 13 capacity: 16
ivec: size: 14 capacity: 16
ivec: size: 15 capacity: 16
ivec: size: 16 capacity: 16
Allocating 32 of type i
Deallocating 16 of type i
ivec: size: 17 capacity: 32
ivec: size: 18 capacity: 32
ivec: size: 19 capacity: 32
ivec: size: 20 capacity: 32
ivec: size: 21 capacity: 32
ivec: size: 22 capacity: 32
ivec: size: 23 capacity: 32
ivec: size: 24 capacity: 32
Deallocating 32 of type i
Just to add to the list of reasons why abusing vector can cause major problems:

Many times, people use vectors and they pass around &vec[0] without concern for the fact that the location of the underlying array in memory is subject to change.

With this and your point about fragmentation and performance, you have a pretty strong argument for why beginners should start out with manual memory management before using things like vectors.





Last edited on
Vector allows beginners to focus on logic and structure rather than wrestle with low level details, at the same time giving experts a high performance, highly customizable, dynamic array, with the full spectrum of STL algorithms at its disposal.

Memory fragmentation is a shortcoming of the system allocator, not vector. Vector simply automates reallocation and deletion; it would be more appropriate to hate malloc().

As for hidden behavior - how much memory does it take to store an integer in a linked list accounting for node data and alignment? What's the running time for list.size() and list.splice()? Are nodes pooled or chunked, allocated piecemeal, prefetched? Etc. Vector is easily the most predictable of the bunch.

Vector emulates a core language construct from a safer, more accessible package, and should be the goto container unless there is a good reason to use another.

From "The C++ Programming Language Special Edition," "By default, a vector should be used; it will be implemented to perform well over a wide range of uses." (Stroustrup 462)
Last edited on
Memory Fragmentation is a shortcoming of heap allocation, which applies to virtually all system allocators in modern day. C++ is a language that requires that you know about the system allocator. There's not many alternatives. Even a garbage collector/pool allocation still has to deal with fragmentation. In this particular case, I experienced ridiculous fragmentation with jemalloc which allocated everything in pages and is known to be one of the better heap allocators to prevent fragmentation.

It doesn't matter if std::vector is predictable if nobody pays attention to what it does internally.

Also, C++ isn't a beginner language. It is a low-level language. While it does contain mechanisms for high-level constructs, that doesn't mean you can simply ignore the low-level aspects of the language or you get ridiculousness like what I've experienced.

I do not care what Stroustrup says. He also says that in the context of a competent C++ user.
If there's bad practice, check it in review.
This is an example of something that drives me nuts: telling programmers about the benefits of feature XYZ without also telling them about the costs, problems, drawbacks or limitations. Pretty soon XYZ becomes dogma and it's nearly impossible to convince someone that it sometimes doesn't apply.

At least std::vector<> allocates proportional to the size. We used to use RogueWave collections and their vector (called RWOrdered) would grow by 25 items. If you attemped to put 200,000 items into one, it would take until the next century before it finished.
Topic archived. No new replies allowed.