How does vector work?

For older compilers that do not support the Standard Template Library, using vectors is not an option.

So if I want to make a dynamic array, I would have to use pointers and dynamic memory allocation.

But how should I manage my dynamic array?

Suppose I have a dynamic array of size 3 and want to add a new element to give me a size of 4. What do I do? Do I make a new array with 4 elements, copy the old array into the new one, and delete the old array?

Or is there a way to just add a new index at the back of the existing dynamic array (I hope so because the other approach seems inefficient).

How does vector do it?

Off topic: I'm asking this because I want to make an RNG function that takes a lower_limit and upper_limit as input and can produce numbers higher than RAND_MAX in Turbo C++.

To do so I need to store the individual digits of the numbers. But since I do not know how big the numbers are, I would have to make use of dynamic arrays.

If there was a way to add a new index to an already existing dynamic array then that would mean I could iterate over the input and directly assign the digits to a new array.

Otherwise I would have to first iterate over the input to find the number of digits and then make a dynamic array with that many indexes and then proceed to fill the indexes.



A vector has two sizes. One is the number of elements it contains (returned by size()), and the other is the size of its internal array (returned by capacity()). size() <= capacity() is always true. When you push_back() an element, std::vector checks whether size is equal to capacity. If not, it just puts the element at the end and increments the size. If they are equal, the vector allocates a new internal array of capacity old_capacity * k (typically 1 < k <= 2), copies or moves the old elements to the new array, then deallocates the old array. It then proceeds with the normal push_back() behavior.
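The growth strategy described above can be sketched as a minimal vector-like class. This is a hypothetical illustration of the technique, not the real std::vector (copy control and exception safety are omitted for brevity):

```cpp
#include <cstddef>

// Minimal sketch of a dynamic array of ints that grows by doubling
// its capacity, mimicking the push_back() behavior described above.
class IntVec {
    int*        data_;
    std::size_t size_;
    std::size_t capacity_;
public:
    IntVec() : data_(nullptr), size_(0), capacity_(0) {}
    ~IntVec() { delete[] data_; }

    std::size_t size() const     { return size_; }
    std::size_t capacity() const { return capacity_; }
    int& operator[](std::size_t i) { return data_[i]; }

    void push_back(int value) {
        if (size_ == capacity_) {                    // internal array is full
            std::size_t new_cap = capacity_ ? capacity_ * 2 : 1;
            int* new_data = new int[new_cap];        // allocate a bigger array
            for (std::size_t i = 0; i < size_; ++i)  // copy the old elements
                new_data[i] = data_[i];
            delete[] data_;                          // free the old array
            data_ = new_data;
            capacity_ = new_cap;
        }
        data_[size_++] = value;                      // normal case: just append
    }
};
```

Pushing 5 elements into this class makes the capacity go 1, 2, 4, 8, so most push_back() calls do no allocation at all.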

I want to make a RNG generating function that takes input of lower_limit and upper_limit and can produce numbers higher than RAND_MAX in Turbo C++.

To do so I need to store the individual digits of the numbers.
Do you want the numbers to be only larger than RAND_MAX, or do you want them to be arbitrarily large? They're different problems.

If you want arbitrarily large numbers I'd suggest just using GMP.
https://gmplib.org/

Why do you want arbitrarily large random numbers? That's an unusual problem.
RAND_MAX is defined as 32767 in our implementation of Turbo C++ (we use Turbo C++ in school).

So I was inspired by that to make a function that could generate numbers larger than the limit. I'm doing this as practice not for practical purpose.

Just thought I would make my function be able to generate larger numbers because why not! Might even save you a few seconds here and there!
So increase the size by twice and push all elements, is it? Then vector must be very bad for storing a lot of data, right? What's the catch?

Sorry this is in a new reply.. The forums reply interface on my phone is glitchy..
You could just call rand() repeatedly. 32767 is 15 bits of randomness. If you call rand() three times, you can get 45 bits. That's more than enough for 32-bit integers.
#include <cstdint>
#include <cstdlib>

std::uint32_t rand32(){
    static_assert(RAND_MAX == 32767);
    return rand() | ((std::uint32_t)rand() << 15) | ((std::uint32_t)rand() << 30);
}
So increase the size by twice and push all elements, is it?
It allocates the memory according to the capacity, and it moves the elements rather than copying them. So reallocation is only necessary when the capacity is exceeded. See:

http://www.cplusplus.com/reference/vector/vector/?kw=vector

For realloc(...) see:

http://www.cplusplus.com/reference/cstdlib/realloc/?kw=realloc
Then vector must be very bad for storing a lot of data, right? What's the catch?
The catch is that increasing the capacity in exponential steps makes push_back() take amortized constant time, rather than linear time in size(). It would be pretty bad if push_back() took linear time; for example, pushing a million elements into a vector would then take something on the order of a trillion operations.

std::vector is indeed not the most memory-efficient data structure if you have no idea how many elements you'll need to store, and it becomes increasingly wasteful (in absolute terms) the more elements you push. However, if you can take a guess at how much space you need, its performance can be dramatically improved by calling reserve().
For sufficiently long sequences of unpredictable length, std::deque may be preferable. Such cases are very unusual, though.
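The effect of reserve() is easy to observe: if you watch capacity() while pushing, a reserved vector never reallocates. Here is a small sketch (count_reallocations is a hypothetical helper, not a standard function):

```cpp
#include <vector>
#include <cstddef>

// Counts how many times the vector's internal buffer is reallocated
// while pushing n elements, by watching capacity() change.
std::size_t count_reallocations(std::size_t n, bool use_reserve) {
    std::vector<int> v;
    if (use_reserve)
        v.reserve(n);                    // one up-front allocation
    std::size_t reallocs = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap) {       // capacity grew: a reallocation happened
            ++reallocs;
            cap = v.capacity();
        }
    }
    return reallocs;
}
```

With reserve(n) the answer is zero; without it, the vector reallocates roughly log(n) times because of the exponential growth.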
Why the heck don't references mention prototypes for functions? *madface*

If static_assert belongs to <cassert>, then it seems Visual Studio 2017 doesn't support that overload of static_assert (with only one argument). There needs to be a second parameter that is the message. But Visual Studio 2017 supports C++17, right.. hmm?

helios thanks a lot for the reply! I never saw rand() in that way! And I would have never been able to come up with that! Awesome!


And thanks a lot coder777 realloc() is precisely what I wanted and what I didn't know about!
But I'm just curious about the behind-the-scenes workings.. what happens if extending a block means entering the territory of another allocated object?

I mean at one point, eventually, you will have a large enough size that you simply cannot extend further because you would be corrupting other data. We're talking about manual realloc() now, not vectors.

What happens then? Does the compiler throw a stack overflow and stop or something? Or like does it reallocate everything farther apart? But then the pointers are still pointing to the same addresses, this is confusing..
It's your code, so you can attack your specific problem directly.
say you allocated 3 and get a 4th. then you reallocate 100. then you get your 101st and you reallocate 200. the exact algorithm you use to guess how many to preallocate each time can be complicated (it can do stuff like track how many inserts you get vs deletes and all that and predict) or simple (it can allocate twice the current size every time, but it can very quickly drain the machine of memory) or linear (add 100, or 1000, or a million, whatever, each time you reallocate).

as for reallocation ... you can use the C memory stuff or you can make a new pointer with the extended size and copy the old data over into it, then destroy the old smaller buffer (this is what realloc does, but if you need to use new/delete instead, you do it this way). If your buffers do not need to be contiguous memory like vectors or arrays, you can even just do a convoluted linked-list type of memory allocation and chain the blocks together as you create them (this is a bit of a mess, but it's doable) -- do you need solid blocks of memory? (it is just like a linked list, except each node contains a pointer full of data). It can even be efficient to do that, if you search or process each node of the chains in a thread :)


All that to say: be creative if you want to solve a specific problem or class of problems. Use known data structures and patterns if you want a generic tool. Figure out what you want, then build it, in other words. <vector> is not that complicated; it is just hiding the pointer junk from you, storing the size and type info under the hood (the type is not explicitly stored, but it's in the C++ infrastructure) and providing some interfaces into the data. It could be recreated in a day or two by even a student, I believe, if the person understands pointers pretty well.

read this: https://en.cppreference.com/w/c/memory/realloc
your question about realloc is tied to the second approach for how the function works. See parts a) and b) on that page, part b) in particular.

keep this in mind:
allocating, releasing, and copying are all expensive operations. Your tool should minimize these things as best it can for your problems, which is where you may want to deviate from the vector approach and cook your own solution based off how you think you will use this tool.
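The new/copy/delete approach jonnin describes can be sketched for a type where realloc() would not be safe, such as std::string (grow is a hypothetical helper name introduced for illustration):

```cpp
#include <string>
#include <cstddef>

// Grows an array of std::string the new/copy/delete way: allocate a
// bigger array, copy elements over one by one (calling std::string's
// copy assignment, not a byte copy), then free the smaller buffer.
std::string* grow(std::string* old_data, std::size_t old_size,
                  std::size_t new_capacity) {
    std::string* new_data = new std::string[new_capacity];
    for (std::size_t i = 0; i < old_size; ++i)
        new_data[i] = old_data[i];   // element-wise copy
    delete[] old_data;               // release the old buffer
    return new_data;
}
```

The caller must remember to use the returned pointer afterwards, since the old one is dangling once grow() returns.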
Wow jonnin I'm impressed! That was quite helpful!

I totally skipped past the "if possible" clause in the first link to the reference. Stupid me :P.
I just realized how much I am still yet to learn. And this topic and your reply just made data structures that much more interesting. So thanks really to everybody, this forum has taught me a lot. And also reminded me that I need to sit and read C++ properly because there are soooo many things ;D!

> We're talking about manual realloc() now, not vectors.

No. Really, no.
We can't use std::realloc() to resize arrays of many (most) C++ types (e.g. an array of std::string).

Because reallocation may involve bytewise copying (regardless of whether it's to expand or to contract), only the objects of TriviallyCopyable types are safe to access in the preserved part of the memory block after a call to realloc
https://en.cppreference.com/w/cpp/memory/c/realloc#Notes


TriviallyCopyable: https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable

JLBorges, realloc() works for standard data types like char and int though, right?

What if I did have to realloc() an array of strings??
> Can't realloc() pointers pointing to string?

Can't realloc() (that is, shouldn't realloc()) arrays of std::string.
std::string is not a TriviallyCopyable type.

#include <iostream>
#include <string>
#include <type_traits>

int main()
{
    std::cout << std::boolalpha << "is trivially copyable? "
              << std::is_trivially_copyable<std::string>::value << '\n' ; // false
}

http://coliru.stacked-crooked.com/a/f0defa0de3d8e308

We can realloc() arrays of pointers to std::string.
Object pointers are TriviallyCopyable types.
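That pointer-array approach can be sketched as follows (grow_ptr_array is a hypothetical helper name; a real program should also check realloc's return value before overwriting the old pointer):

```cpp
#include <cstdlib>
#include <cstddef>
#include <string>

// Resizes an array of std::string* with std::realloc(). This is safe
// because object pointers are trivially copyable: realloc may copy the
// pointers bytewise, but the strings they point to never move.
std::string** grow_ptr_array(std::string** arr, std::size_t new_count) {
    void* p = std::realloc(arr, new_count * sizeof(std::string*));
    // Note: if realloc fails it returns nullptr and leaves arr allocated;
    // error handling is omitted here for brevity.
    return static_cast<std::string**>(p);
}
```

Passing nullptr as the first argument makes realloc behave like malloc, so the same helper covers the initial allocation too.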


> realloc() works for standard datatypes like char and int though right?

Yes, it does. These types are TriviallyCopyable.
older compilers ... Turbo C++

What is the rationale behind the use of old tools that do not support C++ "as we know it today"?
it doesn't support C++ as we knew it 20 years ago (in 1998) either.. but as OP said, "we use turbo c++ in school", so it's really a question to that school's management.
std::vector allocates memory on the heap and stores all its elements in contiguous memory locations.

But what if the memory it allocated initially is completely filled?
For example, let's create a vector of ints, i.e. std::vector<int>. Now suppose its initial capacity is 10 elements, but in our application we want to store 15 elements. Then what will happen when we insert the 11th element?
When std::vector's internal memory is completely used up, it increases the size of its memory. To do that it performs the following steps:
1.) It allocates a bigger chunk of memory on the heap, almost double the size of the previously allocated block.
2.) Then it copies all the elements from the old memory location to the new one (or moves them, since C++11). Yes, it copies them, so if our elements are user-defined objects their copy constructor may be called, which makes this step quite expensive in terms of speed.
3.) Then, after successful copying, it deletes the old memory.
You can check the current capacity of a vector, i.e. how many elements it can store in the currently allocated memory, using the capacity() member function.
To check the count of currently stored elements in a std::vector, use the size() member function.
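The steps above are easy to watch in action by recording each distinct value capacity() takes while elements are pushed (capacity_history is a hypothetical helper; the exact growth factor is implementation-defined, often 1.5x or 2x):

```cpp
#include <vector>
#include <cstddef>

// Records each distinct capacity the vector passes through while
// pushing n elements, exposing the exponential growth steps.
std::vector<std::size_t> capacity_history(std::size_t n) {
    std::vector<int> v;
    std::vector<std::size_t> caps;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (caps.empty() || caps.back() != v.capacity())
            caps.push_back(v.capacity());   // capacity grew: reallocation happened
    }
    return caps;
}
```

On a typical implementation, pushing 100 elements yields a short, strictly increasing sequence of capacities ending at a value of at least 100.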
What is the rationale behind the use of old tools that do not support C++ "as we know it today"?

outside of school, some embedded systems are a bit behind, or are using a derived "C++-like" language that may not have the STL, recursion, or the like.