allocation speed difference between vectors and arrays

Hello everyone! :D

One of our programs, which we will publish soon, runs faster with arrays than with vectors.

So we are starting to write a paper proposing a possible future feature: allowing arrays allocated with the new[] operator to be shrunk using the delete[] operator.

Knowing how many arrays and vectors your computer can allocate and initialize in one second would give us more statistical data for this paper.

Helping us this way takes about three seconds to compute the results, a few seconds to compile the code, and a few more seconds to post your results.

Thank you very much for your help, your comments, your time and your answers!

Best regards,

http://ncomputers.org

#include<iostream>
#include<ctime>
#include<vector>
using namespace std;

int main(){
#define S 100
	unsigned int a=0,v=0,z=time(0)+1;
	while(time(0)<z);	// wait for the start of a fresh second
	z++;
	while(time(0)<z){	// for one second: allocate and free vectors of S elements
		v++;
		delete new vector<unsigned int>(S,0);
	}
	z++;
	while(time(0)<z){	// for one second: allocate and free arrays of S elements
		a++;
		delete new unsigned int[S]();
	}
	cout<<"Total vectors allocated and initialized: "<<v*S<<endl;
	cout<<"Total arrays allocated and initialized: "<<a*S<<endl;
}
Our results:

Total vectors allocated and initialized: 560188400
Total arrays allocated and initialized: 1104438600

Processor: Intel Core i7 2nd Gen 2.1GHz
RAM: DDR3-1600

GNU Compiler -O3 flag
I think you've got a pretty naïve, biased test there.

But if all you need are a bunch of 100-element arrays, with no error checking and direct access, of course arrays are faster.

Show me some asymptotic analysis and some rigorous testing to prove... Wait, what are you trying to prove?

If all you want is a way to realloc() some memory, you do know that you can create custom allocators that do that, or even hook new and delete, right?
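For what it's worth, a minimal sketch of the "hook new and delete" idea, under the assumption that a malloc-backed heap is acceptable (this is an illustration, not the OP's code):

#include <cstdlib>
#include <new>

// Route the global allocation functions through malloc/free, so a raw
// block could in principle be handed to realloc() later.
void* operator new(std::size_t n) {
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept {
    std::free(p);
}

int main() {
    int* p = new int(42);   // goes through the malloc-backed operator new
    delete p;               // and through the free-backed operator delete
}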
You aren't allocating 'groups of 100 vectors' or 'groups of 100 arrays' - you are allocating one with 100 elements. So your results are 100 times as large as they should be.

Also, you have a memory leak. When you are allocating an array with the new[] operator, you need to use the delete[] operator to clean up.
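For clarity, the correct pairing looks like this (a trivial illustration reusing the benchmark's sizes):

#include <vector>

int main() {
    unsigned int* a = new unsigned int[100]();   // allocated with new[] ...
    delete[] a;                                  // ... so it must be released with delete[]

    std::vector<unsigned int>* v = new std::vector<unsigned int>(100, 0);   // a single object
    delete v;                                    // plain delete is correct here
}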

He is deleting it. delete new ....

He's side-stepping the heap manager, which is one of the problems with his code.
I may still be getting something wrong here, but I'm referring to the delete new unsigned int[S]() statement, where he is using delete rather than delete[].
LOL, yep. I missed that.

That's actually undefined behavior.

(Personally, I think it's stupid to have to write [] on an array delete. The allocator/heap manager should know how to handle it properly for any pointer you give it. But, that's just my opinion.)
You're not even doing equivalent things. There is no need for new/delete in the vector loop.
Another of the problems with his code.

(There's quite a few more.)
One wonders if the OP is aware of the shrink_to_fit methods on some standard containers, notably std::deque and std::vector.
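For readers who have not used it, a small example of shrink_to_fit (the request is non-binding, so the capacities shown in the comments are only typical):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(1024, 0);
    v.resize(16);                        // size shrinks, capacity usually does not
    std::cout << v.capacity() << '\n';   // typically still 1024
    v.shrink_to_fit();                   // request (not a guarantee) to drop the excess
    std::cout << v.capacity() << '\n';   // typically 16 after the request
}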
I think he has several issues in one post:

Directly allocating and manipulating an array is faster than using vectors (for small n)
true

Check out this sweet language modification idea: allow resizing of allocated arrays.
a common want (but not strictly necessary)

Ask for biased samples with given code, to be used to prove something.
needs rethink
Thank you all of you for your time, your comments and your help.

First I'll answer some of your comments, and then I'll add some information about this topic.

NT3:
You aren't allocating 'groups of 100 vectors' or 'groups of 100 arrays' - you are allocating one with 100 elements. So your results are 100 times as large as they should be.

Yes, it is true.

Also, you have a memory leak. When you are allocating an array with the new[] operator, you need to use the delete[] operator to clean up.

It is true. In this case that mistake probably gives the same results. (GNU compiler, -O3 flag)

Cire:

You're not even doing equivalent things. There is no need for new/delete in the vector loop.

One of our programs requires a number of arrays or vectors that is not known in advance. Each array or vector "lives" for a variable amount of time until it "dies".

One wonders if the OP is aware of the shrink_to_fit methods on some standard containers, notably std::deque and std::vector.

From our research so far, shrinking an array could be faster than shrinking a vector.

Duoas:

Directly allocating and manipulating an array is faster than using vectors (for small n)
true

This could be a third reason to suggest standard shrinkable arrays.

Check out this sweet language modification idea: allow resizing of allocated arrays.
a common want (but not strictly necessary)

Yes, it is a sweet language modification. It is possible that I am using the word "shrink" incorrectly. My English has mistakes! By shrinkable arrays I mean arrays which can be resized only to a smaller size.

Additional information:

I've updated and published content on our website about this topic:

http://ncomputers.org/content/code.php?src=suggestions/shrink%20arrays%20cpp

We have two good reasons to suggest these shrinkable arrays, and we are looking for a third, which could be what Duoas wrote:

Directly allocating and manipulating an array is faster than using vectors (for small n)
true

The more good reasons it has, the stronger it is.

Performance is the main reason:

#include<cstdlib>
#include<ctime>
#include<iostream>
#include<vector>
using namespace std;

struct A{
	int a;
	A():a(rand()){}
};

// Route array allocation through malloc/free so that the block
// can later be shrunk in place with realloc.
void*operator new[](size_t s){
	return malloc(s);
}
void operator delete[](void*p){
	free(p);
}
// "Shrinking delete": called explicitly with the new (smaller) size.
void operator delete[](void*p,size_t s){
	if(realloc(p,s)!=p)throw 0;	// shrinking with realloc should not move the block
}

int main(){
	unsigned int a=0,v=0,s,S=1024,z=time(0)+1;
	while(time(0)<z);	// wait for the start of a fresh second
	z++;
	A*b;
	while(time(0)<z){	// for one second: allocate an array and shrink it step by step
		b=new A[S]();
		for(s=S-1;s>0;--s)operator delete[](b,s*sizeof(A));
		delete[]b;
		a++;
	}
	z++;
	vector<A>*c;
	while(time(0)<z){	// for one second: allocate a vector and shrink it step by step
		c=new vector<A>(S);
		for(s=S-1;s>0;--s){
			c->resize(s);
			//C++11
			c->shrink_to_fit();
		}
		delete c;
		v++;
	}
	cout<<"Total arrays: "<<a<<endl;
	cout<<"Total vectors: "<<v<<endl;
	return 0;
}


Thank you very much!

Best regards,

http://ncomputers.org/

(On Oct 9 I edited this post to express my answers to your comments more clearly.)
One of our programs requires a number of arrays or vectors that is not known in advance. Each array or vector "lives" for a variable amount of time until it "dies".
There is still no need to allocate the vector itself. Store it by value, moving it around if needed.
After changing your vector loop to do sensible operations I got:
Total vectors allocated and initialized: 1349713000
Total  arrays allocated and initialized: 1332870700
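MiiNiPaa did not post the modified loop, but a minimal sketch of the "store it by value" idea might look like this (the names are illustrative only):

#include <iostream>
#include <utility>
#include <vector>

int main() {
    // Vectors stored by value inside another vector: no new/delete needed.
    std::vector<std::vector<unsigned int>> pool;

    std::vector<unsigned int> tmp(100, 0);   // construct one "array" ...
    pool.push_back(std::move(tmp));          // ... and move it into the pool (no copy of the buffer)

    pool.emplace_back(100, 0u);              // or construct it in place

    std::cout << pool.size() << " vectors stored by value\n";
}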
MiiNiPaa: Thank you very much for your answer and your time!

There is still no need to allocate the vector itself. Store it by value, moving it around if needed.
After changing your vector loop to do sensible operations I got:

I agree with you. In some cases it is possible to do it. Thank you very much for posting your results.

In some other cases it is required to allocate the vector because:

The number of vectors is not fixed. It can vary; for example, in one second the code has 100, the next second 1000, the next second 10, and so on.

And it is also easier to construct and destruct the objects by creating and deleting the entire vector. For example:

#include<iostream>
#include<vector>
using namespace std;

struct A{
	A(){cout<<"const"<<endl;}
	~A(){cout<<"dest"<<endl;}
};

int main(){
	vector<A>*a=new vector<A>(2);
	delete a;
	return 0;
}


Best regards,

http://ncomputers.org
In some other cases it is required to allocate the vector because:

The number of vectors is not fixed. It can vary; for example, in one second the code has 100, the next second 1000, the next second 10, and so on.

In that case we would be using a vector of vectors, not using raw pointers/new/delete + vectors, and in that case we would also not be comparing to a single new/delete of a dynamic array, which is... not equivalent.
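As a rough sketch of what "a vector of vectors" means here (the sizes and counts are invented for illustration):

#include <cstdlib>
#include <ctime>
#include <vector>

int main() {
    std::srand(static_cast<unsigned>(std::time(0)));

    // The number of inner vectors can grow and shrink freely at run time.
    std::vector<std::vector<unsigned int>> groups;
    for (int i = 0; i < 100; ++i)
        groups.emplace_back(std::rand() % 1024, 0u);   // a new vector "is born"

    groups.resize(10);   // 90 of them "die"; their memory is released automatically
}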

And it is also easier to construct and destruct the objects by creating and deleting the entire vector. For example:

#include<iostream>
#include<vector>
using namespace std;

struct A{
	A(){cout<<"const"<<endl;}
	~A(){cout<<"dest"<<endl;}
};

int main(){
	vector<A>*a=new vector<A>(2);
	delete a;
	return 0;
}


How is that easier than the following?
#include <iostream>
#include <vector>

struct A{
	A(){ std::cout<<"const\n"; }
	~A(){ std::cout<<"dest\n"; }
};

int main(){
    std::vector<A> a(2);
}



From our research so far, shrinking an array could be faster than shrinking a vector.

Not shrinking either would be faster still. If you must keep track of size/capacity, there is no reason the two would not take the exact same amount of time (and since you seem very concerned with size vs capacity here, common sense says you will need to keep track of them.) The code you are timing for the dynamic arrays does not keep track of those things. Once again, we hit that equivalency issue, as the code you are timing for the vectors does keep track of those things.
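To make the equivalency point concrete, here is a rough sketch (my own illustration, not anyone's proposal) of the bookkeeping a "shrinkable" raw array would still need if size and capacity have to be remembered:

#include <cstdlib>
#include <new>
#include <stdexcept>

// A deliberately minimal shrinkable array of unsigned int. To be usable it
// must carry the same size/capacity bookkeeping that a vector carries.
struct ShrinkableArray {
    unsigned int* data;
    std::size_t size;       // elements currently in use
    std::size_t capacity;   // elements the allocation can hold

    explicit ShrinkableArray(std::size_t n)
        : data(static_cast<unsigned int*>(std::calloc(n ? n : 1, sizeof(unsigned int)))),
          size(n), capacity(n) {
        if (!data) throw std::bad_alloc();
    }

    void shrink(std::size_t n) {
        if (n > size) throw std::length_error("shrink can only reduce the size");
        size = n;
        if (n == 0) return;   // avoid realloc(p, 0), whose behavior is implementation-defined
        // Optionally give memory back; shrinking with realloc normally keeps the
        // block in place, but the standard does not promise that.
        if (void* p = std::realloc(data, n * sizeof(unsigned int))) {
            data = static_cast<unsigned int*>(p);
            capacity = n;
        }
    }

    ~ShrinkableArray() { std::free(data); }
};

int main() {
    ShrinkableArray a(1024);
    a.shrink(16);
}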
Dear Cire:

Thank you very much for your time and your answer.

In that case we would be using a vector of vectors, not using raw pointers/new/delete + vectors, and in that case we would also not be comparing to a single new/delete of a dynamic array, which is... not equivalent.

Yes, it is possible to use a vector of vectors. So far my experience says it will always be necessary to allocate the vectors with the new operator, because the number of vectors is unknown at compile time.


How is that easier than the following?
#include <iostream>
#include <vector>

struct A{
	A(){ std::cout<<"const\n"; }
	~A(){ std::cout<<"dest\n"; }
};

int main(){
    std::vector<A> a(2);
}


The easiest way we know of to have a runtime-variable number of vectors which contain constructed objects is to use the new operator.

#include<cstdlib>
#include<ctime>
#include<iostream>
#include<vector>
using namespace std;

struct A{
	static unsigned int a;	// number of objects constructed
	static unsigned int b;	// number of objects destructed
	A(){a++;}
	~A(){b++;}
};

unsigned int A::a=0;
unsigned int A::b=0;

int main(){
	unsigned int c,d,S=1024*1024;
	cout<<"How many vectors do you want?";
	cin>>c;
	srand(time(0));
	for(d=0;d<c;d++){
		// Each vector is allocated with new and intentionally kept alive,
		// so its elements stay constructed.
		new vector<A>(rand()%S);
	}
	cout<<"Total objects constructed: "<<A::a<<endl;
	if(A::b)cout<<"Error! Objects destructed"<<endl;
	return 0;
}

Not shrinking either would be faster still. If you must keep track of size/capacity, there is no reason the two would not take the exact same amount of time (and since you seem very concerned with size vs capacity here, common sense says you will need to keep track of them.) The code you are timing for the dynamic arrays does not keep track of those things. Once again, we hit that equivalency issue, as the code you are timing for the vectors does keep track of those things.

I agree with you, they are different.

When we first raised this topic, we met a lot of resistance to the suggestion of shrinkable arrays. The first answers from the users were:

Use std::vector!

But for some purposes vectors are slower than arrays, for the reasons you just wrote.

When performance is the most important thing, vectors could be discarded in some cases, because better performance can sometimes be obtained with native arrays or a custom container implementation developed for that specific code.

Best regards,

http://ncomputers.org
Yes, it is possible to use a vector of vectors. So far my experience says it will always be necessary to allocate the vectors with the new operator, because the number of vectors is unknown at compile time.

A vector does not allocate its elements one by one. It allocates memory in increasing chunks: allocating 100 elements ten times is way faster than allocating one element a thousand times.
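A quick way to see the chunked growth MiiNiPaa describes (the exact capacities are implementation-specific):

#include <iostream>
#include <vector>

int main() {
    std::vector<unsigned int> v;
    auto last = v.capacity();   // usually 0 for an empty vector

    // Capacity jumps in chunks (commonly growing by about 1.5x or 2x),
    // so 1000 push_backs trigger far fewer than 1000 allocations.
    for (unsigned int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {
            last = v.capacity();
            std::cout << "capacity grew to " << last << '\n';
        }
    }
}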

Just as vectors lose some time managing themselves, you will lose time managing the array which holds the pointers to your "unknown number of arrays". If you want precise results, you should measure exactly what you want to achieve and not some subset of it.

Optimization is a tricky subject. Optimizing a single function to run twice as fast can leave you with a 10% speed decrease for the entire program (say, because that function trashes the processor cache every time it runs).
MiiNiPaa:

Thank you very much for your time and your answer!

You wrote some things that I didn't know.

Yes, we want precise results.

The code above is exactly what we want to achieve:

Create arrays of sequentially aligned integers with random values, shrink them millions of times, and finally delete them.

The total number of arrays is variable during runtime.

I agree with you:

Optimization is a tricky subject.

Best regards,

http://ncomputers.org/
The code above is exactly what we want to achieve:

Create arrays of sequentially aligned integers with random values, shrink them millions of times, and finally delete them.
The code above does not do that. You need to store the arrays somewhere accessible, because you intend to access them in your program.
#include<iostream>
#include<chrono>
#include<vector>
#include <cstdlib>
#include <iomanip>
using namespace std;

struct arr
{
	unsigned* array;
	size_t size;
};

int main(){
	using clock_type = std::chrono::steady_clock;
	using tp = std::chrono::time_point<clock_type>;
	using dur_type = std::chrono::duration<double, std::ratio<1,1>>;
	std::cout << std::fixed << std::setprecision(6);
	unsigned v=1000000;
	tp start, finish;
	// Vectors stored by value inside an outer vector.
	std::vector<std::vector<unsigned>> vectors;
	start = clock_type::now();
	while(--v) {
		vectors.emplace_back();
		vectors.back().reserve(40);
	}
	finish = clock_type::now();
	std::cout <<  std::chrono::duration_cast<dur_type>(finish - start).count() << '\n';
	v = 1000000;
	size_t size = 10;
	// Raw arrays tracked by hand in a manually grown buffer.
	arr* arrays = static_cast<arr*>(malloc(size * sizeof(arr)));
	size_t used = 0;
	start = clock_type::now();
	while(--v){
		arrays[used++] = {new unsigned int[40], 40};
		if(used == size)
			arrays = static_cast<arr*>(realloc(arrays, sizeof(arr) * (size *= 1.5)));
	}
	finish = clock_type::now();
	std::cout <<  std::chrono::duration_cast<dur_type>(finish - start).count();
}
0.098944
0.091738
As you can see, the difference is not that large once we take into account the operations that actually happen. But those tests do not matter. At all. Because you should always check performance in your real program. Optimizations will change your code.
Did you know that the compiler can throw out the line delete new int; because it is not doing anything?

I am not arguing against the claim that arrays, correctly handled with regard to specific needs, will be faster. I just want to show that the difference will not really be noticeable.

And now I have a question: do you really need to sacrifice convenience for speed? How much do you expect to save with these optimisations? Is this really a bottleneck? Can I see profiler data? Do those optimisations interfere with something else? Why don't you rewrite the speed-critical parts in assembly?

BTW, when I dropped using overallocation for arrays (replacing size*= 1.5 by size += 20;) I got:
0.101299
13.107815
As you see, improper array handling is very dangerous.
std::time() and the clocks in std::chrono measure wall clock time.

std::clock() gives a measure of approximate processor time.

Using elapsed wall clock time as a measure of performance for these experiments could yield misleading results.

http://coliru.stacked-crooked.com/a/7fc30e3b429b1c17
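The linked example is not reproduced here, but the distinction can be illustrated with a sketch along these lines:

#include <chrono>
#include <ctime>
#include <iostream>
#include <thread>

int main() {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    // Sleeping consumes wall-clock time but almost no processor time.
    std::this_thread::sleep_for(std::chrono::milliseconds(500));

    const double wall = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - wall_start).count();
    const double cpu = double(std::clock() - cpu_start) / CLOCKS_PER_SEC;

    std::cout << "wall: " << wall << " s, cpu: " << cpu << " s\n";
}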