Array/vector access slowdown

I recently split a single *.cpp project file into two *.h files with corresponding *.cpp files (one class each), plus a main.cpp. I did not significantly change any of the functions.

In this project I am modeling an object moving through a 2-D space, and I have a history variable that records the position at every step, along with a "score" for each position. I have a function, run each step, that returns the change in the score over a recent interval (which requires reading the history multiple times).

The new code runs 3x-6x slower than the old code (depending on whether history is an array or a vector). I have narrowed the slowness down to the functions described in the preceding paragraph: when I comment out those function calls, the program runs at the original speed. So, it seems that the reading of the history variable is what is causing the slowdown. So is there something about classes and separate files that makes reading an array/vector slower?

history init:
//vector< vector<double> > history(60000, vector<double>(4));
double history[60000][4];


functions:
double KK::nInterval(int index){
	double sum=0;
	
	for(int i=index ; i>index-stepsN ; i--){
		sum += history[i][3];
	}
	//do not need to worry about dividing by zero here, taken care of elsewhere
	return sum / stepsN;		
}

double KK::mInterval(int index){
	double sum=0;
	
	for (int i=index-stepsN ; i>index-(stepsN+stepsM) ; i--){
		sum += history[i][3];
	}
	//do not need to worry about dividing by zero here, taken care of elsewhere
	return sum / stepsM;	
}

double KK::difference(int index){
	
	if (instantaneousDC){
		if (index > 0)
			return (history[index][3] - history[index-1][3]) / dt; //concen difference per second
		else
			return history[index][3] / dt;
	}
	else{
		if (index+1 >= stepsN+stepsM)
			return (nInterval(index) - mInterval(index)) / dt ; //concen difference per second
		else 
			return 0.0;	
	}
}
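
For context, the calling code runs once per step, roughly like this (a simplified sketch, not the exact code; the other per-step work is omitted):

// Simplified sketch of one trial: difference() is called once per step,
// 60000 steps per trial, and reads up to stepsN+stepsM history rows each call.
for (int step = 0; step < 60000; ++step)
{
    // ...update the position and write history[step][0..3]...
    double change = difference(step);
    // ...use change to update the score/model...
}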


Thanks for any help,
Marc Dillon
So, it seems that the reading of the history variable is what is causing the slowdown.
I'm going to assume that you've come to this conclusion because those variables are only referenced in the above methods.

So is there something about classes and separate files that makes reading an array/vector slower?
There wouldn't be enough overhead to cause any sort of noticeable difference in speed. Just to double-check, I suggest making copies of your current files and attempting to revert back to the code you had before this, and comparing the difference. You might end up with the same results that you're getting now.

Additionally, can you give us some realistic examples of how far those for loops might be going? What's the "speed" that you were going at before, and what is it at now?
This might have something to do with cache locality. When you allocate a 2-D array it sits in a single piece of memory. When you allocate a vector of vectors, each subvector calls new for its own piece of data, so these vectors may end up at very different addresses. Try replacing the inner std::vector with an std::array or a structure containing 4 doubles. In that case the new operator will be called only once, for the outer vector, and the data will again be in a single contiguous piece of memory. If this helps, then it's a cache-related problem. If not, follow NGen's recommendations (actually, follow them anyway :))
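
A minimal sketch of what I mean, assuming a C++11 compiler and your 60000 x 4 dimensions (readScore is just for illustration):

#include <array>
#include <vector>

// A single allocation for the outer vector; all 60000*4 doubles are contiguous.
std::vector<std::array<double, 4>> history(60000);

// Element access looks the same as with the plain 2-D array.
double readScore(int index)
{
    return history[index][3];
}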
I'm going to assume that you've come to this conclusion because those variables are only referenced in the above methods.

I came to this conclusion because my program spits out elapsed-time values, and when I comment out the calls to these functions, the time goes down, whereas nothing else that I tried temporarily removing affected the time as much.

There wouldn't be enough overhead to cause any sort of noticeable difference in speed. Just to double-check, I suggest making copies of your current files and attempting to revert back to the code you had before this, and comparing the difference. You might end up with the same results that you're getting now.

I still have the original code, which is how I came up with the '3x-6x slower' numbers. So, unfortunately it does not seem to be just a fluke of the moment.

Additionally, can you give us some realistic examples of how far those for loops might be going? What's the "speed" that you were going at before, and what is it at now?

The for loops are running for 120 and 430 iterations respectively. The time for a single run through the function is too short to measure (at least with how I'm doing it, using clock()), so I'm recording the time of a single "trial" in the program, which is 60000 iterations of everything, including those functions. These trials take .06 seconds in the old code, .4 seconds in the new code when history is a vector, and .18 seconds in the new code when history is an array.
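
The timing is done roughly like this (a simplified sketch, not my exact code; the loop body stands in for the real per-step work):

#include <cstdio>
#include <ctime>

int main()
{
    std::clock_t start = std::clock();

    volatile double work = 0.0;
    for (int step = 0; step < 60000; ++step)
        work += step * 0.001;   // placeholder for the real step function

    double seconds = double(std::clock() - start) / CLOCKS_PER_SEC;
    std::printf("trial took %.4f s\n", seconds);
}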


This might have something to do with cache locality. When you allocate a 2-D array it sits in a single piece of memory. When you allocate a vector of vectors, each subvector calls new for its own piece of data, so these vectors may end up at very different addresses. Try replacing the inner std::vector with an std::array or a structure containing 4 doubles. In that case the new operator will be called only once, for the outer vector, and the data will again be in a single contiguous piece of memory. If this helps, then it's a cache-related problem. If not, follow NGen's recommendations (actually, follow them anyway :))

This is very interesting; I'd never thought of making a vector of arrays. However, my problem right now is not the speed difference between vectors and arrays, but between the old single-file code and the new code with multiple files and classes. I also don't know how to create a vector of arrays like that: vector< double [4] > didn't work, vector< std::array<double, 4> > says array isn't part of std, and vector< array<double, 4> > says array is not defined.

In the meantime I changed the interval functions to only sample every 10th entry in the history, without loss of performance.
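
Roughly like this (a simplified sketch; it assumes stepsN divides evenly by 10):

// Sample only every 10th history entry over the last stepsN steps.
double KK::nInterval(int index){
	double sum = 0;
	int samples = 0;
	for (int i = index; i > index - stepsN; i -= 10){
		sum += history[i][3];
		++samples;
	}
	return sum / samples;   // average of the sampled entries
}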

Thanks for the responses,
Marc Dillon
It sounds like some optimization opportunities were lost by moving history to a separate translation unit in your case.

I've tried to replicate this with GCC on Linux and with XLC on AIX, but I got identical results in all four cases:

#include <iostream>
#include <vector>
#include <array>

// 1.1 std::vector<std::array<double, 4>> history(600000);
// 1.2 extern std::vector<std::array<double, 4>> history;
// 2.1 double history[600000][4];
// 2.2 extern double history[600000][4];

volatile double sink;
int main()
{
    int index = 599999;
    for (int repeat = 0; repeat < 100000; ++repeat)
    {
        for (int stepsN = 0; stepsN < 100; ++stepsN)
        {
            double sum = 0;
            for (int i = index; i > index - stepsN; i--) {
                sum += history[i][3];
            }
            sink = sum;
        }
    }
}
User time, average of 3 runs:
GCC 4.6:
1.1: 0.979s
1.2: 0.978s
2.1: 0.975s
2.2: 0.973s

XLC 11
1.1: 0.398s
1.2: 0.399s
2.1: 0.432s
2.2: 0.432s


If you could construct an isolated testcase, it would be easier to take a look across different compilers/platforms to see if there is something systemic.

(Edit: increasing stepsN to a more sensible value still showed no difference between the same-file and separate-file array locations in my tests.)
Thanks again for the responses. I have figured out the problem, which, as is often the case, was something very small. When splitting up the code I created a new makefile, and in the new makefile I forgot to add the -O3 optimization option to the gcc commands. The really good news is that it forced me to figure out a way to optimize my code a bit that I hadn't thought of before, and now it is even faster than before.
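
For anyone who runs into the same thing, the fix is just the optimization flag in the build rule; something along these lines (the file names here are placeholders, not my actual makefile):

# Placeholder makefile fragment: the key part is -O3 in CXXFLAGS.
# (Recipe lines must start with a tab.)
CXX      = g++
CXXFLAGS = -O3

objects  = main.o classA.o classB.o

sim: $(objects)
	$(CXX) $(CXXFLAGS) -o sim $(objects)

%.o: %.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@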

Marc
@KRAkatau: Vectors are contiguous, and believe it or not this actually makes their performance worse than what you described: in order to be both dynamic in size and contiguous in layout, a vector will move every element it holds into a new block of memory whenever it needs to grow beyond its currently allocated capacity.
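
A small illustration of that growth behaviour (the exact growth factor is implementation-defined):

#include <iostream>
#include <vector>

// Prints each time push_back has to reallocate and move the existing elements;
// calling reserve() up front would turn this into a single allocation.
int main()
{
    std::vector<double> v;
    std::vector<double>::size_type lastCapacity = 0;

    for (int i = 0; i < 1000; ++i)
    {
        v.push_back(i);
        if (v.capacity() != lastCapacity)
        {
            std::cout << "size " << v.size()
                      << " -> capacity " << v.capacity() << '\n';
            lastCapacity = v.capacity();
        }
    }
}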