vector efficiency

Hi, I have a question about the performance of two blocks of code. The code uses a non-standard library, SFML, which is a 2D graphics library. The code is quite simple, so I don't think I need to elaborate on it.

The first piece of code is this:
if(TransporterOn){
	do{
		CptCol = rand() % largeur; //pick a random column (x) coordinate
		CptLig = rand() % hauteur; //pick a random row (y) coordinate
	}while(imagePersonnage.getPixel(CptCol,CptLig) == sf::Color(0,0,0,0) && pixelCpt != 6574);

	imagePersonnage2.setPixel(CptCol,CptLig,imagePersonnage.getPixel(CptCol,CptLig));
	imagePersonnage.setPixel(CptCol,CptLig,sf::Color(0,0,0,0));
	pixelCpt++;
}


What this does is randomly pick a pixel that is not transparent black, copy that pixel into another image, and then turn the pixel in the first image transparent. This simulates a pixel-by-pixel teleportation from one Sprite to another.

This first piece of code takes 3.27 seconds to execute on a school computer. I thought I could make it quicker by doing this:
//Preselect the non-black pixels and save them into a vector of pair<int,int>
for(int i = 0; i < largeur; ++i){
	for(int j = 0; j < hauteur; ++j){
		if(imagePersonnage.getPixel(i,j) != Color(0,0,0)){
			myPixels.push_back(make_pair(i,j));
		}
	}
}
//randomize the vector
random_shuffle(myPixels.begin(), myPixels.end());

vector<int> vectorX;
vector<int> vectorY;

for(size_t i = 0; i < myPixels.size(); ++i){ //was "size() - 1", which dropped the last pixel
	vectorX.push_back(myPixels[i].first);
	vectorY.push_back(myPixels[i].second);
}


and then, I measure how much time this takes
if(TransporterOn){
	if(pixelCpt < myPixels.size()){ //was "size() - 1", which skipped the last pixel
		imagePersonnage2.setPixel(vectorX[pixelCpt],vectorY[pixelCpt],imagePersonnage.getPixel(vectorX[pixelCpt],vectorY[pixelCpt]));
		imagePersonnage.setPixel(vectorX[pixelCpt],vectorY[pixelCpt],sf::Color(0,0,0,0));
		++pixelCpt;
	}
}


The reason I separated the pair into two different vectors instead of accessing the pair elements was simply to try to make it even quicker. It did not change anything.

Now, I thought my last piece of code would be A LOT quicker than using rand(), since the random approach lands on a black pixel most of the time. BUT it actually takes over 8 seconds.

So here is my question: how can the vector technique, which eliminates black pixels beforehand, take more than twice as long as the first piece of code?
Why do you bother making vectorX and vectorY at all? You already have that data in the myPixels vector.

That said... if you know the final size of the vector, you should allocate space for it up front with reserve(). If you don't, then as the vector grows it will guess how much space it needs, and reallocate/move the data whenever it exceeds that capacity. Depending on how large this image is, this could be happening 5 or 6 times per vector (of which you have 3).

Also... per-pixel access in software is going to be slow. Moving images from the CPU to GPU is extremely time consuming. Moving pixel data from GPU to CPU is even slower. So it's likely that that's your bottleneck here. Consider doing these kinds of effects with shaders (GPU side software), rather than in your main program (CPU side software).

Lastly... over 3 seconds seems incredibly slow. How big is this image? And/or are you compiling with optimizations enabled? If you do not have optimizations on (ie: "Release" mode), then any times you have are meaningless. Rebuild with optimizations on and get new times.

So yeah... do the following:

1) Reserve space in myPixels up front:
myPixels.reserve(largeur * hauteur);

for( ... )
{ /* fill 'myPixels' with coords for non-black pixels */ }


2) Don't create separate vectorX and vectorY vectors. Just use myPixels.

3) Turn on optimizations, rebuild, get new times.

4) Better yet, use shaders rather than cpu pixel access.
I doubt anybody's going to answer since the post is getting old, but I'll give it a try.

In answer to Disch:
I thought I was clear on a few things, such as:
I don't care about the time it takes to fill the vector or to reallocate space; I do not count that part of the code when I measure time.
I don't care about the time it takes to get or set a pixel, because I only want to compare, not optimize.

Here are some details on my question:
The piece of code that performs better is the one with the most calls to image.getPixel(). The random technique does so many getPixel() calls that I don't understand how, say, 100,000 getPixel() calls can be quicker than 6574.

I'm not an expert in statistics, and I haven't checked it in my program, but I think the loop with randomly generated coordinates executes something like 6574 × 3250 ≈ 21 million times!
Some more details for that loop:
21 million getPixel() calls
21 million randomly generated coordinate pairs
about 13,000 setPixel() calls

My other loop, with the coordinates already in memory, executes as many times as there are pixels, that is, 6574 times.
Some more details for that loop:
about 26,000 vector accesses (4 per pixel: two for the coordinates, times two images)
6574 getPixel() calls
about 13,000 setPixel() calls

How can 21 million iterations be quicker than 6574? The only difference is that the 6574-iteration version accesses vector elements.

Am I clear enough?
Once again I must say this:

are you compiling with optimizations enabled? If you do not have optimizations on (ie: "Release" mode), then any times you have are meaningless. Rebuild with optimizations on and get new times.


Emphasis on meaningless. If you do not have optimizations turned on, there is overhead involved in using a vector: it's probably calling 3 or 4 functions every time you access an element, plus doing bounds checking to make sure you don't step out of bounds... plus other debug machinery.

If you turn on optimizations all that goes away.

Don't waste time trying to profile a debug build. It's not worth it. Performance really only matters when optimizations are on. If you switch them on, I'm almost certain the vector version will be faster because it has fewer iterations.

Also, I question your 21 million iteration math... but whatever.
For equivalent functionality, if there is a significant difference in performance between c-style arrays and std::vector<>, complain to the compiler/library vendor. Loudly.

This would be fairly typical:
#include <iostream>
#include <random>
#include <ctime>
#include <cstdlib>
#include <cstddef>
#include <algorithm>
#include <vector>

static constexpr std::size_t N = 1024*1024*8 ;

std::vector<double> random_sequence()
{
    static std::mt19937 twister( std::time(nullptr) ) ;
    static std::normal_distribution<double> distr( 100, 25 ) ;
    static std::vector<double> seq(N) ;

    for( double& d : seq ) d = distr(twister) ;
    return seq ;
}

struct cpu_timer
{
    ~cpu_timer()
    {
        auto end = std::clock() ;
        std::cout << double( end - begin ) / CLOCKS_PER_SEC << " secs.\n" ;
    };

    const std::clock_t begin = std::clock() ;
};

template < typename SEQ > 
void bench_mark( SEQ& seq, const std::vector<double>& data, const char* desc )
{
    std::cout << desc << ": " ;
    cpu_timer timer ;
    std::copy( std::begin(data), std::end(data), std::begin(seq) ) ;
    std::sort( std::begin(seq), std::end(seq) ) ;
    for( auto& v : seq ) v = ( 100 - v ) * v ;
    std::sort( std::begin(seq), std::end(seq) ) ;
}

int main()
{
    const auto rand_seq = random_sequence() ;

    {
         std::vector<double> vector(N) ;
         bench_mark( vector, rand_seq, "vector" ) ;
    }

    {
         static double carray[N] ;
         bench_mark( carray, rand_seq, "c style array" ) ;
    }
}

g++-4.8 -std=c++11 -O3 -march=core2 -Wall -pedantic-errors main.cpp && ./a.out
vector: 3.06 secs.
c style array: 3.01 secs.

http://coliru.stacked-crooked.com/a/59833c3cf5f59dc4
Hi Disch, I am now working on the optimization thing; as of now I'm getting a D8016 error because of the optimization setting in Debug mode.

I also checked the numbers I talked about, the 21 million one. Turns out the program actually goes through the loop about 133,000 times, so I need to review my statistical estimates :-0.

This being said, I will get new times as soon as I can make the optimization work.
In most IDEs, turning on optimizations is as simple as switching a drop-down from "Debug" to "Release".

What IDE are you using? Visual Studio?

EDIT:

In VS, just change this drop-down to Release, then rebuild:
http://imgur.com/aJex24b
OK, I just reread your post and noticed that I have to be in Release mode. The reason I thought I had to be in Debug mode is that I was already in Release mode when I first got my times.

Nevertheless, I found why my times were off. When I filled the vector, my pixel condition was different from the one in the random loop:
Color(0,0,0) instead of sf::Color(0,0,0,0)
That caused the vector to be almost 3 times larger than it should have been.

After corrections, I get these times :
Random technique : 3.34 seconds
Vector technique : 2.93 seconds

The times being so close to one another is probably because most of the time is actually spent displaying the image, not accessing elements or generating random coordinates. I draw and display for every pixel.

Also, my loop counter changed for some reason (!?). It now tells me the program goes through the random loop about 215,000 times. On the other hand, there are 6574 elements in my vector.

Anyways, it answers my original question about the vector efficiency.

Thanks, Disch.
For further analysis: if I transfer all the pixels without displaying them, the difference is much more pronounced:
vector technique : 0.00016 seconds
random technique : 0.009 seconds

Cheers