I created a class that acts just like a set but uses OpenMP to run in parallel to make everything faster, your thoughts?

Pages: 12
1. Did you even compile benchmarks/test_speed? The code in the repository doesn't compile. srandom() and random() are not standard functions.
2. The first time I ran it your code didn't even get to run. It crashed because numTrees == 1. in several places you do things like allData[numTrees - 2] without checking if that's a valid operation.
3. Your benchmark code needs some serious reorganization. Please put every test in its own function, don't just shove all the code into main(). It's a pain to follow the data flow like this.
4. Some benchmarks are invalid because you're not really doing anything with the results. The compiler can optimize away code whose results are discarded.
5. rebalance_trees() is incorrect. When you erase() from an std::set (for example, here: https://github.com/AvniOsmani/ParallelSet/blob/master/parallelSetClass.h#L93 ), the iterator you erased becomes invalidated. You can't then increment that iterator like you do in the for header. I rewrote those sections into this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
{
    auto &set = allData[0];
    auto b = set.begin(), e = set.end();
    for (auto i = b; i != e;){
        if (*i <= rightEndPoints[0]){
            ++i;
            continue;
        }
        insert_element(*i);
        set.erase(i++);
    }
}

for(int i = 1; i < numThreads - 1; ++i){
    auto &set = allData[i];
    auto b = set.begin(), e = set.end();
    for (auto j = b; j != e;){
        if (*j > rightEndPoints[i - 1] || *j <= rightEndPoints[i]){
            ++j;
            continue;
        }
        insert_element(*j);
        set.erase(j++);
    }
}

{
    assert(numThreads >= 2);
    auto &set = allData[numThreads - 1];
    auto b = set.begin(), e = set.end();
    for (auto i = b; i != e;){
        if (*i > rightEndPoints[numThreads - 2]){
            ++i;
            continue;
        }
        insert_element(*i);
        set.erase(i++);
    }
}

6. Again, rebalance_trees() is incorrect. I added this line to testData initialization:
1
2
3
4
5
for(int i=0; i<numOfTestSubjects; ++i){
    auto x = x1 + ( x2 - x1) * (rand() % max_rand) / max_rand;
    testData.push_back(x);
    testData.push_back(1); //this
}
and find_parallel() started returning incorrect results. I believe it's because rightEndPoints is incorrectly estimated between here
https://github.com/AvniOsmani/ParallelSet/blob/master/parallelSetClass.h#L69
and here
https://github.com/AvniOsmani/ParallelSet/blob/master/parallelSetClass.h#L86
7. This is less important, but your number generation code is incorrect. It's generating numbers between -1000 and 0, not between -1000 and 1000.

That's all I have for now, I'll continue looking at this tomorrow. I still don't quite understand what's happening with the timing; when I commented out the call to rebalance_trees() in insert_parallel(), the measured time for insert_parallel() went up significantly, which obviously makes no sense. I think the compiler is reordering instructions.
@helios, thank you for the feedback, I made sure everything compiled and ran on my machine before uploading any code so I'll take a look at everything you mentioned and try to see what I can do.

Question to your fourth comment, can you give me an example of what I can do with the data to get more a more accurate representation of the speed, I'm not really sure what you mean by this.
Last edited on
For example, to test std::set lookup you do this:
1
2
3
4
5
6
7
tm.start();
for(int i=0; i<checkData.size(); ++i){
    it = singularSet.begin();
    it = singularSet.find(checkData[i]);
    if(it == singularSet.end()){}
}
tm.end();
A sufficiently smart compiler could replace this with this:
1
2
tm.start();
tm.end();
or with this:
1
2
3
4
5
[code]
tm.start();
if (checkData.size())
    it = singularSet.find(checkData.back());
tm.end();

The best way to make sure your code doesn't get optimized away is to perform some operation with the results and send it to output. For example, I replaced that loop with this:
1
2
3
4
5
6
tm.start();
int r = 0;
for (auto &i : checkData)
    r += singularSet.find(i) != singularSet.end();
tm.end();
std::cout << r << std::end; 
Now the compiler can't optimize anything away. At worst it can reorder operations.
Last edited on
@helios,

1 & 7
I compiled and ran the codes in the benchmarks folder and two of my computers and they worked fine, the random number generator also worked how it was supposed to, generating different numbers between -1000 and 1000 each time I compile and run. I even tried it on an online c++ compiler, try running this code on c++ shell:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

#include <iostream>

typedef double dataType;

long long int numOfTestSubjects = 10;
dataType x1 = -1000.0, x2 = 1000.0, x;

int main()
{
  //generate random doubles for insertion

  const long max_rand = 1000000L;

  srandom(time(NULL));
  for(int i=0; i<numOfTestSubjects; ++i){
    x = x1 + ( x2 - x1) * (random() % max_rand) / max_rand;
    std::cout<<x <<std::endl;
  }

    return 0;
}


2.
Does your cpu only have one thread? numTrees is set when a constructor is called by getting the number of threads your cpu has then never changed again. I made this for cpus that have at least 4 threads because that's the smallest amount of threads you can get from a cpu on the retail market now. With that in mind all those places where I do numTrees-2 shouldn't be a problem unless you are running this on a machine with 1 thread, in that case there would no point in using this because it would act just like a normal std::set, I could add in a few conditional statements to make it work for cpus with 2 threads.

4.
Once I test to make sure the re balancing is working how it should I'll test the speed again by taking the sum of every element in the sets and printing them to the screen.

5.
I see what you're saying about the iterator and I didn't notice this when making the class because when I was testing the re balancing it was working as intended but I will take a look at the code you wrote.
Last edited on
1. srandom() and random() are not standard C functions. They're UNIX functions.
2.
With that in mind all those places where I do numTrees-2 shouldn't be a problem unless you are running this on a machine with 1 thread, in that case there would no point in using this because it would act just like a normal std::set
Your code should not invoke undefined behavior just based on the hardware configuration.
If you're not going to handle that case the least you should do is throw an exception in the constructor.
Topic archived. No new replies allowed.
Pages: 12