I created a class that acts just like a set but uses OpenMP to run in parallel to make everything faster, your thoughts?

Pages: 12
I created a class that does just about everything a set does in C++, such as insertion, deletion, and searching but I used OpenMP to make it run in parallel so that all of those actions are faster. I just wanted to share what I did here and get your thoughts and criticism on whether this would ever be useful to someone. This was for a research project at my universities research foundation.

The project is here on my github, it includes a documentation that explains what all of the functions do:
https://github.com/AvniOsmani/ParallelSet

The basic idea is that it uses a vector of sets where each cpu thread manages one tree, so a cpu with 4 threads will have a vector of size 4 containing one tree in each position.

I used what I called "breakpoints" to determine where to insert each element. When inserting an element each thread would check to see if the element belongs in its tree, if it does then it would go ahead and insert.

I added a re balancing function that re balances the trees by adjusting the breakpoints and moving the necessary elements to the correct trees after inserting so that it doesn't end up turning into a standard set if everything being inserted falls into the same tree.

Deleting and searching for elements works the same, where each thread will determine if it needs the element is in its tree then proceed with the deletion or search if necessary.

I also made it possible to insert, delete, or search for multiple elements at a time by passing a vector into the function rather than a single element.

After testing all of the functions on my PC (with 4 threads), inserting, deleting, and searching for elements was always about three times as fast because the threads had smaller trees to go through to complete the task.

I used a template so you should be able to use any data type that you can use with the standard set as long as it has the division (/) operator.

I have a document explaining what the functions are and a program that tests the speed against the standard C++ set by generating a million random doubles.
Last edited on
Thanks for sharing! Cool to see that you have a documentation PDF as well, though I would suggest also formatting that into the README.md itself.

I would suggest making the test folder be called something more clear than "final_parallel_set", perhaps "benchmarking".

Why are you #including "timer.cpp" instead of the standard header+implementation file method?
Last edited on
@Ganado, I updated the README.md, and I'll update the folder and rename it benchmarks once I get on my laptop. I included the timer.cpp because that's what I used to test the speed. It's a couple of functions my professor provided me with and it was the first way I learned about time in C++. I'll have to learn how to use the standard #include <time.h>.
My bad, I realize I was being ambiguous in that last sentence. I didn't mean anything related to time.h.

I meant that you shouldn't #include .cpp files, you should include header (.h/.hpp) files, when working with multiple files, and also be sure to use header guards with it.

for example,
timer.h
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
#ifndef MY_TIMER_H
#define MY_TIMER_H

#include<chrono>
typedef long int myint;
typedef double mydouble;

class Timer{
private:
  myint timerStart;
  myint timerEnd;
  myint timerInProgress;
public:
  Timer(const myint & =0);
  void start();
  void end();
  mydouble getTime();
};

#endif 


timer.cpp:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#include "timer.h"

Timer::Timer(const myint & _inProgress){
  timerInProgress=_inProgress;
  if(timerInProgress==1){
    timerStart=std::chrono::high_resolution_clock::now().time_since_epoch().count();
  }
}
void Timer::start(){
  timerInProgress=1;
  timerStart=std::chrono::high_resolution_clock::now().time_since_epoch().count();
}
void Timer::end(){
  timerInProgress=2;
  timerEnd=std::chrono::high_resolution_clock::now().time_since_epoch().count();
}
mydouble Timer::getTime(){
  if(timerInProgress!=2){
    return -1.0;
  }
  mydouble fR= mydouble(timerEnd)-mydouble(timerStart);
  fR/= 1000000.0;
  return fR;
}


This makes it more extendable, in case you want to use it in multiple files.

This is not the important or interesting part of the project, just something I think is good to know :)
Last edited on
@Ganado, I understand what your saying now and I made the changes, a question about creating a header file for the class.

I used a template to create the class parallel_set, how do I create a header file for this because the way you do it for time.h doesn't work since it's a template. Should I just put all of the function declarations and definitions in the .h file or is there a better way for creating headers for classes that use templates?
Last edited on
No way to traverse the set?
@kbw, if you mean get rid of the template I'm not sure how I would do that without over complicating the code, it's suppose to mimic a std::set but run in parallel so it needs to accept different data types.
It should be a template, with the function declarations and definitions in the header file.

kbw is asking: how does one iterate through all the items in this set?
If I understand your question correctly I made the functions find_element() and find_parallel() that will return 1 if the element you're looking for is in the container and 0 if the element is not in the container. The functions themselves iterate through the appropriate tree to check if the element is in the container.
How would I, say print out, all the items that are in this set?
@JLBorges the function print_all_data() will print every item in the set.
Ok. What about:

In a set of integers, count the number of elements between 100 and 200?
https://en.cppreference.com/w/cpp/container/set/equal_range

Get the intersection of two sets?
https://en.cppreference.com/w/cpp/algorithm/set_intersection
What if I want to apply an unknown function to every item in the set?
I didn't implement that, but I don't think it would be too difficult to create because those functions already exist for std::sets and my class is just a vector of sets. I'm going to try and keep adding functions over time to try and make it as similar to std::set but faster.
Start by writing an iterator for the set and providing begin() and end()
1. What were you thinking when you wrote size_p()? All you had to do was
1
2
3
4
5
6
7
template <class T>
size_t set_parallel<T>::size_p() const{
  size_t ret = 0;
  for (auto &set : allData)
    ret += set.size();
  return ret;
}

2. There's no way this is faster than std::set. rebalance_trees() rebuilds all the sets every time a bulk insertion happens if size_p() returns a value larger than the thread count. If you're going to do this you may as well just build a sorted vector and sort it after every bulk insertion.

EDIT: 3. erase_parallel() is incorrect. It fails to remove elements contained in the set unless the threads are scheduled just right.
Last edited on
@helios

1. I'm not familiar with that yet, going to have to learn it.

2. Originally rebalance_trees() would only remove and add the elements that needed to be moved to another tree, but for some reason it made slower than a std::set, then I just tried this approach where the function empties everything into a vector then repopulates and it became faster than a std::set again.

Not sure why rebuilding the trees was faster than just moving the elements that needed to be moved.

size_p() returns the total number of elements in all trees not the number of trees, why would it be a problem to return a value larger than the thread count?
Last edited on
Also see my edit above.

size_p() returns the total number of elements in all trees, why would it be a problem to return a value larger than the thread count?
It's not a problem, but rebalance_trees() does nothing unless
 
if(size_p() > numThreads){

which on most machines would mean that you rebalance practically after every insertion.

I honestly don't see any way your code can be faster than std::set. Inserting into an std::set takes O(log n), and inserting into your set takes at least O(n). I'm at work right now, but I'll check out your benchmarks when I get home.
@helios, I was thinking I can make it re balance after a certain number of insertions happen, something like numthreads^(n) for some integer n. I would just have to keep testing it to see when would be the most optimal time to re balance.

erase_parallel(), every thread goes through every element that needs to be erased. Each thread will check to see if the element would be in its tree, if it would be then it tries to delete, if it wouldn't be in its tree then it just moves on to the next element.
Oh, right. I thought it was parallel for. My bad.
Pages: 12