I created a class that acts just like a

@Ganado, I updated the README.md, and I'll update the folder and rename it benchmarks once I get on my laptop. I included the timer.cpp because that's what I used to test the speed. It's a couple of functions my professor provided me with and it was the first way I learned about time in C++. I'll have to learn how to use the standard #include <time.h>.

Ganado (6783)

My bad, I realize I was being ambiguous in that last sentence. I didn't mean anything related to time.h.

I meant that you shouldn't #include .cpp files, you should include header (.h/.hpp) files, when working with multiple files, and also be sure to use header guards with it.

for example,
timer.h

#ifndef MY_TIMER_H
#define MY_TIMER_H

#include<chrono>
typedef long int myint;
typedef double mydouble;

class Timer{
private:
  myint timerStart;
  myint timerEnd;
  myint timerInProgress;
public:
  Timer(const myint & =0);
  void start();
  void end();
  mydouble getTime();
};

#endif

timer.cpp:

#include "timer.h"

Timer::Timer(const myint & _inProgress){
  timerInProgress=_inProgress;
  if(timerInProgress==1){
    timerStart=std::chrono::high_resolution_clock::now().time_since_epoch().count();
  }
}
void Timer::start(){
  timerInProgress=1;
  timerStart=std::chrono::high_resolution_clock::now().time_since_epoch().count();
}
void Timer::end(){
  timerInProgress=2;
  timerEnd=std::chrono::high_resolution_clock::now().time_since_epoch().count();
}
mydouble Timer::getTime(){
  if(timerInProgress!=2){
    return -1.0;
  }
  mydouble fR= mydouble(timerEnd)-mydouble(timerStart);
  fR/= 1000000.0;
  return fR;
}

This makes it more extendable, in case you want to use it in multiple files.

This is not the important or interesting part of the project, just something I think is good to know :)

Last edited on

@Ganado, I understand what your saying now and I made the changes, a question about creating a header file for the class.

I used a template to create the class parallel_set, how do I create a header file for this because the way you do it for time.h doesn't work since it's a template. Should I just put all of the function declarations and definitions in the .h file or is there a better way for creating headers for classes that use templates?

Last edited on

kbw (9488)

No way to traverse the set?

@kbw, if you mean get rid of the template I'm not sure how I would do that without over complicating the code, it's suppose to mimic a std::set but run in parallel so it needs to accept different data types.

It should be a template, with the function declarations and definitions in the header file.

kbw is asking: how does one iterate through all the items in this set?

If I understand your question correctly I made the functions find_element() and find_parallel() that will return 1 if the element you're looking for is in the container and 0 if the element is not in the container. The functions themselves iterate through the appropriate tree to check if the element is in the container.

How would I, say print out, all the items that are in this set?

@JLBorges the function print_all_data() will print every item in the set.

Ok. What about:

In a set of integers, count the number of elements between 100 and 200?
https://en.cppreference.com/w/cpp/container/set/equal_range

Get the intersection of two sets?
https://en.cppreference.com/w/cpp/algorithm/set_intersection

What if I want to apply an unknown function to every item in the set?

I didn't implement that, but I don't think it would be too difficult to create because those functions already exist for std::sets and my class is just a vector of sets. I'm going to try and keep adding functions over time to try and make it as similar to std::set but faster.

Start by writing an iterator for the set and providing begin() and end()

1. What were you thinking when you wrote size_p()? All you had to do was

template <class T>
size_t set_parallel<T>::size_p() const{
  size_t ret = 0;
  for (auto &set : allData)
    ret += set.size();
  return ret;
}

2. There's no way this is faster than std::set. rebalance_trees() rebuilds all the sets every time a bulk insertion happens if size_p() returns a value larger than the thread count. If you're going to do this you may as well just build a sorted vector and sort it after every bulk insertion.

EDIT: 3. erase_parallel() is incorrect. It fails to remove elements contained in the set unless the threads are scheduled just right.

Last edited on

@helios

1. I'm not familiar with that yet, going to have to learn it.

2. Originally rebalance_trees() would only remove and add the elements that needed to be moved to another tree, but for some reason it made slower than a std::set, then I just tried this approach where the function empties everything into a vector then repopulates and it became faster than a std::set again.

Not sure why rebuilding the trees was faster than just moving the elements that needed to be moved.

size_p() returns the total number of elements in all trees not the number of trees, why would it be a problem to return a value larger than the thread count?

Last edited on

Also see my edit above.

size_p() returns the total number of elements in all trees, why would it be a problem to return a value larger than the thread count?

It's not a problem, but rebalance_trees() does nothing unless

if(size_p() > numThreads){

which on most machines would mean that you rebalance practically after every insertion.

I honestly don't see any way your code can be faster than std::set. Inserting into an std::set takes O(log n), and inserting into your set takes at least O(n). I'm at work right now, but I'll check out your benchmarks when I get home.

@helios, I was thinking I can make it re balance after a certain number of insertions happen, something like numthreads^(n) for some integer n. I would just have to keep testing it to see when would be the most optimal time to re balance.

erase_parallel(), every thread goes through every element that needs to be erased. Each thread will check to see if the element would be in its tree, if it would be then it tries to delete, if it wouldn't be in its tree then it just moves on to the next element.