Does a set in C++ run in parallel, if not does C++ contain a set that takes advantage of OpenMP?

The standard C++ container called a "set" — I don't think it uses multi-threaded programming, because it's implemented as a single binary tree, correct?

Is there a container in C++ similar to a set, but programmed using OpenMP, so that it would operate faster than the standard set?
You can run queries on std::set (std::set::find()) in parallel, but additions and removals must be serialized. To my knowledge, any set data structure will have the same constraints, or it will have other limitations, such as occasionally returning the wrong answer.

Are you having performance problems? Although not the fastest data structure, std::set is reasonably quick for most scenarios. If you're having performance problems you should make sure you're not doing something silly.
I know next to nothing about OpenMP, but I do not believe so.

The reason is fairly mundane, actually: a set has a single main purpose:

  • verify that an element is a member of the set

This cannot be done in any multithreaded way without synchronizing multiple threads, and the effort to do that typically outweighs what it takes to perform a lookup.

If you are dealing with literally billions of elements, then you are dealing with specialized memory handling already, and separating such a giant set into multiple objects is natural — at which point using multiple threads/processors/servers to check membership is not anything odd.

tl;dr
There is no reason that std::set should need to handle multiple threads.

Addendum
You can do it, actually, by playing with the set’s allocator. Whether this is worth the effort is not something I can answer.
This may be me not knowing what you want to do, but the KISS answer could be to shove your data into TWO (or more) sets and group them together in a class that does the MP stuff by using the built-in tools in parallel on each set and combining the results at the class level.
The correct choice of any data structure requires rather thorough knowledge of the actions required.

For example, a binary tree (the typical implementation of set) is chosen specifically for quick, frequent insertions and deletions. Contrast that with a sorted vector (or array): inserting into and removing from sorted storage is not efficient, but searching it is usually much faster, making it better suited to situations where searching is frequent but insertion and deletion almost never happen.

So, with your limited description, no one is in a good position to advise beyond what can be done for a binary tree structure.

Since the tree structure is balanced, the balancing algorithms quash the hope of threading insertion and deletion (thus, no parallel operation).

If you can be more specific about your application usage pattern, there may be more to offer.
Microsoft PPL has concurrent_unordered_set
https://docs.microsoft.com/en-us/cpp/parallel/concrt/parallel-containers-and-objects?view=vs-2019#unordered_set

I haven't used the library, but a quick check does show some performance improvement.

#include <unordered_set>
#include <ppl.h>
#include <concurrent_unordered_set.h>
#include <chrono>
#include <iostream>

int main()
{
    static const int N = 10'000'000 ;
    static const int M = 4'000'000 ;

    {
        std::unordered_set<int> set;

        const auto start = std::chrono::steady_clock::now();
        for( int i = 0 ; i < N ; ++i ) set.insert( i%M ) ;
        const auto end = std::chrono::steady_clock::now();

        using namespace std::chrono;
        std::cout << "sequential: " << duration_cast<milliseconds>(end-start).count() << " millisecs\n";
    }

    {
        concurrency::concurrent_unordered_set<int> set;

        const auto start = std::chrono::steady_clock::now();
        concurrency::parallel_for(0, N, [&set]( int i ) { set.insert( i%M ); }, concurrency::static_partitioner{} );
        const auto end = std::chrono::steady_clock::now();

        using namespace std::chrono;
        std::cout << "  parallel: " << duration_cast<milliseconds>(end-start).count() << " millisecs\n";
    }
}

sequential: 4502 millisecs
  parallel: 2659 millisecs

https://rextester.com/IOFR8338
Adding to what Niccolo said, you can also double down. I've kept a tree and a vector of the same data: ye olde time-space tradeoff, wasting space to save time.
sadraint wrote:
I don't agree with you


With whom? About what?