High-performance parallel for, as close to the standard as possible

Dear experts,

This is a very basic question, but I would appreciate it if you could answer it.

I am implementing parallel procedures, in particular a so-called "parallel for", in C++. I know there are good libraries such as OpenMP and TBB, but for now I would like to implement it on top of the standard library.
I tried implementing this with std::thread, as shown below:

/////////////////////////////////////////////////////////////////////////
std::vector<std::thread> workers;
for(int id=0; id < thread_num; id++){
    workers.emplace_back([&](auto id){/*some parallel processing*/}, id);
}
for(auto& w : workers)
    w.join();
/////////////////////////////////////////////////////////////////////////
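
For concreteness, a self-contained sketch of the pattern above might look like the following. The function name parallel_for_threads and the one-contiguous-block-per-thread partitioning are assumptions made for illustration; they are not part of the original fragment.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: calls fn(i) for every i in [first, last),
// handing one contiguous block of indices to each thread.
// Assumes first <= last and that fn is safe to call concurrently.
template <typename Fn>
void parallel_for_threads(std::size_t first, std::size_t last, Fn fn,
                          unsigned thread_num = std::thread::hardware_concurrency())
{
    if (thread_num == 0) thread_num = 1;  // hardware_concurrency() may return 0
    const std::size_t n = last - first;
    const std::size_t block = (n + thread_num - 1) / thread_num;  // ceiling division

    std::vector<std::thread> workers;
    workers.reserve(thread_num);
    for (unsigned id = 0; id < thread_num; ++id) {
        const std::size_t lo = std::min(last, first + id * block);
        const std::size_t hi = std::min(last, lo + block);
        if (lo == hi) break;  // no indices left to hand out
        workers.emplace_back([lo, hi, &fn] {
            for (std::size_t i = lo; i < hi; ++i) fn(i);
        });
    }
    for (auto& w : workers) w.join();
}
```

Each call still constructs and joins thread_num threads, so the cost of thread creation is only amortized when every block carries enough work.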

However, this takes too much time and gives poor parallel efficiency, probably because of the overhead of constructing the std::thread objects.
(I understand that the standard library has no thread pool. I have read that std::async performs well, but I am not sure.)

Could you tell me about techniques for high-performance parallelism that stay as close to the standard library as possible? (Boost or other freely licensed libraries are welcome.)

Thank you in advance,
Best regards

Mitsuru


If you use VS 2017 it might be worth looking at http://www.cplusplus.com/forum/lounge/245773/
> I understand there is no thread pool. And some information tells me std::async is high performance.

Yes. std::async may be a better option for this.
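Some std::async implementations (e.g. Microsoft's) run tasks on the operating system's thread pool;
others (e.g. libstdc++ and libc++) typically start a new thread for each std::launch::async task, so the benefit there depends on keeping the tasks coarse-grained.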

For example:

#include <iostream>
#include <cstddef>
#include <future>
#include <thread>
#include <mutex>
#include <algorithm>
#include <vector>
#include <numeric>
#include <atomic>

template < typename RA_ITERATOR, typename FN >
void parallel_for( RA_ITERATOR begin, RA_ITERATOR end, FN fn, std::size_t seg_size = 1024*128 )
{
    const std::size_t n = end - begin ;

    #ifndef NDEBUG
    {
       static std::mutex lock ;
       std::lock_guard<std::mutex> guard(lock) ;
       std::cout << "parallel_for: n == " << n << " thread: " << std::this_thread::get_id() << '\n' ;
    }
    #endif // NDEBUG

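    // Small ranges are processed sequentially on the calling thread.
    if( n < seg_size ) std::for_each( begin, end, fn ) ;

    else
    {
        // Process the first half asynchronously and the second half on this thread.
        auto future = std::async( std::launch::async, parallel_for<RA_ITERATOR,FN>, begin, begin+n/2, fn, seg_size ) ;
        parallel_for( begin+n/2, end, fn, seg_size ) ;
        future.get() ; // waits for the first half and rethrows any exception from it
    }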
}

int main ()
{
    const long long N = 1'000'000 ;

    std::vector<int> vec(N) ;
    std::iota( vec.begin(), vec.end(), 1 ) ;

    std::atomic<long long> sum{0} ;
    parallel_for( vec.begin(), vec.end(), [&sum]( int& v ) { v += 2 ; sum += v ; } ) ;

    std::cout << sum << '\n'
              << N * (N+1)/2 + N*2 << '\n' ; // should be equal to sum
}

http://coliru.stacked-crooked.com/a/b99186f9f656dc74
https://rextester.com/OBQBS74113
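
In the example above every element updates a single shared std::atomic, so the threads contend on it. A possible variation, sketched under the assumption that the work is a plain reduction, lets each std::async task return its own partial sum and combines the results with future.get(). The name parallel_sum and the task_count parameter are illustrative only.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sums [begin, end) using up to task_count asynchronous tasks.
// Each task accumulates privately, so no atomic or mutex is needed.
template <typename RaIt>
long long parallel_sum(RaIt begin, RaIt end, std::ptrdiff_t task_count = 4)
{
    const std::ptrdiff_t n = end - begin;
    if (task_count < 1) task_count = 1;
    const std::ptrdiff_t block = (n + task_count - 1) / task_count;  // ceiling division

    std::vector<std::future<long long>> parts;
    for (std::ptrdiff_t lo = 0; lo < n; lo += block) {
        const std::ptrdiff_t hi = std::min(n, lo + block);
        parts.push_back(std::async(std::launch::async, [begin, lo, hi] {
            return std::accumulate(begin + lo, begin + hi, 0LL);
        }));
    }

    long long total = 0;
    for (auto& p : parts) total += p.get();  // get() also rethrows task exceptions
    return total;
}
```

The only synchronization left is the final get() on each future. Whether this is faster in practice depends on the implementation and the amount of work per task, so it is worth measuring.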
Dear Thomas1965 and JLBorges,

I sincerely appreciate your kind replies.

I am currently targeting multiple platforms (Windows and Ubuntu Linux) with the GNU build tools and CMake.
I cannot use platform-dependent libraries, so I will try std::async and the C++17 facilities first.

By the way, I understand the concepts of concurrency and parallel methodology to some extent (e.g. from The Art of Concurrency).
But I do not know of any comprehensive, state-of-the-art documentation on implementation techniques in C++ (particularly C++17).
If you know of any, I would appreciate it if you could point me to it.

(For high-performance computing, implementation details such as thread pools, memory access and synchronization need to be considered. I would like to understand them.)

Thanks a lot,
Best regards
'C++ Concurrency in Action, 2E' by Anthony Williams
https://www.manning.com/books/c-plus-plus-concurrency-in-action-second-edition?a_aid=anthonywilliams&a_bid=42212b7b
Under MEAP right now


'C++ High Performance: Boost and optimize the performance of your C++17 code' by Viktor Sehr
https://www.amazon.co.uk/dp/1787120953
Chapters on concurrency, writing parallel algorithms and Boost.Compute
Dear JLBorges,

Thank you for your reply.

They seem very practical for me, and I appreciate the recommendations.

Best regards,