So I've been using std::thread a little recently on my laptop with its dual-core processor.
The following program is supposed to do the boring job of incrementing a number continuously and then printing the total. Different memory is given to each thread to avoid data races.
It takes one argument, the number of threads to try and create.
EDIT: New code in more recent post.
#include <iostream>
#include <thread>
#include <vector>
#include <cstdio>
#include <ctime>

const long long AMOUNT_TO_INCREMENT_BY = 5000000000LL;

// Increments *number one step at a time until it has grown by count.
void NumberIterator(long long *number, long long count)
{
    count += *number;
    while (*number < count) ++(*number);
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        std::cerr << "WRONG NUMBER OF ARGUMENTS! Must have a single integer argument.\n";
        return 1;
    }

    // Initialised so that a failed parse falls through to the error below.
    int numberOfThreads = 0;
    std::sscanf(argv[1], "%d", &numberOfThreads);
    if (numberOfThreads <= 0) {
        std::cerr << "WRONG ARGUMENT! Must have a single positive integer argument.\n";
        return 1;
    }

    long long
        number = 0,
        amountToIncrementByPerThread = AMOUNT_TO_INCREMENT_BY / numberOfThreads;
    int amountLeftOver = static_cast<int> (AMOUNT_TO_INCREMENT_BY % numberOfThreads);

    // Allocate different regions for each thread to work in.
    std::vector<long long> data(numberOfThreads, 0);

    std::clock_t timeVal = std::clock();

    // I construct numberOfThreads threads, each with their allocated parameters.
    std::vector<std::thread> myThreads;
    myThreads.reserve(numberOfThreads);
    for (int i = 0; i < numberOfThreads; ++i) {
        myThreads.push_back(std::thread(NumberIterator, &data[i], amountToIncrementByPerThread));
    }

    // Here I wait for the threads to complete their main functions.
    for (auto &i: myThreads) {
        i.join();
    }

    // And I finish up by adding the remaining numbers.
    for (auto &i: data) {
        number += i;
    }
    NumberIterator(&number, amountLeftOver);

    timeVal = std::clock() - timeVal;

    std::cout << "Total should be " << AMOUNT_TO_INCREMENT_BY << ", is " << number << ".\n";
    std::cout << "Completed in " << timeVal * 1000 / CLOCKS_PER_SEC << "ms.\n";
    return 0;
}
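One thing that might be worth checking in the code above: the elements of data sit next to each other in memory, so several threads can end up writing to the same cache line even though there is no data race (often called false sharing). The following is only a sketch of how each counter could be spaced out. PaddedCounter, IncrementPadded and RunPadded are hypothetical names, and the 64-byte cache line size is an assumption that is typical for x86 but not guaranteed.

```cpp
#include <thread>
#include <vector>

// Hypothetical wrapper. alignas(64) assumes a 64-byte cache line.
// sizeof(PaddedCounter) becomes 64, so neighbouring counters in a vector
// are always 64 bytes apart and never share a line of that size, even if
// the vector's storage itself does not start on a 64-byte boundary
// (which can happen before C++17).
struct alignas(64) PaddedCounter {
    long long value = 0;
};

// Same loop as NumberIterator, but working on a padded counter.
void IncrementPadded(PaddedCounter *counter, long long count)
{
    const long long target = counter->value + count;
    while (counter->value < target) ++(counter->value);
}

// Runs numberOfThreads workers, each on its own counter, and returns the sum.
long long RunPadded(int numberOfThreads, long long countPerThread)
{
    std::vector<PaddedCounter> data(numberOfThreads);
    std::vector<std::thread> workers;
    workers.reserve(numberOfThreads);
    for (int i = 0; i < numberOfThreads; ++i) {
        workers.emplace_back(IncrementPadded, &data[i], countPerThread);
    }
    for (auto &t : workers) {
        t.join();
    }
    long long total = 0;
    for (const auto &c : data) {
        total += c.value;
    }
    return total;
}
```

Calling RunPadded(4, AMOUNT_TO_INCREMENT_BY / 4) would stand in for the thread section of main. Whether this changes the timings will depend on the machine and on the optimisation level used.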
Performance is decent on Windows. Compiled with CL <filename> /EHsc /Za, it completes with one thread in about 16s, and with 4 threads in about 10s (which is roughly what I'm after, as my processor has 2 cores).
Performance is undesirable on Linux. Compiled with g++ <filename>.cpp -std=c++11 -o<filename> -pthread, 1 thread takes about 15.5s (well done, g++) and 4 threads take about 37s. Errr, what? I was expecting something at least close to the original time, not well over twice as slow.
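One thing worth ruling out here is the timer itself. std::clock is specified to report processor time used by the program, and on Linux that is accumulated across all threads, whereas the Microsoft runtime's clock reports elapsed wall-clock time. So the two sets of numbers may not be measuring the same thing. Below is a sketch that records both kinds of time around the same work so they can be compared. Timings, Spin and TimeThreads are hypothetical names introduced for illustration.

```cpp
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

// Hypothetical result type holding both measurements.
struct Timings {
    long long total;   // sum of all per-thread counters
    long long wallMs;  // elapsed real time from std::chrono::steady_clock
    long long cpuMs;   // whatever std::clock reports on this platform
};

// Same loop as NumberIterator in the program above.
void Spin(long long *number, long long count)
{
    count += *number;
    while (*number < count) ++(*number);
}

Timings TimeThreads(int numberOfThreads, long long countPerThread)
{
    std::vector<long long> data(numberOfThreads, 0);

    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();

    std::vector<std::thread> workers;
    workers.reserve(numberOfThreads);
    for (int i = 0; i < numberOfThreads; ++i) {
        workers.emplace_back(Spin, &data[i], countPerThread);
    }
    for (auto &t : workers) {
        t.join();
    }

    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();

    Timings result;
    result.total = 0;
    for (long long v : data) {
        result.total += v;
    }
    result.wallMs = std::chrono::duration_cast<std::chrono::milliseconds>(
                        wallEnd - wallStart).count();
    result.cpuMs = static_cast<long long>(cpuEnd - cpuStart) * 1000 / CLOCKS_PER_SEC;
    return result;
}
```

If cpuMs comes out well above wallMs when several threads are running, that would suggest the 37s figure is summed CPU time and not the time actually spent waiting for the program to finish.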
So, am I completely misinterpreting how to use std::thread, or is this the kind of situation that concurrent processing really isn't optimised for?
Any input on my errors or the curiosities of threads appreciated.