I am trying to figure out how to use pthreads for some of my work. The usage seems fairly clear from the online tutorials I have had a look at. However, I have a more basic question.
As I understand it, the user simply creates the threads and the system attempts to execute them in parallel. There is nothing that stops me from creating many more threads than the number of processors available.
Say I have a 4-processor system and want to execute the same function func() on the 4 different processors. There are two ways I can do this. One is to create 4 threads using pthread_create() and use pthread_join() in main() to wait for them to complete execution. The other is to use pthread_create() to call func() 3 times, call func() a 4th time in main() itself, and then use pthread_join() to wait for the 3 threads.
Which option is better? My concern is that main() is itself a thread which might be put to better use rather than just creating and waiting for other threads to finish execution.
There is no guarantee that a given thread will be scheduled on a given processor. The OS decides what's best, dealing with SMP/NUMA issues, the demands of all the other applications and drivers already running, memory pressure, and so on.
The first of your methods is preferred. Don't think about multithreading as running something on a processor. Instead, try to think of solving problems in a parallel way. If you focus on the application domain, while just keeping an eye on the physical, you'll be better off. Parallel programming is hard, although threads and so on are easy enough to understand on their own, so don't make it harder for yourself.
Keep in mind that the CPU count merely says how much actual work you'll be able to do at the same time. Threads in wait state do no work. The OS allocates no CPU time for them until they stop waiting. Thus, you can create as many threads as you want and have them waiting and it'll cost you nothing in terms of computational power (this is the basic principle behind thread pools).
The only extra cost of having the main thread waiting is that you'll have to create one extra thread, which means allocating one more stack; about 1-2 MiB of memory. It's not outrageous, so really just use whichever is easier.
Thanks for the reply. For the moment, I have a two-processor system. I was trying to see if I could achieve any speedups for some very simple programs from a tutorial. For 2 threads I could get a 1.5 times speedup. This got me wondering whether some of the system resources are being wasted. Perhaps not.
Sometimes, how long the function runs for can be a limiting factor. If just creating the thread takes a significant amount of work compared to the function, you'll see you don't gain as much from parallelizing. Add to this that the threads won't run exactly parallel. For example, thread 1 may start doing useful work only once thread 0 has already done half its workload:
Original workload |--------------------------------------------------------------|
Thread 0          |-------------------------------|
Thread 1                          |-------------------------------|
Just to follow up on this discussion, how many threads do you recommend I create for a given application? Assume that the size of the problem is sufficiently large and the threads do not need to communicate with each other.
I have a large array of data structures and I am using threads to perform identical operations on different portions of the array. I am interested in benchmarking the achievable speedups. I am hoping to understand what speedups are achievable before I go ahead and create a real application.
Now, I have a 4-processor system. With 4 threads I get an almost 3.5 times speedup, which is not bad. But the code with 8 threads is a little faster than with 4 threads on the same system, which is something I did not expect. Note that the threads just do number crunching and do not require any memory of their own or any communication with each other.
Is there a rule of thumb in the number of threads that one should create ?
No more than one for every core in the system.
Note that some CPUs have something called "hyper-threading", which can make a single core run two threads concurrently. The physical core is then said to have two logical cores. It's up to you whether to count logical cores as physical for the purposes of creating threads.
To elaborate: think 8 hands and 1 brain. You may be able to work on 8 things at one time (asynchronously), but it might be just as fast to work on them one at a time instead, because your attention is divided. As for multi(quad)-cores, think 4 people each with 2 hands (or something like that...)
For thread synchronization I will use a combination of pthread_cond_wait() and pthread_cond_signal() protected by mutex variables. Now, as I understand it, there is no guarantee that the thread calling pthread_cond_wait() will reach the wait before the other thread calls pthread_cond_signal(). So, in case pthread_cond_signal() ends up being called before pthread_cond_wait(), that signal will be missed. Is this correct?
If yes, then clearly it is crucial for the thread calling pthread_cond_signal() to get feedback that the thread calling pthread_cond_wait() has received the signal before it can proceed. Is there any standard way to do this?
Yes, I think I see what you mean. So even if the thread calling pthread_cond_signal() finishes before the thread waiting for the signal, the waiting thread, having checked that the condition is satisfied, can simply proceed without waiting. Also, I meant the following format when I say "protected by mutexes":
A:
    pthread_mutex_lock(&mutex);
    while (!condition)
        pthread_cond_wait(&cond, &mutex);
    pthread_mutex_unlock(&mutex);

B:
    pthread_mutex_lock(&mutex);
    changedCondition();
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mutex);
Exactly. Given blocks A and B above, only two things can happen:
1. A goes first; upon hitting pthread_cond_wait() it unlocks the mutex and blocks waiting. B then locks the mutex and signals the condition change. A gets unblocked and re-tests the condition with the while (!condition).
2. B goes first, locks the mutex, and calls changedCondition() and pthread_cond_signal() before A comes up, so the signal is lost - but the mutex is unlocked and the condition has changed. A then runs, but checks the condition first, so it doesn't even need to wait in this case!
The most important idea is that the condition variable actually signals a possible change in the condition state. It's up to the other thread to do the checking.
Here is another question. Is it also required to protect function calls with mutexes, like variables?
Say I have created N threads with the thread function void *func(void *). Now, if func() calls another function, say commonfunc(), should the call to commonfunc() in func() also be protected by mutexes?
Edit: I just did some tests, and mutexes don't seem to be needed for the function calls as such. The call itself is safe; what needs protection is any shared data the function reads or writes, so commonfunc() is fine without a lock only as long as it touches nothing but its arguments and local variables.