I am running serial jobs on compute nodes with 24 cores (48 threads). When I run only one job on a node, it gives maximum performance, but as I increase the number of serial jobs on the same node, the speed drops almost linearly. The node has 64 GB of RAM, and each job uses at most about 1 GB. I was wondering whether false sharing has something to do with this. I know that in OpenMP parallel codes false sharing can cause this type of problem, but can false sharing occur between serial jobs? If so, what can we do about it, and what kind of modifications should we make in the code?
False sharing should only be triggered if the data sets operated on by each thread are interleaved. For example, imagine a graphical bitmap being operated on by two threads. False sharing might be triggered if one thread operates on the even pixels and the other on the odd pixels, but it could be avoided if one operates on the top half of the bitmap and the other on the bottom half.
Is it possible your jobs are IO-bound, rather than CPU-bound? I can't really think of anything else that could cause the performance to degrade as parallelism increases.
I do not do any input/output operations. In fact, I removed the I/O part precisely in order to check the CPU performance.
It is also interesting that this problem occurs only on our new cluster (Intel Xeon E5-2650 v4) and not on our old cluster (Intel Xeon E5620).
A near-linear slowdown from running more copies tells me the scheduler may be trying to run them all on the same CPU, no matter how many you have. Have you verified that the jobs are actually being assigned to distinct cores?