Thread Chaining & Latency

Hi All,

I was working on a problem that was best solved by chaining threads together using a blocking thread-safe queue, e.g. threads A and B pass data to each other through a queue in a simple producer-consumer design, A -> B. This has two benefits: 1) there is a buffer between the two threads to absorb a slow consumer, and 2) throughput is better when data loads are high because the two threads can run in parallel. Most of the time the design works well under load, but I found that when pushing data through the threads intermittently, i.e. once about every 0.5 seconds, it performed poorly, mostly from what seemed to be latency introduced by the OS waking up the consumer thread B.

What I have come up with to solve this is a wrapper around a basic thread-safe queue that adds one extra function, excite(). The idea is that if you have a simple case (or perhaps a slightly more complicated one) of two threads in a producer-consumer design, and you know the producer will create or receive the data, process it, and then push it onto the queue, why not give the consumer thread a heads-up to let it know the data is about to arrive? In that case you call excite(). This way the consumer thread can sit in a polling state anticipating the arrival of data, and you avoid the time it takes the scheduler to bring it back to a running state.



Anyway, I have some code below and I was looking to get some feedback. I have tested it somewhat and it improves the responsiveness quite substantially.
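The original listing isn't reproduced here, but a minimal sketch of the idea might look like the one below. It assumes a mutex/condition-variable queue with an extra atomic flag that excite() sets, so the consumer polls briefly instead of sleeping; the class name ExcitableQueue and everything other than excite() are illustrative, not the actual code from the post.

#include <atomic>
#include <condition_variable>
#include <deque>
#include <mutex>

// Sketch only: a blocking queue whose consumer can be "excited" into a
// short spin just before data arrives, instead of sleeping on the
// condition variable and paying the scheduler wake-up cost.
template <typename T>
class ExcitableQueue {               // name is illustrative
public:
    // Producer calls this just before it starts preparing the next item.
    void excite() { excited_.store(true, std::memory_order_release); }

    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(value));
        }
        cv_.notify_one();
    }

    T pop() {
        // While "excited", poll in the hope that the push is imminent.
        while (excited_.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!queue_.empty()) {
                excited_.store(false, std::memory_order_release);
                T value = std::move(queue_.front());
                queue_.pop_front();
                return value;
            }
            // A 'pause' hint could go here (see the discussion below).
        }
        // Otherwise fall back to the normal blocking wait.
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop_front();
        return value;
    }

private:
    std::deque<T> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::atomic<bool> excited_{false};
};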

I also have some timings, which I'm happy to share, and I would love to hear any thoughts or feedback you have.

once about every 0.5 seconds, it performed poorly, mostly from what seemed to be latency introduced by the OS waking up the consumer thread B.
That sounds suspiciously like something I ran into under Linux.

What OS are you using? If you're using Linux, try to retest under FreeBSD before putting too much effort in.
In other words, you're switching from a condvar wait to a spin lock when you know the data is coming. That makes sense, and you're saying it works for you, so it's good.

If you're concerned with thread scheduling times that much, I would recommend switching to a real-time OS, or at least turning on real-time scheduling (if you're using Linux, the rt patch from kernel.org does wonders for latencies). There are of course more deterministic ways to write a spin lock, but scheduler first.
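For what it's worth, on Linux you can request real-time scheduling for a single thread without patching the kernel, using pthread_setschedparam (it needs root or CAP_SYS_NICE). A minimal sketch, not tied to the poster's code:

#include <pthread.h>
#include <sched.h>
#include <cstdio>

// Minimal sketch: give the calling thread SCHED_FIFO priority so the
// scheduler wakes it with lower latency. Requires root or CAP_SYS_NICE.
void make_realtime(int priority = 10)
{
    sched_param sp{};
    sp.sched_priority = priority;   // 1..99 for SCHED_FIFO on Linux
    int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (rc != 0)
        std::fprintf(stderr, "pthread_setschedparam failed: %d\n", rc);
}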

What OS are you using? If you're using Linux, try to retest under FreeBSD before putting too much effort in.


kbw - That's interesting. I'll give this a try. I'm using Ubuntu Linux.
Though I don't believe changing OS is an option for a lot of people in a production environment.

There are of course more deterministic ways to write a spin lock, but scheduler first.

Cubbi - You also raise a good point. I don't make any attempt to deliver first-in-first-out service to threads that are waiting for data in pop().
I'll have a look at the scheduling too.

As a brief example of the times I see: with a 100000 ns sleep between data items, the average latency across the queue is 103 ns when calling excite(), versus 22 ns... sorry, 103 ns without calling excite() and 22 ns when calling it.
Cleaned up some of my test code to give better output on the timings.
Wait time - the time between sending data items through the queue.
Work iterations - the number of loops adding some number to a counter, to simulate work.
Iterations - the number of cycles the test runs to get an average. On tests with a low wait value I ran more iterations to get a better average.
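For context, a measurement like the one above can be taken by timestamping each item on the producer side and taking the difference on the consumer side. The sketch below only illustrates that method; it is not the poster's test harness, and TimedItem and latency_ns are made-up names.

#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Producer side: stamp the item just before pushing it onto the queue.
struct TimedItem {
    Clock::time_point sent;
    std::uint64_t payload;
};

// Consumer side: latency across the queue is simply now - sent.
// Averaging this over the "Iterations" count gives figures like those above.
std::int64_t latency_ns(const TimedItem& item)
{
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               Clock::now() - item.sent).count();
}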

I also wonder whether using 'pause' has a positive effect on cache retention on the consumer side?
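On the 'pause' question: the documented effects of the x86 pause instruction are to reduce power in spin loops, hand execution resources to a hyper-threaded sibling, and avoid the memory-order mis-speculation penalty when the loop exits; it doesn't by itself keep data in cache. A typical way to use it in a polling loop looks like this (spin_until is a made-up helper, not from the poster's code):

#include <atomic>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>   // _mm_pause()
#endif

// Hypothetical polling helper: spin until the flag becomes true,
// issuing a pause hint each iteration so the core backs off briefly.
inline void spin_until(const std::atomic<bool>& flag)
{
    while (!flag.load(std::memory_order_acquire)) {
#if defined(__x86_64__) || defined(_M_X64)
        _mm_pause();
#endif
    }
}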

You are locking the whole container just for appending a single item to the queue? Looks like a waste. IMHO a thread-safe lockless queue would perform much better than that.

rapidcoder - Yes, I agree with you; however, I'm aiming to solve the problem of scheduler latency. A Lamport queue, or any of the concurrent queue implementations, would probably work just as well here.
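For reference, a Lamport-style single-producer/single-consumer ring buffer avoids the lock entirely. A minimal sketch (fixed, power-of-two capacity, one slot left unused to distinguish full from empty); this is a generic illustration, not the poster's queue:

#include <atomic>
#include <cstddef>

// Lamport-style SPSC ring buffer: the producer only writes head_,
// the consumer only writes tail_, so no mutex is needed.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert((Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) & (Capacity - 1);
        if (next == tail_.load(std::memory_order_acquire))
            return false;                       // full
        buffer_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }

    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                       // empty
        out = buffer_[tail];
        tail_.store((tail + 1) & (Capacity - 1), std::memory_order_release);
        return true;
    }

private:
    T buffer_[Capacity];
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};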
Here are some results from a FreeBSD 9.0 run. Different hardware specs, but it looks very much the same as the Ubuntu comparison.
