I have a pipeline of sources, sinks, and filters that looks like
source -> filter -> ... -> filter -> sink
The sources and sinks are usually files, but they might be in-memory streams, null devices, or something else entirely. The filters are generally CPU-intensive, so I want them to run in parallel.
I spent the last couple of days working on an M:N round-robin fiber scheduler, so that M filters can run on N threads, where N is the number of CPU cores. When a filter's input queue is exhausted or its output queue is full, its fiber can yield so the thread can be used to perform some other task.
The problem is that now I'm second-guessing myself. Is there any real advantage to redoing work the kernel already does? Would it make more sense to just run each filter in its own thread, even if this means running more than one thread per core, and let the kernel figure it out?
Currently, the only advantage I can think of is that the coordinator thread can call pool.sync() to wait for the pipeline to stall (which happens, for example, when some copy operation reaches EOF). At that point, every fiber that is still running is in a well-defined suspended state, which allows any thread to modify its data structures more or less arbitrarily. A thread, by contrast, would never really be suspended; it would just be in a
```cpp
// (assuming some mutex `mtx` guards the state that ready() checks)
std::unique_lock<std::mutex> lock(this->mtx);
while (!this->ready())
    this->cv.wait(lock);
```

loop.
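For comparison, in the thread-per-filter alternative each queue would do the blocking itself, inside the kernel, via a condition variable. A minimal sketch of such a bounded queue (the `BoundedQueue` name and its API are mine, not from any library; `close()` models the EOF case):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>

template <typename T>
class BoundedQueue {
    std::deque<T> items;
    std::size_t cap;
    std::mutex mtx;
    std::condition_variable not_full, not_empty;
    bool closed = false;

public:
    explicit BoundedQueue(std::size_t cap) : cap(cap) {}

    // Blocks (in the kernel) while the queue is full.
    void push(T v) {
        std::unique_lock<std::mutex> lock(mtx);
        not_full.wait(lock, [&] { return items.size() < cap; });
        items.push_back(std::move(v));
        not_empty.notify_one();
    }

    // Blocks while the queue is empty; returns std::nullopt once the
    // queue has been closed and drained (EOF).
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mtx);
        not_empty.wait(lock, [&] { return !items.empty() || closed; });
        if (items.empty()) return std::nullopt;
        T v = std::move(items.front());
        items.pop_front();
        not_full.notify_one();
        return v;
    }

    void close() {
        std::lock_guard<std::mutex> lock(mtx);
        closed = true;
        not_empty.notify_all();
    }
};
```

A filter thread would then just loop `while (auto v = in.pop()) out.push(f(*v));` and call `out.close()` when done. The kernel handles all the scheduling, but as described above there is no moment where every thread is guaranteed to be parked in a known state, the way pool.sync() guarantees for fibers.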