waveOut API latency

Hello,

This is a waveOut API-specific question. Not sure if I should stick this in Windows Programming...
I'm using the Windows waveOut API to procedurally generate and stream PCM audio samples using a double-buffering (block) scheme.
My issue is that I can only fill a buffer (block) with new data once it finishes playing. This is a problem, because my buffer-filling callback is apparently slow enough to cause noticeable gaps between the blocks - which defeats the purpose of the double/poly-buffering scheme.

To help visualize my problem:

1.) Initially, fill buffer0 and buffer1 with data

[buffer0][buffer1]


2.) Write both buffers to the device - they get added to a queue - buffer0 starts playing

3.)buffer0 finishes playing

[buffer0][buffer1]
----------^


so we gotta fill buffer0 with new data

4.) Call the slow, expensive callback function to fill buffer0 - causing a gap

I wonder if anyone out there has experience tackling this particular problem, or any advice or solutions that can put me on the right track. My first instinct was to look into threads - any ideas?
IIRC with waveOut you can supply any number of buffers, not just 2. A larger number of smaller buffers will make the fill callback fire in smaller chunks, which might help with underrun.

So if you want to shoot for 100 ms latency... then instead of 2 buffers of 50ms... try 10 buffers of 10ms.

That said... I've never been able to get "good" latency with waveOut. I worked with it way back when -- when I was working on an NSF player (this was years and years ago). I think the lowest I ever got it without drops was 100-150 ms. I don't know if waveOut is really capable of high performance audio.

Another thing you can try to speed up your callback is to pre-prepare the audio and simply copy it over. You could do this by creating another [probably circular] buffer which you load the audio into. So your code would feed the circular buffer... and the circular buffer would feed your waveOut buffers.
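A minimal sketch of that intermediate circular buffer (the class and its names are made up for illustration, not taken from anyone's actual code): the generator writes samples in whenever it has spare time, and the waveOut-filling code reads a block's worth out with a plain copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative single-threaded ring buffer of 16-bit samples.
class RingBuffer {
    std::vector<int16_t> buf;
    size_t head = 0, tail = 0, count = 0;
public:
    explicit RingBuffer(size_t capacity) : buf(capacity) {}
    size_t available() const { return count; }              // samples ready to read
    size_t freeSpace() const { return buf.size() - count; } // room left to write
    // Generator side: append up to n samples, return how many actually fit.
    size_t write(const int16_t* src, size_t n) {
        size_t written = 0;
        while (written < n && count < buf.size()) {
            buf[tail] = src[written++];
            tail = (tail + 1) % buf.size();
            ++count;
        }
        return written;
    }
    // waveOut side: copy up to n samples into a block, return how many copied.
    size_t read(int16_t* dst, size_t n) {
        size_t got = 0;
        while (got < n && count > 0) {
            dst[got++] = buf[head];
            head = (head + 1) % buf.size();
            --count;
        }
        return got;
    }
};
```

Note this version is not thread-safe; if the generator and the fill callback run on different threads you'd need locking or a lock-free variant.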

Though the more I think about that the less helpful I think it would be. All it would do is create more work and add additional latency. I've done it in the past but for a different reason I won't get into now.



So yeah breaking up into smaller buffers is really my only suggestion. If that doesn't work you might just have to go with a higher performance audio lib. DirectSound is an option if you don't mind clunky DirectX calls (although I wrote a simple wrapper to do this a while back... I don't mind sharing if you're interested -- just lemme know). Other options include BASS, FMod, or I think even SFML.
Hey Disch,

Thanks for the feedback. I really appreciate it.
First of all, I'm sorry - I used some misleading words like "latency" in my original post.

As far as I know, "latency" - in the case of streaming audio (like in a Digital Audio Workstation) is the actual interval of time it takes for a signal to get processed and to reach your ears - basically. To be more clear, what I meant to say was that the transition between blocks isn't seamless. How "late" the signal is doesn't really matter to me at this point.

Your "pre-prepare audio" suggestion also wouldn't work, because the user will be controlling some parameters of the audio as it's being generated, so I need things to happen "on the go".
Of course, latency can be (and usually is) an issue with interactive applications like this one.

I designed this project with more than two buffers in mind anyways, so it's very easy for me to add buffers.
I have actually tried FMOD and BASS. I had some issues with FMOD. I liked BASS, but I felt like I wanted more control over what was going on, and that I could slap together an even more lightweight way of streaming audio.

Additionally, I would be interested in seeing your DirectSound wrapper anyway, if it's not too much trouble.

*EDIT* Just remembered that you shared a DirectSound wrapper on this thread, is it the same one? http://www.cplusplus.com/forum/general/109119/2/
To be more clear, what I meant to say was that the transition between blocks isn't seamless. How "late" the signal is doesn't really matter to me at this point.


Oh, okay.

waveOut blocks should be seamless. If they weren't, they wouldn't be very useful.

If you're experiencing breaks in the audio... you might not be filling the buffers enough... or you might be reporting an incorrect buffer size (ie: telling waveOutWrite that you have 2000 bytes when you really only have 1950 bytes in the buffer).

Other than that... the only other thing I can think of that might be causing this is if your audio processing is too slow to happen in realtime (ie: it takes you longer than 200 ms to produce 200 ms of audio)... in which case there is nothing you can do short of buffering large chunks of the audio before you start streaming.



If you can reproduce the problem you're seeing in a small program I wouldn't mind taking a look at it.

*EDIT* Just remembered that you shared a DirectSound wrapper on this thread, is it the same one?


Yes, that is the same one: soundout.h and .cpp. I wrote those like 8-10 years ago or something. I'm amazed at how much mileage I've got out of them.

The thing to note is that it doesn't operate on a callback mechanism. You have to constantly poll the output (with CanWrite) to see how much audio can be written to the buffer (the value CanWrite returns increases over time).




so I need things to happen "on the go".
Of course, latency can be (and usually is) an issue with interactive applications like this one.


So... you do need low latency? ;P.

As you mentioned... latency is the delay between when the audio is generated and when it's actually heard. Example: If the user can flip switches to change the generated audio... and if you are outputting at 2 seconds latency... they'll press a button and won't hear the effect until 2 seconds later.


99% of latency comes from the buffer size. Bigger buffer = higher latency but less chance of underrun.
So... you do need low latency? ;P.

I only said that for completeness I guess. I'm aware that interactive applications like this typically require fast "response times", but I'm not concerned with it at the moment. Sorry for the confusion.

As I mentioned in my original post, I suspect that my callback (the function I use to fill the buffers with data) is what's causing the gap between buffers(blocks), because it's so slow. The callback gets called when a buffer has finished playing, so apparently the slow callback needs to finish before the waveOut device can play the next buffer, causing the gap. I don't know how exactly the waveOut device is bound to my application's thread once I've sent it data.

Furthermore, I guess my callback is slow because the buffer size is relatively large compared to your standard buffer size.

I will edit this post soon and provide a dropbox link. I'll try to strip it down to bare bones. I'd really appreciate your input once it's up.


*EDIT* here it is:
https://www.dropbox.com/s/czdcjedquhulpmg/waveOut%20stream.zip

I tried to document it without being verbose. Please read the short readme.txt first.
If there's anything else I forgot to clarify, please let me know.
Try upping the buffer count first thing. You might need 3 buffers for it to work... maybe waveout needs to have one queued while one is played, while you're filling the 3rd.
Try upping the buffer count first thing. You might need 3 buffers for it to work... maybe waveout needs to have one queued while one is played, while you're filling the 3rd.


The issue is that there's no place for me to invoke the buffer filling method other than right when a buffer finishes playing.
That's when you should fill it. But if you have more buffers, then there'll be more time for you to get them filled.

Like... example...

Assuming you have 100ms latency.

with 2x50ms buffers... one buffer will be playing, which means you have 50 ms to fill the other buffer.

Whereas with 10x10ms buffers, 1 buffer is playing, 8 are queued, and so you have 90 ms to fill the last buffer.
Ugh, I'm stupid. It wasn't my callback that was being slow. It was std::cout and std::endl.
I can't believe it, haha! It had nothing to do with my implementation.
I don't know what to say. I guess that's a lesson in not over-complicating things?
Anyways, it works fine now. Thanks for sticking with me through my troubled times!
Also, it makes sense what you're saying about the buffers. I understand, and I will keep what you said in mind when I go from two buffers to three or more.
It was std::cout and std::endl.


Hah! Yeah that'll do it. Glad it's working now. =)