[Speex-dev] How does the jitter buffer "catch up"?

Sun Sep 18 15:20:07 PDT 2005

> Thank you for a very good explanation which shed light on some of the
> questions that I had after reading the source code.
>
> Reading your text however, I wonder if I'm perhaps missing an important
> point on the proper use of the jitter buffer:
>
> ...
>> Now, clearly, if early_ratio is high and late_ratio is very
>> low, the buffer is buffering more than it needs to; it will
>> skip a frame to reduce latency.
> ...
>
> Question:
> Do I understand it that I should not put every incoming packet through the
> jitter buffer?
>
> The way my code works today is:
>
> 1) Packet read from socket
> 2) Call speex_jitter_put(...) with the just-arrived packet
> 3) Read one packet from jitter buffer using speex_jitter_get(...) function
> 4) Feed just read-from-jitter packet to the sound card for playback
>
> This will in fact feed one 20msec batch of sound to play at the sound card
> for every packet received from the speex encoder at the other end.
>
> I know I may sound a bit slow-on-the-pickup here, but at the risk of
> sounding very beginner like (which I'll gladly admit I am) I wonder if this
> is totally wrong to do?
>
> Question:
> Should the jitter buffer implementation not have a packet to return (data is
> simply missing) should I bother to feed the 20msec packet of silence
> (comfort noise perhaps?) to the speaker? Or should the jitter buffer perhaps
> hint me (with a return value?) that no packet was available and there is no
> need to feed anything to the sound card?
>
> In my current implementation, running on a Windows XP box, I have a growing
> number of outstanding packets queued to the soundcard. I believe this is
> happening because when packets are delayed (in my test case I have no packet
> loss, just delays) the jitter buffer interpolates and returns a packet to
> play. When the packets finally arrive, they too are queued to the soundcard.
> Resulting in an increasing non-recoverable delay of the speech coming out of
> the sound card.
>
> Your feedback is greatly appreciated. I thank you for taking the time to
> respond with any relevant details or hints.

If you call speex_jitter_put and then speex_jitter_get back to back for
every packet received, you are getting no benefit from the jitter buffer
at all.

Correct usage of the jitter buffer is as follows:

Create two threads (or two events inside the same thread):

Thread/Event #1: On every packet received, immediately and without delay 
call speex_jitter_put.

Thread/Event #2: Every 20ms (use a damn-high-precision timer or 
play-notification from your soundcard), call speex_jitter_get and play 
that frame.

Meaning, even if no packet arrived in the last 20 ms, you should 
still call speex_jitter_get, and no matter if 100 packets arrived in the 
last 20 ms, you should still only call speex_jitter_get once.

(PS, if you do use threads, protect speex_jitter_put/get with a mutex 
(CRITICAL_SECTION I believe they're called in Win32Speak) -- calling put 
and get at the exact same time from different threads leads to "features")
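
To make the put/get split concrete, here is a minimal sketch of the
pattern. It is NOT the Speex jitter buffer: toy_jitter_t, toy_init,
toy_put and toy_get are hypothetical names, the buffer does no
interpolation or latency adaptation, and a pthread mutex stands in for
the Win32 CRITICAL_SECTION. It only shows that put runs once per packet,
get runs once per 20 ms tick, and both take the same lock.

```c
#include <pthread.h>
#include <string.h>

#define SLOTS 8          /* hypothetical capacity, in frames */
#define FRAME_BYTES 64   /* hypothetical maximum payload size */

typedef struct {
    pthread_mutex_t lock;
    int next_seq;                  /* sequence number the next get expects */
    int present[SLOTS];            /* 1 if the slot holds a packet */
    int seq[SLOTS];                /* sequence number stored in the slot */
    int len[SLOTS];                /* payload length stored in the slot */
    char data[SLOTS][FRAME_BYTES];
} toy_jitter_t;

void toy_init(toy_jitter_t *jb) {
    memset(jb, 0, sizeof *jb);
    pthread_mutex_init(&jb->lock, NULL);
}

/* Event #1: call for every packet, as soon as it is read from the socket. */
void toy_put(toy_jitter_t *jb, int seq, const char *payload, int len) {
    if (len < 0 || len > FRAME_BYTES) return;
    pthread_mutex_lock(&jb->lock);
    /* Drop packets that are too late or too far ahead to be stored. */
    if (seq >= jb->next_seq && seq < jb->next_seq + SLOTS) {
        int i = seq % SLOTS;
        memcpy(jb->data[i], payload, (size_t)len);
        jb->len[i] = len;
        jb->seq[i] = seq;
        jb->present[i] = 1;
    }
    pthread_mutex_unlock(&jb->lock);
}

/* Event #2: call exactly once per 20 ms tick, whether or not anything
   arrived. Returns the payload length, or -1 if the frame is missing. */
int toy_get(toy_jitter_t *jb, char *out) {
    int n = -1;
    pthread_mutex_lock(&jb->lock);
    int i = jb->next_seq % SLOTS;
    if (jb->present[i] && jb->seq[i] == jb->next_seq) {
        n = jb->len[i];
        memcpy(out, jb->data[i], (size_t)n);
    }
    jb->present[i] = 0;
    jb->next_seq++;                /* playback time advances regardless */
    pthread_mutex_unlock(&jb->lock);
    return n;
}
```

The point to notice is that toy_get advances next_seq on every tick even
when nothing is there, so a packet that shows up after its slot has been
played is dropped by toy_put rather than queued behind the audio already
sent to the soundcard.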

This allows the jitter buffer to actually buffer, and try to keep the
delay between frame arrival and frame play to a minimum.

As for detecting outages, a hack I use is to check jitter->valid_bits. If
it's set, we decoded "something"; if it's not, we're interpolating, which
may not sound that good, so feed the soundcard 20ms of silence instead.
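
A rough sketch of that substitution, assuming 20 ms frames at 8 kHz (160
samples of 16-bit audio). choose_playback_frame and FRAME_SAMPLES are
made-up names for illustration; the have_real_data argument is whatever
you read out of jitter->valid_bits after the get call.

```c
#include <string.h>

#define FRAME_SAMPLES 160   /* assumption: 20 ms at 8 kHz */

/* decoded:        samples the decoder produced for this tick
   have_real_data: nonzero if the frame came from a received packet,
                   zero if it was interpolated
   out:            buffer that will be handed to the soundcard */
void choose_playback_frame(int have_real_data,
                           const short *decoded, short *out) {
    if (have_real_data)
        memcpy(out, decoded, FRAME_SAMPLES * sizeof(short));
    else
        memset(out, 0, FRAME_SAMPLES * sizeof(short));  /* 20 ms of silence */
}
```

Either way exactly one frame goes to the soundcard per tick, so the
playback queue does not grow.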

Incidentally, on Win32 you probably want to use DirectSound with a looping 
buffer of around 200ms and play notifiers; try to be at least one frame 
ahead of the write cursor -- the notification will come when the play 
cursor is AT the position, meaning that you'd need to decode the packet in 
no time (and I do mean "no time") to avoid artifacts in the sound. With 
one frame to go on you should be safe.
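
For the offset arithmetic in a looping buffer like that, here is a small
sketch. It makes no DirectSound calls; it assumes 8 kHz 16-bit mono audio
and notifications placed on 20 ms frame boundaries, and fill_offset is a
hypothetical helper. Given the byte offset a cursor has reached, it
returns where to write the next decoded frame so that you stay
lead_frames ahead, wrapping at the end of the buffer.

```c
#define SAMPLE_RATE      8000   /* assumption: narrowband */
#define BYTES_PER_SAMPLE 2      /* assumption: 16-bit mono */
#define FRAME_MS         20
#define BUFFER_MS        200

/* 8000 * 20 / 1000 * 2 = 320 bytes per frame */
#define FRAME_BYTES  (SAMPLE_RATE * FRAME_MS / 1000 * BYTES_PER_SAMPLE)
/* 8000 * 200 / 1000 * 2 = 3200 bytes, i.e. 10 frame slots */
#define BUFFER_BYTES (SAMPLE_RATE * BUFFER_MS / 1000 * BYTES_PER_SAMPLE)

/* cursor_offset: byte offset reported for the cursor you are tracking
   lead_frames:   how many whole frames to stay ahead of it (1 or more)
   Returns the byte offset of the frame slot to fill next. */
unsigned fill_offset(unsigned cursor_offset, unsigned lead_frames) {
    unsigned slot = cursor_offset / FRAME_BYTES;   /* slot holding the cursor */
    return ((slot + lead_frames) * FRAME_BYTES) % BUFFER_BYTES;
}
```

With lead_frames set to 1, a notification at offset 0 means the frame
goes in at offset 320, and a notification in the last slot (offset 2880)
wraps the write back to offset 0.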