[Speex-dev] How does the jitter buffer "catch up"?

Thu Sep 22 15:50:10 PDT 2005

Hello, 

The way you describe how the jitter buffer should be implemented makes me wonder: how does the jitter buffer work when there is no transmission?
Let's say my "output" thread gets a Speex frame from the jitter buffer every 20 ms. What happens when no frame has arrived on the socket? No frames at all for a fairly long time (i.e. many seconds).
This is my case because I chose not to transmit any sound data when speech is not detected (the speech probability from the preprocessor is so sweet! Thanks, Jean-Marc!). Yes, I know, I'm cheap on bandwidth, but that's on purpose... :(

I read the Mumble source code (v0.3.2) to see how you do it, and I found this comment:
	// Ideally, we'd like to go DTX (discontinous transmission)
	// if we didn't detect speech. Unfortunately, the jitter
	// buffer on the receiving end doesn't cope with that
	// very well.

I have not implemented the jitter buffer yet, and I wonder whether I should.
I was thinking about holding back the first few sound frames of each burst of speech before playing them. That way, I introduce a small delay, which should absorb the jitter. Moreover, since I'm not transmitting during silence, the delay starts over with every burst instead of accumulating over time.
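
The hold-back idea above could be sketched roughly as follows. This is only an illustration, not the Speex jitter buffer: SpurtQueue, spurt_put, spurt_get and all constants are hypothetical, and the sketch assumes frames arrive in order (it ignores timestamps and reordering).

```c
#include <assert.h>
#include <string.h>

#define FRAME_BYTES   64  /* assumed upper bound on one encoded frame */
#define QUEUE_FRAMES  32  /* queue capacity, in frames */
#define PREBUFFER      3  /* frames held back at the start of a spurt (60 ms) */

typedef struct {
    unsigned char data[QUEUE_FRAMES][FRAME_BYTES];
    int len[QUEUE_FRAMES];
    int head;     /* index of the next frame to play */
    int count;    /* frames currently queued */
    int playing;  /* 0 = still holding back, 1 = draining */
    int waited;   /* 20 ms ticks spent holding back with data queued */
} SpurtQueue;

void spurt_init(SpurtQueue *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof *q);
}

/* Network side: call for every received frame. Returns 0, or -1 if dropped. */
int spurt_put(SpurtQueue *q, const unsigned char *frame, int len)
{
    if (len < 0 || len > FRAME_BYTES || q->count == QUEUE_FRAMES)
        return -1;
    int tail = (q->head + q->count) % QUEUE_FRAMES;
    memcpy(q->data[tail], frame, (size_t)len);
    q->len[tail] = len;
    q->count++;
    if (q->count >= PREBUFFER)
        q->playing = 1;             /* enough held back: start draining */
    return 0;
}

/* Audio side: call once every 20 ms. Returns the frame length,
 * or -1 when the caller should play silence instead. */
int spurt_get(SpurtQueue *q, unsigned char *out)
{
    if (!q->playing) {
        if (q->count == 0)
            return -1;              /* nobody is talking */
        if (++q->waited < PREBUFFER)
            return -1;              /* still holding back */
        q->playing = 1;             /* short spurt: stop waiting */
    }
    if (q->count == 0) {            /* ran dry: spurt over, re-arm */
        q->playing = 0;
        q->waited = 0;
        return -1;
    }
    int len = q->len[q->head];
    memcpy(out, q->data[q->head], (size_t)len);
    q->head = (q->head + 1) % QUEUE_FRAMES;
    q->count--;
    return len;
}
```

Because the queue re-arms whenever it runs dry, the added delay is at most PREBUFFER frames per burst. The price is that a single late frame in the middle of a burst also triggers a new hold-back period.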

Thanks in advance.

	Gabriel

-----Original Message-----
From: speex-dev-bounces at xiph.org [mailto:speex-dev-bounces at xiph.org] On Behalf Of Thorvald Natvig
Sent: Sunday, September 18, 2005 4:20 PM
To: Baldvin Hansson
Cc: speex-dev at xiph.org
Subject: RE: [Speex-dev] How does the jitter buffer "catch up"?

> Thank you for a very good explanation which shed light on some of the
> questions that I had after reading the source code.
>
> Reading your text however, I wonder if I'm perhaps missing an important
> point on the proper use of the jitter buffer:
>
> ...
>> Now, clearly, if early_ratio is high and late_ratio is very
>> low, the buffer is buffering more than it needs to; it will
>> skip a frame to reduce latency.
> ...
>
> Question:
> Do I understand it that I should not put every incoming packet through the
> jitter buffer?
>
> The way my code works today is:
>
> 1) Packet read from socket
> 2) Call speex_jitter_put(...) with the just-arrived packet
> 3) Read one packet from jitter buffer using speex_jitter_get(...) function
> 4) Feed just read-from-jitter packet to the sound card for playback
>
> This will in fact feed one 20msec batch of sound to play at the sound card
> for every packet received from the speex encoder at the other end.
>
> I know I may sound a bit slow-on-the-pickup here, but at the risk of
> sounding very beginner like (which I'll gladly admit I am) I wonder if this
> is totally wrong to do?
>
> Question:
> Should the jitter buffer implementation not have a packet to return (data is
> simply missing) should I bother to feed the 20msec packet of silence
> (comfort noise perhaps?) to the speaker? Or should the jitter buffer perhaps
> hint me (with a return value?) that no packet was available and there is no
> need to feed anything to the sound card?
>
> In my current implementation, running on a Windows XP box, I have a growing
> number of outstanding packets queued to the soundcard. I believe this is
> happening because when packets are delayed (in my test case I have no packet
> loss, just delays) the jitter buffer interpolates and returns a packet to
> play. When the packets finally arrive, they too are queued to the soundcard.
> Resulting in an increasing non-recoverable delay of the speech coming out of
> the sound card.
>
> Your feedback is greatly appreciated. I thank you for taking the time to
> respond with any relevant details or hints.

If you call speex_jitter_put and get from the same thread on every packet,
you are getting no benefit from the jitter buffer at all.

Correct usage of the jitter buffer is as follows:

Create two threads (or two events inside the same thread):

Thread/Event #1: On every packet received, immediately and without delay
call speex_jitter_put.

Thread/Event #2: Every 20ms (use a damn-high-precision timer or
play-notification from your soundcard), call speex_jitter_get and play
that frame.

Meaning, even if no packet arrived in the last 20 ms, you should
still call speex_jitter_get, and no matter if 100 packets arrived in the
last 20 ms, you should still only call speex_jitter_get once.

(PS, if you do use threads, protect speex_jitter_put/get with a mutex
(CRITICAL_SECTION I believe they're called in Win32Speak) -- calling put
and get at the exact same time from different threads leads to "features")

This allows the jitter buffer to actually buffer, and try to keep the
delay between frame arrival and frame play to a minimum.
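
The two-event pattern described above could look roughly like this. It is a sketch only: ToyJitter, on_packet and on_tick are hypothetical stand-ins for the real jitter buffer and for speex_jitter_put/speex_jitter_get, the toy buffer does no delay adaptation or interpolation, and a pthread mutex is used here in place of a Win32 CRITICAL_SECTION.

```c
#include <assert.h>
#include <pthread.h>

#define SLOTS     16  /* frames the toy buffer can hold */
#define FRAME_MS  20  /* one frame covers 20 ms */

typedef struct {
    pthread_mutex_t lock;
    int ts[SLOTS];       /* timestamp stored in each slot, -1 = empty */
    int payload[SLOTS];  /* stand-in for the encoded frame */
    int play_ts;         /* timestamp the next tick will play */
} ToyJitter;

void toy_init(ToyJitter *j, int first_play_ts)
{
    assert(first_play_ts >= 0);
    pthread_mutex_init(&j->lock, NULL);
    for (int i = 0; i < SLOTS; i++)
        j->ts[i] = -1;
    j->play_ts = first_play_ts;
}

/* Event #1: call immediately for every packet read from the socket. */
void on_packet(ToyJitter *j, int timestamp, int payload)
{
    pthread_mutex_lock(&j->lock);
    /* Drop packets that are already too late or too far ahead. */
    if (timestamp >= j->play_ts &&
        timestamp < j->play_ts + SLOTS * FRAME_MS) {
        int slot = (timestamp / FRAME_MS) % SLOTS;
        j->ts[slot] = timestamp;
        j->payload[slot] = payload;
    }
    pthread_mutex_unlock(&j->lock);
}

/* Event #2: call exactly once per 20 ms tick, whether or not anything
 * arrived. Returns 1 and fills *payload if the frame is present,
 * 0 if it is missing. Playback time advances either way. */
int on_tick(ToyJitter *j, int *payload)
{
    int found = 0;
    pthread_mutex_lock(&j->lock);
    int slot = (j->play_ts / FRAME_MS) % SLOTS;
    if (j->ts[slot] == j->play_ts) {
        *payload = j->payload[slot];
        j->ts[slot] = -1;
        found = 1;
    }
    j->play_ts += FRAME_MS;
    pthread_mutex_unlock(&j->lock);
    return found;
}
```

The point of the sketch is the calling pattern: on_packet runs as often as the network delivers packets, on_tick runs on the audio clock only, and the mutex keeps the two from touching the buffer at the same moment.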

As for detecting outages, a hack I use is to check jitter->valid_bits. If
it's set, we decoded "something", if it's not, we're interpolating
something which may not sound that good so feed the soundcard 20ms of
silence instead.
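
A rough sketch of that check, kept separate from the real library: FakeJitter below is a hypothetical stand-in that only mirrors the valid_bits field mentioned above, select_playback_frame is an invented helper name, and FRAME_SIZE assumes narrowband audio (20 ms at 8 kHz).

```c
#include <assert.h>
#include <string.h>

#define FRAME_SIZE 160  /* samples in 20 ms at 8 kHz (assumption) */

/* Hypothetical stand-in exposing only the field used by the hack. */
typedef struct {
    int valid_bits;
} FakeJitter;

/* Copies the decoded frame to out if real data was decoded, otherwise
 * writes 20 ms of silence. Returns 1 for real audio, 0 for silence. */
int select_playback_frame(const FakeJitter *jitter,
                          const short *decoded, short *out)
{
    assert(jitter != NULL && decoded != NULL && out != NULL);
    if (jitter->valid_bits) {
        memcpy(out, decoded, FRAME_SIZE * sizeof(short));
        return 1;
    }
    memset(out, 0, FRAME_SIZE * sizeof(short));
    return 0;
}
```

The helper would be called right after each 20 ms get, just before handing the frame to the sound card.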

Incidentally, on Win32 you probably want to use DirectSound with a looping
buffer of around 200ms and play notifiers; try to be at least one frame
ahead of the write cursor -- the notification will come when the play
cursor is AT the position, meaning that you'd need to decode the packet in
no time (and I do mean "no time") to avoid artifacts in the sound. With
one frame to go on you should be safe.
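
The cursor arithmetic for staying one frame ahead in a looping buffer might be sketched like this. It makes no DirectSound calls; the byte offset of the write cursor is assumed to be obtained elsewhere, next_write_offset and ring_distance are hypothetical helper names, and the sizes assume 8 kHz, 16-bit mono audio.

```c
#include <assert.h>

#define PCM_FRAME_BYTES 320u  /* 20 ms of 8 kHz 16-bit mono (assumption) */
#define BUFFER_FRAMES    10u  /* 10 frames = 200 ms looping buffer */
#define BUFFER_BYTES    (PCM_FRAME_BYTES * BUFFER_FRAMES)

/* Bytes from 'from' forward to 'to' in the looping buffer. */
unsigned ring_distance(unsigned from, unsigned to)
{
    assert(from < BUFFER_BYTES && to < BUFFER_BYTES);
    return (to + BUFFER_BYTES - from) % BUFFER_BYTES;
}

/* Frame-aligned offset at which to write the next decoded frame so that
 * it starts at least one whole frame ahead of the write cursor. */
unsigned next_write_offset(unsigned write_cursor)
{
    assert(write_cursor < BUFFER_BYTES);
    /* Round the cursor up to a frame boundary, then skip one more frame. */
    unsigned frame = (write_cursor + PCM_FRAME_BYTES - 1) / PCM_FRAME_BYTES;
    return ((frame + 1) % BUFFER_FRAMES) * PCM_FRAME_BYTES;
}
```

With these numbers the gap returned by ring_distance(write_cursor, next_write_offset(write_cursor)) is always between one and two frames, which leaves a full 20 ms to decode before the cursor reaches the data.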

_______________________________________________
Speex-dev mailing list
Speex-dev at xiph.org
http://lists.xiph.org/mailman/listinfo/speex-dev
