[Speex-dev] How does the jitter buffer "catch up"?

Jean-Marc Valin Jean-Marc.Valin at USherbrooke.ca
Thu Sep 22 16:56:19 PDT 2005

> On the first _get where there are no valid frames (because you stopped 
> transmitting from the other end), the jitter buffer will tell the decoder to 
> just decode the last frame again. On the next one, it tells the decoder to 
> extrapolate from the last frame, and on the next one after that to extrapolate 
> even more. This goes on until 25 packets are missed, at which point the jitter 
> buffer resets the decoder and stops extrapolating.

Yeah, I think I should use something other than just "packet is not
available" to trigger the reset.
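The behaviour described above (decode normally, extrapolate for up to 25 missed packets, then reset) can be sketched as a small state machine. This is an illustrative model only, not the actual Speex jitter buffer code; the type and function names are made up for the example:

```c
#include <string.h>

#define MAX_MISSED 25  /* after this many missing packets, reset instead of extrapolating */

typedef struct {
    int missed;        /* consecutive _get calls with no packet available */
    int decoder_reset; /* stand-in for resetting the real decoder state */
} JitterState;

/* Called once per frame period.  `packet` is NULL when nothing arrived in time. */
static void jitter_get(JitterState *j, const char *packet, char *action)
{
    if (packet) {
        j->missed = 0;
        strcpy(action, "decode");          /* normal decode */
    } else if (j->missed < MAX_MISSED) {
        j->missed++;
        strcpy(action, "extrapolate");     /* decoder conceals the loss */
    } else {
        j->decoder_reset = 1;
        strcpy(action, "silence");         /* give up: reset and output silence */
    }
}
```

In the real codec, the "extrapolate" branch corresponds to running the decoder with no bitstream input so it performs loss concealment, and the "silence" branch to resetting the decoder state.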


> > I read the Mumble source code (v0.3.2) to see how you do it. And I found 
> > this comment:
> > 	// Ideally, we'd like to go DTX (discontinuous transmission)
> > 	// if we didn't detect speech. Unfortunately, the jitter
> > 	// buffer on the receiving end doesn't cope with that
> > 	// very well.
> Ah, this is a completely outdated comment, as I found a way to make it work 
> well :)
> What I do is append one bit to each Speex packet which indicates whether this 
> is an "end of transmission". If it is, I manually tell the jitter buffer to 
> reset immediately and stop extrapolating, because I know no more packets are 
> forthcoming.
> If this "end of transmission" packet is lost, no harm is done: all that 
> happens is that the codec extrapolates a bit, meaning you get a few hundred 
> ms of alien sounds :)
> In an ideal world, you'd like to use Speex DTX mode, which puts the decoder 
> in "generate comfort noise" mode and also transfers one packet every 400 ms 
> (I think) to update the noise profile, but if you use the preprocessor's 
> denoiser then comfort noise == silence.
> > I have not implemented the jitter buffer yet, but I wonder if I should.
> > I was thinking about holding the first few sound frames before playing 
> > them. That way, I introduce a delay, which should remove the jitter. 
> > Moreover, since I'm not transmitting when not speaking, the delay does not 
> > accumulate into something long in the end.
> This will work, but will introduce latency in your transmission. This sort of 
> buffering is very common in streaming media, such as Shoutcast audio and 
> video streams, since they are unidirectional and it doesn't matter if there's 
> a 2 second delay between sending and receiving. For bidirectional speech, you 
> want latency kept to an absolute minimum.
> Why?
> Humans start speaking when the other side isn't speaking. Let's take the 
> extreme case and say there's 10 seconds of delay. If you both start talking at 
> the same time, it'll be 10 seconds before you hear the other end is also 
> talking, 10 more seconds to notice that he stopped, and then 10 seconds before 
> he hears you say "go ahead". 10 seconds is extreme, but this effect is quite 
> noticeable even at 500 ms total latency.
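For reference, the simple hold-N-frames approach from the question can be sketched as follows. This is an illustrative sketch, not code from either project; the threshold of 3 frames is an arbitrary example value:

```c
#define PREBUF_FRAMES 3  /* arbitrary example: frames held before playback starts */

typedef struct {
    int queued;   /* frames currently buffered */
    int playing;  /* set once the prebuffer has filled */
} Prebuf;

/* A frame arrived from the network. */
static void prebuf_push(Prebuf *p)
{
    p->queued++;
    if (!p->playing && p->queued >= PREBUF_FRAMES)
        p->playing = 1;   /* threshold reached: start playback */
}

/* The sound card wants a frame.  Returns 1 if one was consumed,
   0 if playback must wait (still filling) or underruns (buffer empty). */
static int prebuf_pop(Prebuf *p)
{
    if (!p->playing || p->queued == 0)
        return 0;
    p->queued--;
    return 1;
}
```

The cost, as the reply explains, is that those buffered frames are pure added latency for the whole conversation, which is why a jitter buffer that adapts its delay is preferable for two-way speech.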
Jean-Marc Valin <Jean-Marc.Valin at USherbrooke.ca>
Université de Sherbrooke
