[Speex-dev] How does the jitter buffer "catch up"?
Jean-Marc Valin
Jean-Marc.Valin at USherbrooke.ca
Thu Sep 22 16:56:19 PDT 2005
> On the first _get where there are no valid frames (because you stopped
> transmitting from the other end), the jitter buffer will tell the decoder to
> just decode the last frame again. On the next one, it tells the decoder to
> extrapolate from the last frame, and on the next one after that to extrapolate
> even more. This goes on until 25 packets are missed, at which point the jitter
> buffer resets the decoder and stops extrapolating.
Yeah, I think I should use something other than just "packet is not
available" to trigger the reset.
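For reference, the loss handling described in the quoted paragraph can be sketched as a small per-slot decision function. This is a hypothetical illustration, not the Speex jitter buffer code: all names are invented, and what happens after the reset (playing silence) is an assumption. It also shows why the reset is tied purely to "packet is not available": the miss counter is the only signal.

```c
/* Sketch of the receiver-side loss policy quoted above.
 * All names are hypothetical; none come from the Speex API. */
typedef enum {
    ACTION_DECODE,      /* packet present: decode it normally */
    ACTION_REPEAT_LAST, /* 1st consecutive miss: decode the last frame again */
    ACTION_EXTRAPOLATE, /* 2nd..24th miss: extrapolate further each time */
    ACTION_RESET,       /* 25th miss: reset the decoder */
    ACTION_SILENCE      /* after the reset: play silence (assumption) */
} LossAction;

#define MAX_CONSECUTIVE_MISSES 25 /* threshold given in the message */

typedef struct {
    int misses; /* consecutive playout slots with no packet */
} LossState;

/* Decide what the decoder should do for one playout slot.
 * 'packet_available' is nonzero if a valid frame is in the buffer. */
LossAction next_action(LossState *s, int packet_available)
{
    if (packet_available) {
        s->misses = 0;             /* any real packet ends the loss run */
        return ACTION_DECODE;
    }
    s->misses++;
    if (s->misses == 1)
        return ACTION_REPEAT_LAST;
    if (s->misses < MAX_CONSECUTIVE_MISSES)
        return ACTION_EXTRAPOLATE; /* s->misses doubles as "how far" */
    if (s->misses == MAX_CONSECUTIVE_MISSES)
        return ACTION_RESET;
    return ACTION_SILENCE;
}
```

Assuming Speex's 20 ms frames, the 25-miss threshold amounts to roughly 25 x 20 ms = 500 ms of repeated or extrapolated audio before the reset, which is consistent with the "few hundred ms of alien sounds" mentioned further down the thread.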
Jean-Marc
> > I read the Mumble source code (v0.3.2) to see how you do it, and I found this
> > comment:
> > // Ideally, we'd like to go DTX (discontinuous transmission)
> > // if we didn't detect speech. Unfortunately, the jitter
> > // buffer on the receiving end doesn't cope with that
> > // very well.
>
> Ah, this is a completely outdated comment, as I found a way to make it work
> well :)
>
> What I do is append one bit to each Speex packet that indicates whether it is an
> "end of transmission" packet. If it is, I manually tell the jitter buffer to reset
> immediately and stop extrapolating, because I know no more packets will be
> forthcoming.
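A minimal sketch of that end-of-transmission flag, assuming a hypothetical wire format in which the flag takes a whole trailing byte instead of a single bit inside the Speex bitstream (the message doesn't say how the bit is packed). The function names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wire format: [speex payload ...][flag byte].
 * Bit 0 of the flag byte set means "end of transmission". */
#define EOT_FLAG 0x01u

/* Copy 'payload' into 'out' and append the flag byte.
 * Returns the framed length, or 0 if 'out' is too small. */
size_t frame_packet(const unsigned char *payload, size_t len,
                    int end_of_transmission,
                    unsigned char *out, size_t out_cap)
{
    if (out_cap < len + 1)
        return 0;
    memcpy(out, payload, len);
    out[len] = end_of_transmission ? EOT_FLAG : 0x00u;
    return len + 1;
}

/* Split a received packet: returns the payload length (0 for an empty
 * packet) and stores the end-of-transmission flag in *eot. */
size_t unframe_packet(const unsigned char *pkt, size_t len, int *eot)
{
    if (len == 0) {
        *eot = 0;
        return 0;
    }
    *eot = (pkt[len - 1] & EOT_FLAG) != 0;
    return len - 1;
}
```

On receipt, a receiver would call unframe_packet() and, when the flag is set, reset its decoder and jitter-buffer state right away rather than waiting for the miss counter to run out. If the flagged packet itself is lost, the receiver just falls back to normal loss concealment, as the next paragraph explains.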
>
> If this "end of transmission" packet should be lost, no harm is done, because
> all that happens is that the codec extrapolates a bit, meaning you get a few
> hundred ms of alien sounds :)
>
> In an ideal world, you'd like to use Speex DTX mode, which puts the decoder in
> "generate comfort noise" mode and also transfers one packet every 400ms (I
> think) to update the noise profile, but if you use the denoiser of the
> preprocessor then comfort noise == silence.
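To make the DTX timing concrete, here is a hypothetical sender-side gate that always sends speech and, during silence, sends one comfort-noise update per interval. It assumes 20 ms frames and takes the 400 ms interval from the message, which the author flags as uncertain. In Speex itself the encoder makes this decision when DTX is enabled, so this only illustrates the timing, not the library's logic.

```c
/* Hypothetical sender-side gate illustrating DTX timing.
 * Assumes 20 ms frames and a comfort-noise refresh every 400 ms
 * (the interval the message gives with an "I think"). */
#define FRAME_MS 20
#define CN_UPDATE_MS 400
#define CN_UPDATE_FRAMES (CN_UPDATE_MS / FRAME_MS) /* = 20 frames */

typedef struct {
    int silent_frames; /* consecutive non-speech frames seen so far */
} DtxGate;

/* Returns nonzero if this frame should be transmitted. */
int dtx_should_send(DtxGate *g, int is_speech)
{
    if (is_speech) {
        g->silent_frames = 0;
        return 1; /* speech frames are always sent */
    }
    /* Assumption: the first silent frame is sent so the receiver can
       switch to comfort noise; after that, one update per interval. */
    int send = (g->silent_frames % CN_UPDATE_FRAMES) == 0;
    g->silent_frames++;
    return send;
}
```

With the preprocessor's denoiser active, those periodic updates would carry essentially silence, which is the poster's argument for the simpler end-of-transmission bit.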
>
> > I have not implemented the jitter buffer yet, but I wonder whether I should.
> > I was thinking about holding the first few sound frames before playing them.
> > That way, I introduce a delay, which should absorb the jitter. Moreover,
> > since I'm not transmitting when not speaking, the delay does not build up
> > over time.
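The proposed scheme is essentially a fixed playout delay. A minimal sketch, with hypothetical names and integer frame ids standing in for audio: playback starts only after PREBUFFER_FRAMES frames have arrived, and (a design choice of this sketch) an underrun drops back into prebuffering, so the delay restarts with each talk spurt instead of growing while the sender is silent.

```c
#include <stddef.h>

/* Hypothetical fixed-delay playout buffer: nothing is played until
 * PREBUFFER_FRAMES frames have arrived, then one frame per tick. */
#define PREBUFFER_FRAMES 3 /* 3 x 20 ms = 60 ms of added latency */
#define CAPACITY 64

typedef struct {
    int frames[CAPACITY]; /* frame ids stand in for decoded audio */
    size_t head, count;
    int playing;          /* becomes 1 once the prebuffer is full */
} Prebuffer;

/* Store an arriving frame; returns 0 if the buffer is full. */
int pb_push(Prebuffer *b, int frame)
{
    if (b->count == CAPACITY)
        return 0;
    b->frames[(b->head + b->count) % CAPACITY] = frame;
    b->count++;
    if (b->count >= PREBUFFER_FRAMES)
        b->playing = 1;
    return 1;
}

/* Called once per playout tick. Returns 1 and sets *frame if a frame
 * is played; returns 0 while prebuffering or on underrun. */
int pb_pop(Prebuffer *b, int *frame)
{
    if (b->count == 0) {
        b->playing = 0; /* underrun: prebuffer again before resuming */
        return 0;
    }
    if (!b->playing)
        return 0;
    *frame = b->frames[b->head];
    b->head = (b->head + 1) % CAPACITY;
    b->count--;
    return 1;
}
```

Assuming 20 ms frames, the added latency is PREBUFFER_FRAMES x 20 ms (60 ms here). Unlike an adaptive jitter buffer, this depth never changes to follow the actual network jitter.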
>
> This will work, but it will introduce latency into your transmission. This sort
> of buffering is very common in streaming media, such as SHOUTcast streams and
> video streams, as they are unidirectional and it doesn't matter if there's a
> 2-second delay between sending and receiving. For bidirectional speech, you
> want latency at an absolute minimum.
>
> Why?
>
> Humans start speaking when the other side isn't speaking. Let's take the
> extreme case and say there's 10 seconds of delay. If you both start talking at
> the same time, it'll be 10 seconds before you hear that the other end is also
> talking, 10 more seconds before you notice that he stopped, and another 10
> seconds before he hears you say "go ahead". 10 seconds is extreme, but this
> effect is quite noticeable even at 500 ms of total latency.
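The timeline above can be worked through for any one-way delay d. This sketch assumes instant reactions and that the other person stops as soon as he hears you; under those assumptions the exchange stalls for 3d, i.e. 30 s in the 10-second example. Reading the 500 ms figure as a one-way delay (the message doesn't specify), the same collision costs about 1.5 s.

```c
/* Worked version of the collision timeline, for a one-way delay d (ms).
 * Assumptions: both parties start talking at t = 0, reactions are
 * instant, and each side stops as soon as it hears the other. */
typedef struct {
    long hear_other_ms;  /* you hear that he is talking: t = d */
    long notice_stop_ms; /* he stopped at t = d; you hear that at 2d */
    long go_ahead_ms;    /* you say "go ahead" at 2d; he hears it at 3d */
} CollisionTimeline;

CollisionTimeline collision_timeline(long one_way_delay_ms)
{
    CollisionTimeline t;
    t.hear_other_ms = one_way_delay_ms;
    t.notice_stop_ms = 2 * one_way_delay_ms;
    t.go_ahead_ms = 3 * one_way_delay_ms;
    return t;
}
```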
>
> _______________________________________________
> Speex-dev mailing list
> Speex-dev at xiph.org
> http://lists.xiph.org/mailman/listinfo/speex-dev
>
--
Jean-Marc Valin <Jean-Marc.Valin at USherbrooke.ca>
Université de Sherbrooke