[Speex-dev] Jitter buffer
Steve Kann
stevek at stevek.com
Wed Nov 17 07:54:55 PST 2004
Jean-Marc Valin wrote:
>>Heh. I guess after playing with different jitter buffers long enough,
>>I've realized that there's always situations that you haven't properly
>>accounted for when designing one.
>>
>>
>
>For example? :-)
>
>
I have a bunch of examples listed on the wiki page where I had written
initial specifications:
http://www.voip-info.org/tiki-index.php?page=Asterisk%20new%20jitterbuffer
In particular (I'm not really sure, because I don't thoroughly
understand it yet), I don't think your jitterbuffer handles:
DTX: discontinuous transmission
clock skew (see the discussion there, though)
shrinking the buffer length quickly during silence
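To illustrate the last point, here's a rough sketch (entirely made-up names, not your API or the Speex API) of the kind of adaptation I mean: shrink the target delay very gently while speech is flowing, but collapse it aggressively during DTX silence, when dropping buffered delay is inaudible:

```c
/* Hypothetical sketch only -- jb_sketch and jb_update_target are not part
 * of any real jitter buffer API.  The idea: adapt the buffering target
 * slowly during speech, but shrink it quickly during DTX silence. */
typedef struct {
    int target_ms;   /* current target buffering delay */
    int min_ms;      /* floor we never shrink below    */
} jb_sketch;

static void jb_update_target(jb_sketch *jb, int speech_active)
{
    if (speech_active) {
        /* during speech, shrink very gently: 1 ms per update */
        if (jb->target_ms > jb->min_ms)
            jb->target_ms -= 1;
    } else {
        /* during silence (DTX), halve the excess delay in one step */
        jb->target_ms = jb->min_ms + (jb->target_ms - jb->min_ms) / 2;
    }
}
```

The exact decay rates are arbitrary; the point is only that the two regimes want very different shrink speeds.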
>>I think the only difficult part here that you do is dealing with
>>multiple frames per packet, without that information being available
>>to the jitter buffer. If the jitter buffer can be told when a packet
>>is added that the packet contains Xms of audio, then the jitter buffer
>>won't have a problem handling this.
>>
>>
>
>That's always a solution, but I'm not sure it's the best. In my current
>implementation, the application doesn't even have to care about the fact
>that there may (or may not) be more than one frame per packet.
>
>
That may be OK when the jitterbuffer is only used right before the audio
layer, but I'm still not sure how I can integrate this functionality in
the places I want to put the jitterbuffer.
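What I had in mind was something like the following sketch (again, invented names, not your implementation): if each put() carries the packet's span in milliseconds, the buffer can track timing across multi-frame packets without ever parsing the codec bitstream:

```c
/* Hypothetical API sketch: the caller declares how much audio each packet
 * holds, so the buffer stays codec-agnostic even with several frames per
 * packet. */
typedef struct {
    long next_ts;    /* timestamp at which the next packet should start */
} jb_clock;

/* Returns the gap (ms) between where this packet lands and where the
 * buffer expected it -- the only thing the buffer needs the span for. */
static long jb_put(jb_clock *jb, long packet_ts, long span_ms)
{
    long gap = packet_ts - jb->next_ts;
    jb->next_ts = packet_ts + span_ms;   /* advance by the declared span */
    return gap;
}
```

A 40ms packet then advances the expected clock by 40, a 20ms packet by 20, and a late or missing packet shows up as a nonzero gap.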
>
>
>>This is something I've encountered in trying to make a particular
>>asterisk application properly handle IAX2 frames which contain either
>>20ms or 40ms of speex data. For a CBR case, where the bitrate is
>>known, this is fairly easy to do, especially if the frames _do_ always
>>end on byte boundaries. For a VBR case, it is more difficult, because
>>it doesn't look like there's a way to just parse the speex bitstream
>>and break it up into the constituent 20ms frames.
>>
>>
>
>It would be possible, but unnecessarily messy.
>
>
I looked at nb_celp.c, and it seems that it would be pretty messy. I'd
need to implement a lot of the actual codec just to be able to determine
the number of frames in a packet.
I think the easiest thing for me is to just stick to one frame per
"thing" as far as the jitterbuffer is concerned, and then handle
additional framing for packets at a higher level.
Even if we use the "terminator" submode (i.e.
speex_bits_pack(&encstate->bits, 15, 5); ), it seems hard to find that
in the bitstream, no?
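To make the problem concrete, here's a toy bit-packer (not SpeexBits, just the same MSB-first packing idea): the 5-bit terminator lands at whatever bit offset the preceding frames happened to end at, so without decoding those frames you can't even say which byte it's in.

```c
#include <stdint.h>

/* Toy MSB-first bit-packer, illustrative only.  In real Speex the frames
 * before the terminator have data-dependent lengths (especially in VBR),
 * so the terminator's bit offset is unknowable without decoding. */
typedef struct {
    uint8_t buf[64];
    int bit_pos;              /* next free bit, MSB-first in each byte */
} bits_sketch;

static void pack_bits(bits_sketch *b, unsigned data, int nbits)
{
    for (int i = nbits - 1; i >= 0; i--) {
        int bit = (data >> i) & 1;
        b->buf[b->bit_pos >> 3] |= (uint8_t)(bit << (7 - (b->bit_pos & 7)));
        b->bit_pos++;
    }
}
```

Packing a pretend 43-bit frame followed by the terminator (value 15 in 5 bits) leaves the terminator straddling bits 43-47, nowhere near a byte boundary.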
>>For example, I will be running this in front of a conferencing
>>application. This conferencing application handles participants, each
>>of which can use a different codec. Often, we "optimize" the path
>>through the conferencing application by passing the encoded stream
>>straight-through to listeners when there is only one speaker, and the
>>speaker and participant use the same codec(*). In this case, I want
>>to pass back the actual encoded frame, and also the information about
>>what to do with it, so that I can pass along the frame to some
>>participants, and decode (and possibly transcode) it for others.
>>
>>
>
>It's another topic here, but why do you actually want to de-jitter the
>stream if you're going to resend encoded. Why not just redirect the
>packets as they arrive and let the last jitter buffer handle everything.
>That'll be both simpler and better (slightly lower latency, slightly
>less frame dropping/interpolation).
>
>
Because we need to synchronize multiple speakers in the conference: on
the incoming side, each incoming "stream" has its own timebase,
timestamps, and jitter. If we just passed that through (even if we
adjusted the timebases), the different jitter characteristics of each
speaker would create chaos for listeners, and they'd end up with
overlapping frames, etc.
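Roughly (sketch with invented names), what we do is latch a per-source offset when a speaker's first dejittered frame arrives, and map every later frame onto the one shared conference clock, so listeners see a single monotonic timeline:

```c
/* Hypothetical resynchronization sketch: each source keeps one offset
 * from its own timebase to the conference clock, latched on the first
 * frame, so all speakers land on a common monotonic timeline. */
typedef struct {
    long offset;       /* source timestamp -> conference timestamp */
    int  have_offset;
} source_map;

static long to_conference_ts(source_map *m, long src_ts, long conf_now)
{
    if (!m->have_offset) {          /* latch the offset on first frame */
        m->offset = conf_now - src_ts;
        m->have_offset = 1;
    }
    return src_ts + m->offset;
}
```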
-SteveK