[Speex-dev] Jitter buffer

Jean-Marc Valin jean-marc.valin at usherbrooke.ca
Tue Nov 16 23:17:25 PST 2004


> Heh.  I guess after playing with different jitter buffers long enough,
> I've realized that there are always situations that you haven't properly
> accounted for when designing one.

For example? :-)

> I think the only difficult part here that you do is dealing with
> multiple frames per packet, without that information being available
> to the jitter buffer.  If the jitter buffer can be told when a packet
> is added that the packet contains Xms of audio, then the jitter buffer
> won't have a problem handling this.

That's always a solution, but I'm not sure it's the best. In my current
implementation, the application doesn't even have to care about the fact
that there may (or may not) be more than one frame per packet.
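
Roughly, the kind of interface I mean (the names below are only for
illustration, not the actual code):

    /* Illustrative sketch only -- not the real Speex jitter buffer API.
     * The application hands in whole packets as they arrive and asks for
     * one frame of audio at a time; how many frames a packet contained
     * is entirely the buffer's business. */

    typedef struct JitterSketch JitterSketch;  /* opaque buffer state */

    /* Store a whole packet. "timestamp" is whatever the transport carried
     * (e.g. the RTP timestamp); len is the payload size in bytes. */
    void jitter_sketch_put(JitterSketch *jb, const char *packet, int len,
                           int timestamp);

    /* Return exactly one frame (say 20 ms) of decoded audio. If the packet
     * at the head held several frames, the remaining ones stay queued
     * internally and come out on later calls. */
    void jitter_sketch_get(JitterSketch *jb, short *out_frame);

That way the caller's loop is identical whether the sender packs one, two
or three frames per packet.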

> This is something I've encountered in trying to make a particular
> Asterisk application properly handle IAX2 frames which contain either
> 20ms or 40ms of Speex data.  For the CBR case, where the bitrate is
> known, this is fairly easy to do, especially if the frames _do_ always
> end on byte boundaries.  For the VBR case, it is more difficult, because
> it doesn't look like there's a way to just parse the Speex bitstream
> and break it up into the constituent 20ms frames.

It would be possible, but unnecessarily messy. 
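
The CBR splitting you describe would be nothing more than this kind of
thing (a sketch; frame_bytes is assumed to be known from the negotiated
bitrate):

    #include <stddef.h>

    /* Cut a CBR payload into its fixed-size 20 ms frames.  Writes the byte
     * offset of each frame into "offsets" (capacity "max") and returns how
     * many were found.  Assumes len is an exact multiple of frame_bytes,
     * which only holds when frames end on byte boundaries. */
    static int split_cbr_payload(size_t len, size_t frame_bytes,
                                 size_t *offsets, int max)
    {
        int n = 0;
        size_t pos;
        for (pos = 0; pos + frame_bytes <= len && n < max; pos += frame_bytes)
            offsets[n++] = pos;
        return n;
    }

For VBR there is no equivalent shortcut; the frame lengths are only known
while actually decoding the bits.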

> The problem isn't so much that the jb can't return the right thing,
> but that internally it can't know if it just passed back a packet that
> contained 40ms of data or 20ms of data, so later it can't know if it's
> lost a frame or not.

exactly.
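
If the buffer is to detect the loss itself, each stored entry has to carry
an explicit span, along the lines of what you suggested earlier (field
names invented for the sketch):

    /* Sketch of a buffer entry that remembers how much audio it covers. */
    struct entry_sketch {
        char *data;       /* encoded payload, owned by the buffer          */
        int   len;        /* payload size in bytes                         */
        int   timestamp;  /* timestamp carried by the packet               */
        int   span;       /* audio covered by the packet, e.g. 20 or 40 ms */
    };

After handing an entry back, the buffer advances its playback point by
entry.span; if the next stored timestamp is larger than that point, the
difference is known to be lost (or late) audio, whatever the packet sizes
were.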

> The other things can be handled based on the return value of the _get
> method:  dropping frames, interpolating, etc.

I guess...
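
Something like this is what you mean, I suppose (the codes are invented
for the example, not an existing API):

    /* Possible outcomes of a _get call, as seen by the caller. */
    enum get_result_sketch {
        GET_OK,        /* a packet came back: decode and play it        */
        GET_MISSING,   /* nothing usable: run the codec's concealment   */
        GET_DROP       /* buffer is running late: skip this output tick */
    };

The caller polls once per output frame and reacts accordingly; the buffer
itself never touches the codec.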
  
> I can see how you'd do that, but I don't think that would work for me.
> I really don't want the jitterbuffer to handle decoding at all,
> because in some cases, I want to dejitter the stream, but not decode
> it.

In that case, your callback can just send the encoded stream somewhere
else; it doesn't have to actually decode anything.
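
In code, the callback idea is just this (the signature is invented for the
sketch):

    /* Placeholder for whatever the application uses to transmit a packet. */
    extern void send_to_peer(void *dest, const char *packet, int len);

    /* Callback invoked by the jitter buffer for each dejittered packet;
     * "lost" is non-zero when the buffer had nothing for this slot. */
    typedef void (*packet_cb_sketch)(void *user_data, const char *packet,
                                     int len, int lost);

    /* A callback that never decodes: it just forwards the encoded payload. */
    static void forward_cb(void *user_data, const char *packet, int len,
                           int lost)
    {
        if (!lost)
            send_to_peer(user_data, packet, len);
    }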

> For example, I will be running this in front of a conferencing
> application.  This conferencing application handles participants, each
> of which can use a different codec.   Often, we "optimize" the path
> through the conferencing application by passing the encoded stream
> straight-through to listeners when there is only one speaker, and the
> speaker and participant use the same codec(*).  In this case, I want
> to pass back the actual encoded frame, and also the information about
> what to do with it, so that I can pass along the frame to some
> participants, and decode (and possibly transcode) it for others.

It's another topic, but why do you actually want to de-jitter the
stream if you're just going to resend it encoded? Why not redirect the
packets as they arrive and let the last jitter buffer handle everything?
That would be both simpler and better (slightly lower latency, slightly
less frame dropping/interpolation).

	Jean-Marc



