[Speex-dev] Jitter buffer

Jean-Marc Valin jean-marc.valin at usherbrooke.ca
Mon Nov 15 22:29:01 PST 2004


> OK, I'm actually about ready to start working on this now.
> 
> If people in the speex community are interested in working with me on
> this, I can probably start with the speex buffer, but I imagine
> there's going to be a lot more work needed to get this where I'd like
> it to go.

And where would you like it to go? ;-)

> At the API level, it seems pretty easy to make the speex
> implementation become speex-independent.  Instead of having
> speex_jitter_get call any particular speex_decode or speex_read_bits,
> etc functions, it could instead just return the "thing" it got, and a
> flag.  I.e. 

It's not as simple as it may look -- otherwise that's what I would have
done. Here are some of the things you can't do easily if you "just
return the thing" (see the sketch below):
- Allow more than one frame per packet, especially if the frames don't
end on a byte boundary.
- Let the jitter buffer drop/interpolate frames during silence periods.
- Anything that requires the jitter buffer to know about what is being
decoded.
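
For concreteness, the "return the thing" API could look roughly like
this (a minimal sketch with made-up names, not anything that exists in
Speex):

    /* Hypothetical sketch -- not an actual Speex API.  The buffer
     * hands back the stored payload and a status flag instead of
     * decoding it itself. */
    typedef enum {
        JB_OK,       /* *data points to a queued packet */
        JB_MISSING,  /* packet lost or late; caller must conceal */
        JB_EMPTY     /* nothing buffered yet */
    } jb_status;

    jb_status jitter_get(struct jitter_state *jb, void **data, int *len);

The trouble starts exactly where the list above says: once a packet
carries several frames that don't end on byte boundaries, the buffer
can't split or count them without codec knowledge.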

> We could then have a second-level API wrapper around this for speex
> which would then call speex_decode, etc, as necessary.
> 
> Basically, I'd like the jitter buffer to do all the history, length,
> etc. calculations, but not actually know anything about what it's
> managing.

I would suggest the opposite. You can just think of the current
implementation as being callback-based. If you look at the
implementation of speex_decode, it merely looks for a callback function
in a struct (mode definition). It would not be very hard to provide
similar callback-structure wrappers for other codecs. I'm willing to
modify the current implementation to make that easier (though it's
already not very hard).
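
As a rough illustration of that callback idea (the names here are made
up; the real SpeexMode structure is different), a codec could be
plugged into the jitter buffer through a small table of function
pointers:

    /* Hypothetical codec descriptor -- the actual SpeexMode struct
     * differs; this only shows the shape of the idea. */
    typedef struct {
        void *(*init)(void);
        void  (*destroy)(void *state);
        /* Decode one frame into out; bits == NULL asks the codec to
         * conceal a lost frame, the way speex_decode extrapolates. */
        int   (*decode)(void *state, const char *bits, int len,
                        short *out);
        int frame_size;  /* samples per decoded frame */
    } JitterCodec;

With something like this, the buffer never needs to parse the payload:
to interpolate a frame during silence it just calls
codec->decode(state, NULL, 0, out), whatever the codec is.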

> In asterisk and iaxclient (my project), the things I'd pass into the
> jitter buffer would be different kinds of structures.  Some of these
> may be audio frames, some might be control frames (DTMF, etc) that we
> want synchronized with the audio stream, etc.  In the future, we'd
> also want to have video frames thrown in there, which would need to be
> synchronized. 

I'm not sure of the best way to do that. Audio has different constraints
than video when you're doing jitter buffering. For example, it's much
easier (in terms of perceptual degradation) to skip frames with video
than with audio, which means that the algorithm to handle that optimally
may be quite different. Don't you think?
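
One way the mixed-stream idea could be expressed (purely illustrative;
none of this exists in Speex) is a tagged wrapper, so the buffer can
order and synchronize frames it cannot interpret:

    /* Hypothetical tagged frame -- the payload stays opaque. */
    typedef enum { FRAME_AUDIO, FRAME_VIDEO, FRAME_CONTROL } frame_type;

    typedef struct {
        frame_type type;       /* audio, video, DTMF/control, ... */
        unsigned   timestamp;  /* common clock for synchronization */
        void      *payload;    /* codec- or control-specific data */
        int        len;
    } jb_frame;

Even then, the scheduling policy would probably have to differ per
type, for the reason above: dropping a video frame is far less
noticeable than dropping an audio frame.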

> So, I guess my questions (for Jean-Marc mostly, but others as well):
> 
> 1) Is it OK with you to add this extra abstraction layer to your
> jitter buffer?

I think there might be better ways to abstract the codec out of the
jitter buffer (callbacks and all, as sketched above).

> 2) What's the state of the current implementation? (Does it work?)

As of 1.1.6, the jitter buffer actually works and I've been able to get
good results with it.

> 3) Is there a paper or something that you're using in your design that
> I can read?

Sorry, it just came out of my head. It's probably similar to what others
are doing, but I "invented" it independently.

> 4) Are people interested in collaborating and contributing to this?

I am.

	Jean-Marc



