[Speex-dev] New jitter.c, bug in speex_jitter_get?

Jean-Marc Valin jean-marc.valin at usherbrooke.ca
Wed May 3 20:21:18 PDT 2006

> It depends on the protocol.  RTP packets, though, don't just have the  
> audio payload and a length;  they also have the timestamp, the  
> payload type, etc.  Some RTP packets may be audio data, some may be  
> video, some may be DTMF digits, etc.

Timestamp is already supported. Different payload types should require
different jitter buffers (one jitter buffer per payload). I'm planning
on supporting synchronization between jitter buffers, but haven't done
it yet. I first need to have a better understanding of how it needs to
be done when the timestamp units and offsets are different.

> It's not brain surgery, but you'd generally parse these into some  
> kind of structure, even if the structure is just a mapping onto a  
> buffer.  Anyway, the point is that if you want an abstraction
> above the jitterbuffer, it makes sense that the jitterbuffer would
> treat its payload as opaque.  That would mean it can't just throw it
> away.  Then if it wants to destroy it, you could either allow the  
> jitterbuffer to callback to an application-passed destroy function,  
> or have it return it with a flag of some kind.

I'm willing to consider the destroy function (callback) method. But it
still means the jitter buffer can destroy stuff without telling the
application in advance (which I think is reasonable to expect).
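
A minimal sketch of the destroy-callback idea, assuming made-up names
(opaque_jb_t, opaque_jb_put, payload_destroy_fn and remember_destroyed are
hypothetical and not part of the Speex jitter buffer API). The buffer never
looks inside the payload; when it has to drop the oldest entry, it hands the
pointer back to the application through the callback:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the Speex API. */
typedef void (*payload_destroy_fn)(void *payload, void *user_data);

#define SLOT_COUNT 4

typedef struct {
    void *payload;   /* opaque to the buffer; NULL means the slot is free */
    long  timestamp;
} slot_t;

typedef struct {
    slot_t slots[SLOT_COUNT];
    payload_destroy_fn destroy; /* application-passed destroy function */
    void *user_data;            /* handed back to the callback untouched */
} opaque_jb_t;

void opaque_jb_init(opaque_jb_t *jb, payload_destroy_fn destroy, void *user_data)
{
    size_t i;
    assert(jb != NULL);
    for (i = 0; i < SLOT_COUNT; i++) {
        jb->slots[i].payload = NULL;
        jb->slots[i].timestamp = 0;
    }
    jb->destroy = destroy;
    jb->user_data = user_data;
}

/* Store a payload. If every slot is taken, the entry with the smallest
 * timestamp is evicted through the callback before being overwritten.
 * With a NULL callback the evicted pointer is simply forgotten. */
void opaque_jb_put(opaque_jb_t *jb, void *payload, long timestamp)
{
    size_t i, target = 0;
    int found_free = 0;
    for (i = 0; i < SLOT_COUNT; i++) {
        if (jb->slots[i].payload == NULL) {
            target = i;
            found_free = 1;
            break;
        }
        if (jb->slots[i].timestamp < jb->slots[target].timestamp)
            target = i;
    }
    if (!found_free && jb->destroy != NULL)
        jb->destroy(jb->slots[target].payload, jb->user_data);
    jb->slots[target].payload = payload;
    jb->slots[target].timestamp = timestamp;
}

/* Example application callback: records which payload was destroyed. */
void remember_destroyed(void *payload, void *user_data)
{
    *(void **)user_data = payload;
}
```

The application only learns about the eviction at the moment the callback
fires, which is the trade-off mentioned above.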

> Hmm, I just had this discussion here:
> http://bugs.digium.com/view.php?id=6011
> Here's the case where you would want it:
> The theory behind doing that is that in general, you want control to be
> synchronized with voice.  Imagine a situation where you are leaving a
> voicemail, and you have a big (say 1500ms) jitterbuffer;  The VM system
> lets you press "#" to end the recording.   So you say "I'll pay you a
> million dollars; just kidding!", and press "#";  If the system acted on
> the DTMF immediately, your message might be seriously misinterpreted.

I still think you need a different type of buffering for stuff like
DTMF. You want to reorder and maybe synchronize, but that's it. Plus I'd
say sending DTMF over UDP is a bad idea in the first place.

> The same concept holds true for processing HANGUP frames, etc.

Same. If you send the HANGUP over UDP, what do you do if it goes

> I don't know what is done by other applications, and buffering  
> control frames, etc certainly leads to complication.

Which is why you need to treat them separately and not attempt to make
them fit in a framework they don't belong to.

> > Why would you parse and do work *before* putting it in the jitter
> > buffer, especially not even knowing whether you'll actually use it.
> In the case of IAX2, there's lots of reasons:
> 1) You need to determine which call it belongs to.

No big deal if you still send the bytes...

> 2) If it's a reliably-sent frame, you acknowledge it immediately.

See above.

> 3) Some frame types are not buffered.

Then why do you use a jitter buffer on them?

> Time is just what drives the output of the jitterbuffer.  Time could  
> be determined by the number of samples needed to keep a soundcard  
> full, or it could be time according to the system clock, if you're  
> not (or not able to) synchronize to a soundcard.
> I don't think it's very different from your concept of ticks.  I use  
> actual measures of time, because the implementation of my  
> jitterbuffer sometimes makes decisions where knowing what time means  
> helps.

I just think that creates confusion. What do you do if the user sends
the system clock and the soundcard is slightly out of sync with that
one? Also, the only reason I now use ticks (the previous version didn't)
is that it gives me slightly finer time estimates (for adjusting the
buffer delay) in case the audio (or whatever data) frames are bigger
than the soundcard period. The jitter buffer doesn't even have an
explicit "buffering time" value in it.

> Again, I don't think the API prohibits the use that you've  
> described.  You can call jb_put() with that overlapping data.  My  
> _implementation_ may not like it, but the API can support it.

Does it have a way to tell you that "OK, you asked for data for timestamps
60-80ms, but I'm giving you a packet that spans 70-90"... or anything
that could have overlaps and/or holes compared to what you're asking?
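
To make the question concrete, here is a rough sketch of the kind of report a
get() call could return alongside the packet. The names (span_t,
span_report_t, compare_spans) are hypothetical and belong to neither API
discussed here; spans are half-open intervals [start, end) in milliseconds:

```c
#include <assert.h>

/* Hypothetical types for illustration only. */
typedef struct {
    long start; /* inclusive, in ms */
    long end;   /* exclusive, in ms */
} span_t;

typedef struct {
    long overlap;        /* requested time actually covered by the packet */
    long missing_before; /* requested time before the packet starts (hole) */
    long missing_after;  /* requested time after the packet ends (hole) */
    long start_offset;   /* packet start minus requested start; may be negative */
} span_report_t;

static long clamp_long(long v, long lo, long hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Compare the span the caller asked for with the span the packet covers. */
span_report_t compare_spans(span_t requested, span_t packet)
{
    span_report_t r;
    long req_len = requested.end - requested.start;
    long lo = requested.start > packet.start ? requested.start : packet.start;
    long hi = requested.end < packet.end ? requested.end : packet.end;

    assert(req_len >= 0 && packet.end >= packet.start);
    r.overlap = hi > lo ? hi - lo : 0;
    r.missing_before = clamp_long(packet.start - requested.start, 0, req_len);
    r.missing_after = clamp_long(requested.end - packet.end, 0, req_len);
    r.start_offset = packet.start - requested.start;
    return r;
}
```

For the 60-80 request answered with a 70-90 packet, this reports 10 ms of
overlap, a 10 ms hole at the start, and a start offset of +10 ms.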

> See above.  I'm not married to the idea of buffering control frames,
> but there's a valid argument to be made for it.

I guess it truly depends on the meaning of the "control frames". If they
are "rendered" (e.g. DTMF), they might be representable with timestamps
and durations (e.g. the duration of the tone), so they *may* fit in (as
long as you consider they may be lost). I really don't see stuff like
hangup going into a jitter buffer. You may want to sync it with the
rest, but that's it. 

> There may not always be a single "right" behavior, because the system  
> is always trying to predict future events.  But there's definitely  
> "wrong" behaviors that can easily be tested objectively.

By "right", I mean "does the get() return roughly the right stuff
without too many gaps?" or "how big is the latency?". Of course, "does
the jitter buffer segfault?" is also good to test :-)
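
As a sketch of what such a test could measure, assume the harness logs one
record per get() call. The record layout and summarize_gets() are
hypothetical, and "latency" is taken here to mean the time between a packet's
put() and the get() that returned it, which is only one possible definition:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log record written by a test harness, one per get() call. */
typedef struct {
    int  have_data;  /* 1 if get() returned a packet, 0 if it reported a gap */
    long arrival_ms; /* when the returned packet was put(); ignored on a gap */
    long get_ms;     /* when get() was called */
} get_record_t;

typedef struct {
    int    gaps;            /* get() calls that returned nothing */
    int    delivered;       /* get() calls that returned a packet */
    double mean_latency_ms; /* average of get_ms - arrival_ms over deliveries */
} jb_stats_t;

/* Summarize a log of get() results into gap count and mean latency. */
jb_stats_t summarize_gets(const get_record_t *log, size_t n)
{
    jb_stats_t s = {0, 0, 0.0};
    double total = 0.0;
    size_t i;

    assert(log != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (log[i].have_data) {
            s.delivered++;
            total += (double)(log[i].get_ms - log[i].arrival_ms);
        } else {
            s.gaps++;
        }
    }
    if (s.delivered > 0)
        s.mean_latency_ms = total / s.delivered;
    return s;
}
```

A test run can then put thresholds on gaps and mean_latency_ms for a given
simulated network trace.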

