<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Jean-Marc Valin wrote:
<blockquote cite="mid1100675845.4495.11.camel@localhost" type="cite">
<blockquote type="cite">
<pre wrap="">Heh. I guess after playing with different jitter buffers long enough,
I've realized that there are always situations you haven't properly
accounted for when designing one.
</pre>
</blockquote>
<pre wrap=""><!---->
For example? :-)
</pre>
</blockquote>
I have a bunch of examples listed on the wiki page where I had written
initial specifications:<br>
<br>
<a class="moz-txt-link-freetext" href="http://www.voip-info.org/tiki-index.php?page=Asterisk%20new%20jitterbuffer">http://www.voip-info.org/tiki-index.php?page=Asterisk%20new%20jitterbuffer</a><br>
<br>
In particular (I'm not really sure, because I don't thoroughly
understand it yet), I don't think your jitterbuffer handles:<br>
<br>
DTX (discontinuous transmission)<br>
clock skew (see the discussion, though)<br>
shrinking the buffer length quickly during silence (see the sketch below)<br>
<br>
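For the silence case, what I had in mind is roughly this (just a
sketch; the jb_* names are hypothetical, not from your code):<br>
<br>
<pre wrap="">/* Hypothetical sketch: during silence (or a DTX gap) the buffer can
 * shrink to its target length in one step, since dropped "silence" is
 * inaudible; during speech it should shrink only a few ms at a time. */
struct jb {
    long current_len;   /* ms of audio currently buffered           */
    long target_len;    /* desired length, from the jitter estimate */
    long playout_delay; /* delay applied to incoming timestamps     */
};

static void jb_adjust(struct jb *jb, int in_silence)
{
    long excess = jb-&gt;current_len - jb-&gt;target_len;

    if (excess &lt;= 0)
        return;
    if (in_silence)
        jb-&gt;playout_delay -= excess; /* shrink all at once */
    else
        jb-&gt;playout_delay -= 1;      /* shrink gradually   */
}
</pre>
<br>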
<blockquote cite="mid1100675845.4495.11.camel@localhost" type="cite">
<blockquote type="cite">
<pre wrap="">I think the only difficult thing you do here is dealing with
multiple frames per packet without that information being available
to the jitter buffer. If the jitter buffer can be told, when a packet
is added, that the packet contains Xms of audio, then the jitter buffer
won't have a problem handling this.
</pre>
</blockquote>
<pre wrap=""><!---->
That's always a solution, but I'm not sure it's the best. In my current
implementation, the application doesn't even have to care about the fact
that there may (or may not) be more than one frame per packet.
</pre>
</blockquote>
That may be OK when the jitterbuffer is only used right before the
audio layer, but I'm still not sure how I can integrate this
functionality into the places I want to put the jitterbuffer.<br>
<br>
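If I follow your suggestion, the interface would look something like
this (hypothetical names, just to illustrate the idea):<br>
<br>
<pre wrap="">/* Hypothetical interface: the caller tells the jitter buffer how much
 * audio a packet carries, so a 40ms packet occupies 40ms of the
 * timeline even though the buffer never parses the payload. */
struct jb; /* opaque buffer state */

int jb_put(struct jb *jb,
           void *data, /* opaque encoded payload          */
           long ts,    /* sender timestamp, in ms         */
           long ms);   /* duration of audio in the packet */
</pre>
<br>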
<blockquote cite="mid1100675845.4495.11.camel@localhost" type="cite">
<blockquote type="cite">
<pre wrap="">This is something I've encountered in trying to make a particular
Asterisk application properly handle IAX2 frames which contain either
20ms or 40ms of Speex data. For the CBR case, where the bitrate is
known, this is fairly easy to do, especially if the frames _do_ always
end on byte boundaries. For the VBR case, it is more difficult, because
it doesn't look like there's a way to just parse the Speex bitstream
and break it up into the constituent 20ms frames.
</pre>
</blockquote>
<pre wrap=""><!---->
It would be possible, but unnecessarily messy.
</pre>
</blockquote>
<br>
I looked at nb_celp.c, and it seems that it would be pretty messy. I'd
need to implement a lot of the actual codec just to be able to
determine the number of frames in a packet.<br>
<br>
I think the easiest thing for me is to just stick to one frame per
"thing" as far as the jitterbuffer is concerned, and then handle
additional framing for packets at a higher level.<br>
<br>
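For the CBR case that higher-level framing is easy enough; roughly
this (sketch only, reusing the hypothetical jb_put() above, with the
per-frame size known from the bitrate):<br>
<br>
<pre wrap="">/* Sketch: split a fixed-rate packet into its 20ms frames and queue
 * them individually.  Assumes CBR with frames ending on byte
 * boundaries, so every 20ms frame is exactly frame_bytes long. */
static int put_cbr_packet(struct jb *jb, unsigned char *data, int len,
                          long ts, int frame_bytes)
{
    int i, nframes = len / frame_bytes;

    for (i = 0; i &lt; nframes; i++)
        jb_put(jb, data + i * frame_bytes, ts + i * 20, 20);
    return nframes;
}
</pre>
It's the VBR case where this approach falls apart, as above.<br>
<br>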
Even if we use the "terminator" submode (i.e.,
speex_bits_pack(&amp;encstate-&gt;bits, 15, 5); ), it seems hard to
find it again in the bitstream, no?<br>
<br>
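Writing the terminator at encode time is easy enough, as far as I can
tell (these are real Speex calls, but treat it as a sketch; the header
path differs between versions):<br>
<br>
<pre wrap="">#include &lt;speex/speex.h&gt; /* or &lt;speex.h&gt; on older installs */

/* Encode two 20ms narrowband frames (160 samples each at 8kHz) into
 * one packet, then append the 5-bit terminator submode (15) to mark
 * the end of the valid frames. */
void encode_two_frames(void *enc, spx_int16_t pcm[2][160],
                       char *packet, int maxlen, int *nbytes)
{
    SpeexBits bits;

    speex_bits_init(&amp;bits);
    speex_encode_int(enc, pcm[0], &amp;bits);
    speex_encode_int(enc, pcm[1], &amp;bits);
    speex_bits_pack(&amp;bits, 15, 5); /* terminator submode */
    *nbytes = speex_bits_write(&amp;bits, packet, maxlen);
    speex_bits_destroy(&amp;bits);
}
</pre>
The trouble is on the receive side: to find that terminator again you
have to walk the stream submode by submode, which is most of a
decoder.<br>
<br>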
<blockquote cite="mid1100675845.4495.11.camel@localhost" type="cite">
<blockquote type="cite">
<pre wrap="">For example, I will be running this in front of a conferencing
application. This conferencing application handles participants, each
of which can use a different codec. Often, we "optimize" the path
through the conferencing application by passing the encoded stream
straight through to listeners when there is only one speaker, and the
speaker and participant use the same codec(*). In this case, I want
to pass back the actual encoded frame, and also the information about
what to do with it, so that I can pass along the frame to some
participants, and decode (and possibly transcode) it for others.
</pre>
</blockquote>
<pre wrap=""><!---->
It's another topic here, but why do you actually want to de-jitter the
stream if you're going to resend it encoded? Why not just redirect the
packets as they arrive and let the last jitter buffer handle everything?
That'll be both simpler and better (slightly lower latency, slightly
less frame dropping/interpolation).
</pre>
</blockquote>
<br>
Because we need to synchronize multiple speakers in the conference: on
the incoming side, each incoming "stream" has its own timebase,
timestamps, and jitter. If we just passed that through (even if we
adjusted the timebases), the different jitter characteristics of each
speaker would create chaos for listeners, and they'd end up with
overlapping frames, etc.<br>
<br>
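To make that concrete: one jitter buffer per speaker, all drained on
the conference's own clock (a sketch; every name in it is
hypothetical):<br>
<br>
<pre wrap="">/* Sketch, hypothetical types and calls throughout: the conference
 * runs its own 20ms tick, and every participant's jitter buffer is
 * drained against that single timebase, so listeners see one
 * consistent stream regardless of how each speaker's packets arrived. */
#define JB_OK 0

struct jb;                                        /* per-speaker buffer */
struct jb_frame { void *data; long ts; long ms; };
struct participant { struct jb *jb; };

int  jb_get(struct jb *jb, struct jb_frame *out, long now);
void mix_or_forward(struct participant *p, struct jb_frame *f, long ts);

void conf_tick(struct participant *members, int n, long conf_ts)
{
    int i;
    struct jb_frame f;

    for (i = 0; i &lt; n; i++) {
        if (jb_get(members[i].jb, &amp;f, conf_ts) == JB_OK)
            mix_or_forward(&amp;members[i], &amp;f, conf_ts);
        /* on a miss, interpolate for that speaker only, so one
         * speaker's jitter never disturbs the others */
    }
}
</pre>
<br>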
-SteveK<br>
<br>
<br>
</body>
</html>