<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Jean-Marc Valin wrote:
<blockquote cite="mid1100675845.4495.11.camel@localhost" type="cite">
<blockquote type="cite">
<pre wrap="">Heh. I guess after playing with different jitter buffers long enough,
I've realized that there are always situations you haven't properly
accounted for when designing one.
</pre>
</blockquote>
<pre wrap=""><!---->
For example? :-)
</pre>
</blockquote>
I have a bunch of examples listed on the wiki page where I had written
initial specifications:<br>
<br>
<a class="moz-txt-link-freetext" href="http://www.voip-info.org/tiki-index.php?page=Asterisk%20new%20jitterbuffer">http://www.voip-info.org/tiki-index.php?page=Asterisk%20new%20jitterbuffer</a><br>
<br>
In particular (I'm not really sure, because I don't thoroughly
understand it yet), I don't think your jitterbuffer handles:<br>
<br>
DTX (discontinuous transmission)<br>
clock skew (see the discussion, though)<br>
shrinking the buffer length quickly during silence (see the sketch below)<br>
<br>
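For the silence case, what I had in mind is roughly this (just a
sketch; the jb_* names are hypothetical, not from your code):<br>
<br>
<pre wrap="">/* Hypothetical sketch: during silence (or a DTX gap) the buffer can
 * shrink to its target length in one step, since dropped "silence" is
 * inaudible; during speech it should shrink only a few ms at a time. */
struct jb {
    long current_len;   /* ms of audio currently buffered           */
    long target_len;    /* desired length, from the jitter estimate */
    long playout_delay; /* delay applied to incoming timestamps     */
};

static void jb_adjust(struct jb *jb, int in_silence)
{
    long excess = jb-&gt;current_len - jb-&gt;target_len;

    if (excess &lt;= 0)
        return;
    if (in_silence)
        jb-&gt;playout_delay -= excess; /* shrink all at once */
    else
        jb-&gt;playout_delay -= 1;      /* shrink gradually   */
}
</pre>
<br>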
<blockquote cite="mid1100675845.4495.11.camel@localhost" type="cite">
<blockquote type="cite">
<pre wrap="">I think the only difficult thing you do here is dealing with
multiple frames per packet without that information being available
to the jitter buffer. If the jitter buffer can be told, when a packet
is added, that the packet contains Xms of audio, then the jitter buffer
won't have a problem handling this.
</pre>
</blockquote>
<pre wrap=""><!---->
That's always a solution, but I'm not sure it's the best. In my current
implementation, the application doesn't even have to care about the fact
that there may (or may not) be more than one frame per packet.
</pre>
</blockquote>
That may be OK when the jitterbuffer is only used right before the
audio layer, but I'm still not sure how I can integrate this
functionality into the places I want to put the jitterbuffer.<br>
<br>
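If I follow your suggestion, the interface would look something like
this (hypothetical names, just to illustrate the idea):<br>
<br>
<pre wrap="">/* Hypothetical interface: the caller tells the jitter buffer how much
 * audio a packet carries, so a 40ms packet occupies 40ms of the
 * timeline even though the buffer never parses the payload. */
struct jb; /* opaque buffer state */

int jb_put(struct jb *jb,
           void *data, /* opaque encoded payload          */
           long ts,    /* sender timestamp, in ms         */
           long ms);   /* duration of audio in the packet */
</pre>
<br>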
<blockquote cite="mid1100675845.4495.11.camel@localhost" type="cite">
<blockquote type="cite">
<pre wrap="">This is something I've encountered in trying to make a particular
Asterisk application properly handle IAX2 frames which contain either
20ms or 40ms of Speex data. For the CBR case, where the bitrate is
known, this is fairly easy to do, especially if the frames _do_ always
end on byte boundaries. For the VBR case, it is more difficult, because
it doesn't look like there's a way to just parse the Speex bitstream
and break it up into the constituent 20ms frames.
</pre>
</blockquote>
<pre wrap=""><!---->
It would be possible, but unnecessarily messy.
</pre>
</blockquote>
<br>
I looked at nb_celp.c, and it seems that it would be pretty messy. I'd
need to implement a lot of the actual codec just to be able to
determine the number of frames in a packet.<br>
<br>
I think the easiest thing for me is to just stick to one frame per
"thing" as far as the jitterbuffer is concerned, and then handle
additional framing for packets at a higher level.<br>
<br>
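For the CBR case that higher-level framing is easy enough; roughly
this (sketch only, reusing the hypothetical jb_put() above, with the
per-frame size known from the bitrate):<br>
<br>
<pre wrap="">/* Sketch: split a fixed-rate packet into its 20ms frames and queue
 * them individually.  Assumes CBR with frames ending on byte
 * boundaries, so every 20ms frame is exactly frame_bytes long. */
static int put_cbr_packet(struct jb *jb, unsigned char *data, int len,
                          long ts, int frame_bytes)
{
    int i, nframes = len / frame_bytes;

    for (i = 0; i &lt; nframes; i++)
        jb_put(jb, data + i * frame_bytes, ts + i * 20, 20);
    return nframes;
}
</pre>
It's the VBR case where this approach falls apart, as above.<br>
<br>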
Even if we use the "terminator" submode (i.e.,
speex_bits_pack(&amp;encstate-&gt;bits, 15, 5); ), it seems hard to
find it again in the bitstream, no?<br>
<br>
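Writing the terminator at encode time is easy enough, as far as I can
tell (these are real Speex calls, but treat it as a sketch; the header
path differs between versions):<br>
<br>
<pre wrap="">#include &lt;speex/speex.h&gt; /* or &lt;speex.h&gt; on older installs */

/* Encode two 20ms narrowband frames (160 samples each at 8kHz) into
 * one packet, then append the 5-bit terminator submode (15) to mark
 * the end of the valid frames. */
void encode_two_frames(void *enc, spx_int16_t pcm[2][160],
                       char *packet, int maxlen, int *nbytes)
{
    SpeexBits bits;

    speex_bits_init(&amp;bits);
    speex_encode_int(enc, pcm[0], &amp;bits);
    speex_encode_int(enc, pcm[1], &amp;bits);
    speex_bits_pack(&amp;bits, 15, 5); /* terminator submode */
    *nbytes = speex_bits_write(&amp;bits, packet, maxlen);
    speex_bits_destroy(&amp;bits);
}
</pre>
The trouble is on the receive side: to find that terminator again you
have to walk the stream submode by submode, which is most of a
decoder.<br>
<br>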
<blockquote cite="mid1100675845.4495.11.camel@localhost" type="cite">
<blockquote type="cite">
<pre wrap="">For example, I will be running this in front of a conferencing
application. This conferencing application handles participants, each
of which can use a different codec. Often, we "optimize" the path
through the conferencing application by passing the encoded stream
straight through to listeners when there is only one speaker, and the
speaker and participant use the same codec(*). In this case, I want
to pass back the actual encoded frame, and also the information about
what to do with it, so that I can pass along the frame to some
participants, and decode (and possibly transcode) it for others.
</pre>
</blockquote>
<pre wrap=""><!---->
It's another topic here, but why do you actually want to de-jitter the
stream if you're going to resend it encoded? Why not just redirect the
packets as they arrive and let the last jitter buffer handle everything?
That'll be both simpler and better (slightly lower latency, slightly
less frame dropping/interpolation).
</pre>
</blockquote>
<br>
Because we need to synchronize multiple speakers in the conference: on
the incoming side, each incoming "stream" has its own timebase,
timestamps, and jitter. If we just passed that through (even if we
adjusted the timebases), the different jitter characteristics of each
speaker would create chaos for listeners, and they'd end up with
overlapping frames, etc.<br>
<br>
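To make that concrete: one jitter buffer per speaker, all drained on
the conference's own clock (a sketch; every name in it is
hypothetical):<br>
<br>
<pre wrap="">/* Sketch, hypothetical types and calls throughout: the conference
 * runs its own 20ms tick, and every participant's jitter buffer is
 * drained against that single timebase, so listeners see one
 * consistent stream regardless of how each speaker's packets arrived. */
#define JB_OK 0

struct jb;                                        /* per-speaker buffer */
struct jb_frame { void *data; long ts; long ms; };
struct participant { struct jb *jb; };

int  jb_get(struct jb *jb, struct jb_frame *out, long now);
void mix_or_forward(struct participant *p, struct jb_frame *f, long ts);

void conf_tick(struct participant *members, int n, long conf_ts)
{
    int i;
    struct jb_frame f;

    for (i = 0; i &lt; n; i++) {
        if (jb_get(members[i].jb, &amp;f, conf_ts) == JB_OK)
            mix_or_forward(&amp;members[i], &amp;f, conf_ts);
        /* on a miss, interpolate for that speaker only, so one
         * speaker's jitter never disturbs the others */
    }
}
</pre>
<br>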
-SteveK<br>
<br>
<br>
</body>
</html>