[Speex-dev] How does the jitter buffer "catch up"?
Jean-Marc.Valin at USherbrooke.ca
Sun Sep 18 16:21:23 PDT 2005
> FYI: The following is just my interpretation of the code; I might be wrong.
Most of it is right. Actually, would you mind if I use part of your
email for documenting the jitter buffer in the manual?
> Each time a new packet arrives, the jitter buffer calculates how far ahead
> or behind the "current" timestamp it is; this is called arrival_margin.
> The "current" timestamp is simply the last frame successfully decoded.
Minor detail: it's the last frame played (whether or not it was successfully decoded).
> It maintains lists of bins for margins, both short-term and long-term.
> Think of the bins like this:
> -60ms -40ms -20ms 0ms +20ms +40ms +60ms
> When a packet arrives, the bin matching its arrival_margin is
> increased, so if this packet was 40ms after the current timestamp, the
> +40ms bin would be increased. If this packet arrived 60ms too late (and
> hence is useless), the -60ms bin would be increased.
> early_ratio_XX is the sum of all the positive bins.
> late_ratio_XX is the sum of all the negative bins.
Right. And only the packets that are "just in time" don't get counted in either.
> The difference between _long and _short is just how fast they change.
> If a packet's timestamp falls outside the bins, it's not used for the calculation.
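A minimal C sketch of the histogram bookkeeping described above, assuming 20 ms frames and margins measured in milliseconds. The names (`MarginHistogram`, `histogram_update`, `histogram_ratios`, `LATE_BINS`, `EARLY_BINS`) and the decay/increment constants are illustrative assumptions, not the actual Speex jitter.c code; the only point is that the short-term histogram forgets old packets faster than the long-term one.

```c
#include <assert.h>

/* Hypothetical layout: 3 late bins, 1 on-time bin, 3 early bins, i.e.
   -60 -40 -20 0 +20 +40 +60 ms with 20 ms frames. */
#define LATE_BINS 3
#define EARLY_BINS 3
#define MAX_MARGIN (LATE_BINS + 1 + EARLY_BINS)

typedef struct {
   float shortterm[MAX_MARGIN];  /* changes quickly */
   float longterm[MAX_MARGIN];   /* changes slowly */
} MarginHistogram;

/* arrival_margin = packet timestamp - current (last played) timestamp.
   Positive means early, negative means late. The decay/increment pairs
   (0.98/0.02 and 0.995/0.005) are assumed values; each pair keeps the
   bins summing to at most 1. */
static void histogram_update(MarginHistogram *h, int arrival_margin, int frame_size)
{
   assert(frame_size > 0);
   int bin = LATE_BINS + arrival_margin / frame_size;
   if (bin < 0 || bin >= MAX_MARGIN)
      return;  /* outside the bins: not used for the calculation */
   for (int i = 0; i < MAX_MARGIN; i++) {
      h->shortterm[i] *= 0.98f;
      h->longterm[i] *= 0.995f;
   }
   h->shortterm[bin] += 0.02f;
   h->longterm[bin] += 0.005f;
}

/* late = sum of negative bins, early = sum of positive bins,
   ontime = the 0 ms bin, which counts toward neither. */
static void histogram_ratios(const float *bins, float *late, float *ontime, float *early)
{
   *late = 0.0f;
   *early = 0.0f;
   for (int i = 0; i < LATE_BINS; i++)
      *late += bins[i];
   *ontime = bins[LATE_BINS];
   for (int i = LATE_BINS + 1; i < MAX_MARGIN; i++)
      *early += bins[i];
}
```

For example, a packet stamped 40 ms ahead of the current timestamp lands in the +40 ms bin and raises the early ratio, while one 100 ms ahead falls outside the bins and is ignored.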
> Now, clearly, if early_ratio is high and late_ratio is very low, the
> buffer is buffering more than it needs to; it will skip a frame to reduce
> latency. Alternatively, if late_ratio is even marginally above 0, more
> buffering is needed, and it duplicates a frame. This decision is done when
> Depending on your chosen transmission method, during network hiccups
> you'll either have lost packets or they'll come in a burst when the
> network conditions restore themselves. In either case, after missing 20
> packets or so, the jitter buffer will prepare to "reset", and its new
> current timestamp will be the timestamp of whatever packet arrives next. It
> will also hold decoding until at least buffer_size frames have arrived.
Right, except it will only actually reset when receiving the first new packet.
> Since it sounds like you're using reliable transmission (packets are not
> lost), what will happen is that there's a whole stream of packets suddenly
> arriving, and they'll fill up the buffer much much faster than it's
> emptied. In fact, you're likely to fill it so fast the buffer runs out of
> room, meaning the first few packets get dropped to make room for the
> later ones. However, as the current timestamp was set to the first
> arriving packet, the decoder won't find the packet it's looking for,
> meaning the jitter buffer will soon reset again.
I'm not sure what will happen here. Normally, you'd want to make the
buffer larger than what you expect to have in it. In that case, the
jitter buffer would likely drop frames until it catches up.
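To make the burst scenario above concrete, here is a toy model in C, not Speex's implementation. Everything in it is a hypothetical simplification: `ToyJitter`, `toy_put`, `toy_get`, a fixed `MAX_SLOTS` standing in for SPEEX_JITTER_MAX_BUFFER_SIZE, and oldest-first eviction when full. It keeps only the reset rule (pending after about 20 misses, applied on the first new packet) and leaves out the "wait for buffer_size frames" hold.

```c
#include <assert.h>

#define MAX_SLOTS 8            /* hypothetical stand-in for SPEEX_JITTER_MAX_BUFFER_SIZE */
#define RESET_AFTER_MISSES 20  /* "after missing 20 packets or so" */

typedef struct {
   int ts[MAX_SLOTS];          /* timestamp stored in each slot */
   int used[MAX_SLOTS];        /* 1 if the slot holds a packet */
   int current_ts;             /* timestamp of the next frame to play */
   int frame_size;             /* timestamp increment per frame */
   int consecutive_misses;
   int reset_pending;          /* set after too many misses */
} ToyJitter;

static void toy_put(ToyJitter *jb, int ts)
{
   assert(jb->frame_size > 0);
   if (jb->reset_pending) {
      /* The reset only takes effect when the first new packet arrives:
         flush the buffer and resynchronize on this packet. */
      for (int i = 0; i < MAX_SLOTS; i++)
         jb->used[i] = 0;
      jb->current_ts = ts;
      jb->consecutive_misses = 0;
      jb->reset_pending = 0;
   }
   int slot = -1;
   for (int i = 0; i < MAX_SLOTS; i++)
      if (!jb->used[i]) { slot = i; break; }
   if (slot < 0) {
      /* Buffer full: evict the oldest packet to make room. */
      slot = 0;
      for (int i = 1; i < MAX_SLOTS; i++)
         if (jb->ts[i] < jb->ts[slot])
            slot = i;
   }
   jb->ts[slot] = ts;
   jb->used[slot] = 1;
}

/* Play one frame: returns 1 if the packet for current_ts was present,
   0 on a miss. The current timestamp advances either way. */
static int toy_get(ToyJitter *jb)
{
   int found = 0;
   for (int i = 0; i < MAX_SLOTS; i++) {
      if (jb->used[i] && jb->ts[i] == jb->current_ts) {
         jb->used[i] = 0;
         found = 1;
         break;
      }
   }
   if (found)
      jb->consecutive_misses = 0;
   else if (++jb->consecutive_misses >= RESET_AFTER_MISSES)
      jb->reset_pending = 1;
   jb->current_ts += jb->frame_size;
   return found;
}
```

With a burst of 10 packets into 8 slots, the first two are evicted, so the very first lookups (at the timestamp taken from the first packet) miss, which is the behaviour described in the quoted analysis. A larger buffer, as suggested in the reply, avoids the eviction.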
> So no, it doesn't "catch up"; it tries to keep latency to an absolute
> minimum whatever the circumstances, so most of the late frames will be dropped.
Yes. Actually, the best way to handle that would be to (eventually)
change the code to drop frames in silence or low-energy periods.
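One way the low-energy idea might look, as a sketch only: gate the skip on the mean energy of the decoded 16-bit frame. The functions `frame_energy` and `should_drop_frame` and the threshold parameter are hypothetical; a real implementation might prefer the codec's own VAD or a smarter energy measure.

```c
#include <assert.h>

/* Mean energy (average squared amplitude) of a 16-bit PCM frame. */
static double frame_energy(const short *pcm, int n)
{
   assert(n > 0);
   double acc = 0.0;
   for (int i = 0; i < n; i++)
      acc += (double)pcm[i] * pcm[i];
   return acc / n;
}

/* Hypothetical policy: even when the histogram says a frame could be
   skipped, only drop it if the frame is quiet. The threshold is an
   assumed tuning value. */
static int should_drop_frame(int want_skip, const short *pcm, int n, double threshold)
{
   return want_skip && frame_energy(pcm, n) < threshold;
}
```

Deferring the skip to a quiet frame trades a little extra latency for an inaudible drop.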
> To achieve the effect you're describing, you'd need to increase
> SPEEX_JITTER_MAX_BUFFER_SIZE to the longest delay you're expecting, and
> then inside the block on line 231 (which says)
> if (late_ratio_short + ontime_ratio_short < .005 && late_ratio_long +
> ontime_ratio_long < .01 && early_ratio_short > .8)
> ... add something that multiplies all the margins by 0.75 or so at the
> end. This will force the jitter buffer to only skip 1 frame at a time and
> wait a bit before it skips the next one.
Don't think it's necessary since there's already some code that shifts
the histogram whenever I skip or interpolate a packet. This means that
if the packets are on average 20 ms in advance when we drop a frame,
then they will be considered all "on time" (0 ms) after that.
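A sketch of the skip/duplicate decision combined with the histogram shift described above. The skip condition copies the thresholds quoted from the code; `LATE_THRESHOLD` (for "marginally above 0"), the bin layout and all function names are assumptions for illustration, not the Speex source.

```c
#include <assert.h>

#define LATE_BINS 3
#define MAX_MARGIN 7            /* 3 late bins, 1 on-time bin, 3 early bins */
#define LATE_THRESHOLD 0.01f    /* assumed stand-in for "marginally above 0" */

typedef enum { ACTION_PLAY, ACTION_SKIP, ACTION_DUPLICATE } JitterAction;

static float sum_bins(const float *b, int from, int to)
{
   float s = 0.0f;
   for (int i = from; i < to; i++)
      s += b[i];
   return s;
}

static JitterAction decide(const float *shortterm, const float *longterm)
{
   float late_s = sum_bins(shortterm, 0, LATE_BINS);
   float ontime_s = shortterm[LATE_BINS];
   float early_s = sum_bins(shortterm, LATE_BINS + 1, MAX_MARGIN);
   float late_l = sum_bins(longterm, 0, LATE_BINS);
   float ontime_l = longterm[LATE_BINS];
   /* Thresholds as quoted from the jitter buffer code. */
   if (late_s + ontime_s < .005f && late_l + ontime_l < .01f && early_s > .8f)
      return ACTION_SKIP;
   if (late_s > LATE_THRESHOLD)
      return ACTION_DUPLICATE;
   return ACTION_PLAY;
}

/* A skip moves the current timestamp one frame forward, so every margin
   shrinks by one frame: shift the bins one step toward "late". */
static void shift_after_skip(float *b)
{
   for (int i = 0; i < MAX_MARGIN - 1; i++)
      b[i] = b[i + 1];
   b[MAX_MARGIN - 1] = 0.0f;
}

/* A duplicated (interpolated) frame holds the timestamp back, so every
   margin grows by one frame: shift the bins one step toward "early". */
static void shift_after_interp(float *b)
{
   for (int i = MAX_MARGIN - 1; i > 0; i--)
      b[i] = b[i - 1];
   b[0] = 0.0f;
}

static JitterAction tick(float *shortterm, float *longterm)
{
   JitterAction a = decide(shortterm, longterm);
   if (a == ACTION_SKIP) {
      shift_after_skip(shortterm);
      shift_after_skip(longterm);
   } else if (a == ACTION_DUPLICATE) {
      shift_after_interp(shortterm);
      shift_after_interp(longterm);
   }
   return a;
}
```

If nearly all packets sit in the +20 ms bin, the first call skips and moves that mass into the 0 ms bin, so the next call plays normally instead of skipping again. This shift is what makes the quoted 0.75 damping unnecessary.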
Jean-Marc Valin <Jean-Marc.Valin at USherbrooke.ca>
Université de Sherbrooke