[Speex-dev] How does the jitter buffer "catch up"?

Thorvald Natvig speex at natvig.com
Sun Sep 18 19:02:58 PDT 2005

>> Err, unless I'm totally wrong, there are a few race conditions.
>> Assume the buffer is full of packets newer than the current pointer, and
>> one that is at the current pointer.
>> get and put start at the same time.
>> get will find the correct buffer index. Now, just after it finds it's
>> index, assume we switch to the put thread.
>> Put needs to put a new packet in, so discards the oldest, which is the
>> current packet, and replaces it with the new one (let's call it newest).
> [...]
> True. It was originally race-safe when I was discarding the incoming
> packet in case of buffer full. I changed that behaviour exactly because
> of burst issues and the like. Other than that, I don't think there
> should be any other possible race (assuming CPU cache coherence).

Weeeeelll.. Actually, now that you mention it, the histogram shifting is a 
race as well.

In _put, the lines

          jitter->shortterm_margin[i] *= .98;
          jitter->longterm_margin[i] *= .995;
          jitter->shortterm_margin[int_margin] += .02;
          jitter->longterm_margin[int_margin] += .005;

are read-accumulate-write operations.

Let's assume the +40 bin has the shorterm_margin 1.0, and _put just read 
it to do a multiply.

At this point, _get shifts the histogram, so the 1.0 value is copied to 
the +20ms bin.

_put gets the CPU again, and writes 0.98 (the 1.0 it fetched, multiplied 
by 0.98) back to the +40 bin.

The sum of margins is now 1.98, which is Not Good (tm); it'll first shift 
it once more (ontime = 0.9x, +20ms = 0.9x), then shift again, (late =0.9x, 
ontime = 0.9x).. Eeek, packets are late, interpolate.. then they're early 
and so on. This will continue until the margins return to sane values.

The chance of this happening is much less than the other one, but it COULD 
happen ;)

>>> Why would you do that. The idea of interpolating a frame is exactly to
>>> get better quality than just putting zeros.
>> Actually, I oversimplified a bit. I check if valid_bits has been zero for
>> the last 4 frames or more, because once you interpolate more than 100ms
>> from the last known state, you end up with some weird
>> blipp-blopp-blooiiing sound. Actually it reminds me of the ambient sound
>> of weird aliens in bad 50s scifi movies. At that point, silence is much
>> better :)
> Then it would be a problem with the packet loss concealment. It's
> actually decreasing the level of the interpolated audio with time. Maybe
> it's not decreasing quickly enough?

I'd say so, my testers kept asking me what planet my room-mate is from :) 
Note that this only happens when you have quite a bit of interpolation, 
meaning there is serious network trouble anyway; don't sacrifice the 
quality of one or two-frame interpolation (which happens quite 
frequently) for these extreme cases.

PS: Regarding the earlier stuff about DTX using the VAD from the 
preprocessor, I ended up switching back to silence during 
non-transmission. The denoiser is good enough that people expect silence 
when others aren't talking, the comfort noise was unwelcome.

More information about the Speex-dev mailing list