[Speex-dev] Backup Echo Suppression

Mon Jul 2 20:48:24 PDT 2007

Quoting zmorris at mac.com:
> This is sort of what I was talking about with nibbling.  Imagine you
> have a microphone sampling at 128 samples at a time, filling a 256
> byte buffer, and you have a player that writes 256 samples at a time,
> or 512 bytes.  You have to nibble a frame every 160 samples, so you
> get this, where each digit represents 32 samples, so 00000 is 160
> samples:

...
> I've shown the points in time when an input buffer can be passed into
> a speex frame, or a speex frame can be passed into an output buffer.
>
> The echo canceler can't assume that each input/output pair are going
> to arrive perfectly synced and at the same time.  Due to threading
> delays and other issues, it could easily get 2 inputs and 1 output
> briefly, or vice versa.

As long as the capture and playback clocks are in sync, there's no problem. If
the frame sizes don't match, you'll just need to do a bit of buffering. No big
deal. The only requirement is that the first playback sample you send the AEC
has to arrive as echo in the capture with a fixed delay (that isn't too large
compared to the tail length).
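
To illustrate the "bit of buffering" mentioned above, here is a minimal sketch
in Python rather than the C you would use with Speex. The names Reframer and
pair_frames are hypothetical and not part of the Speex API. It assumes 160-sample
frames as in the quoted example, and it assumes both devices run off the same
clock. Chunks of any size go into a FIFO per direction, and a capture/playback
pair is handed out only once both sides hold a full frame.

```python
from collections import deque

FRAME_SIZE = 160  # samples per frame, taken from the quoted example


class Reframer:
    """Accumulate arbitrarily sized chunks and hand back fixed-size frames."""

    def __init__(self, frame_size=FRAME_SIZE):
        self.frame_size = frame_size
        self.fifo = deque()

    def push(self, samples):
        # Append a chunk of any length (128, 256, ... samples).
        self.fifo.extend(samples)

    def pop_frame(self):
        # Return one full frame, or None if not enough samples are buffered.
        if len(self.fifo) < self.frame_size:
            return None
        return [self.fifo.popleft() for _ in range(self.frame_size)]


def pair_frames(mic, spk):
    """Yield (capture, playback) frame pairs while both sides have a frame."""
    while len(mic.fifo) >= mic.frame_size and len(spk.fifo) >= spk.frame_size:
        yield mic.pop_frame(), spk.pop_frame()
```

With two 128-sample capture chunks and one 256-sample playback chunk pushed,
one pair comes out and 96 samples stay buffered on each side. Each pair would
then go to the echo canceller in order.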

> I THINK that looking at this from a high level, the echo canceler IS
> guaranteed to get an input frame for every output frame, as long as
> it doesn't look at the frame's timestamp.  Perhaps internally it has
> a queue that can save up frames until it has both an input and an
> output frame.  In that case, it needs to stop writing warnings about
> extra or missing frames to the console, which seems to happen every
> time I run.

One of those warnings is OK. If you get many, something's wrong.

> But if the echo canceler IS using each frame's timestamp when it's
> trying to converge, it's almost guaranteed to fail on most operating
> systems, because the timestamp has such a high variability between
> frames, and can even sometimes be 0 for the output buffer in this
> example.

I don't know what you mean about timestamps. The AEC doesn't use or need
timestamps. But it does require that you send the audio in the same order you
capture/play it.

> Also, I think that many machines have separate input/output hardware
> that can suffer from clock drift.  I'd really like to see an echo
> canceler that can work even when input/output frames are fed in with
> a large random time delta.  I should be able to skip the first few
> input or output frames, and the echo canceller should be able to find
> out what the time delta is, and know from that point on, it will be
> relatively constant between any given pair of input/output frames.

This is a lot harder than you may think. Estimating the drift accurately enough
is highly non-trivial. It's much easier to make sure the clocks are in sync
(e.g. tell the user to use the same card for both).

> The easiest way to do this might be to look at the maximum of the
> covariance of the input/output, or find the phase offset of the input
> and output FFTs.  Maybe it already does this, and someone can say if so?

If you think it's easy, then I guess I'll be waiting for your patch...
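
For reference, the covariance-peak idea from the quoted text can be sketched as
below. This is a rough illustration only, not something the Speex AEC is stated
to do here. The function name estimate_delay and its parameters are
hypothetical. It assumes a constant offset between playback and capture, so it
says nothing about clock drift, which is the hard part described above.

```python
def estimate_delay(playback, capture, max_lag):
    """Return the lag (in samples) where capture correlates best with playback.

    Brute-force cross-correlation over lags 0..max_lag. This only finds a
    fixed delay; it does not track drift between two clocks.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Number of samples that overlap when capture is shifted by `lag`.
        n = min(len(playback), len(capture) - lag)
        score = sum(playback[i] * capture[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

On real signals the peak is much less clean than on synthetic data, since the
capture also contains near-end speech, noise and room reverberation.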

> P.S. The above situation is almost exactly what happens on my Mac,
> and would be exacerbated by people with third party sound cards.

You mean Apple can't ship a soundcard that records and plays at the same rate? I
have a hard time believing that.

   Jean-Marc