[Speex-dev] Backup Echo Suppression
Jean-Marc Valin
Jean-Marc.Valin at USherbrooke.ca
Mon Jul 2 20:48:24 PDT 2007
Quoting zmorris at mac.com:
> This is sort of what I was talking about with nibbling. Imagine you
> have a microphone that delivers 128 samples at a time, filling a 256
> byte buffer, and a player that writes 256 samples at a time,
> or 512 bytes. You have to nibble a frame every 160 samples, so you
> get this, where each digit represents 32 samples, so 00000 is 160
> samples:
...
> I've shown the points in time when an input buffer can be passed into
> a speex frame, or a speex frame can be passed into an output buffer.
>
> The echo canceler can't assume that each input/output pair is going
> to arrive perfectly synced and at the same time. Due to threading
> delays and other issues, it could easily get 2 inputs and 1 output
> briefly, or vice versa.
As long as the capture and playback clocks are in sync, there's no problem. If
the frame sizes don't match, you'll just need to do a bit of buffering. No big
deal. The only requirement is that the first playback sample you send to the AEC
shows up as echo in the captured signal after a fixed delay (one that isn't too
large compared to the tail length).
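The "bit of buffering" can be as simple as a FIFO per direction that accepts blocks of whatever size the audio API delivers and releases fixed-size AEC frames. Below is a minimal sketch, assuming the block sizes from the quoted example (128-sample capture blocks, 256-sample playback blocks) and an AEC configured for 160-sample frames; the Reframer type and its functions are hypothetical, not part of Speex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AEC_FRAME 160   /* assumed AEC frame size, in samples */
#define FIFO_CAP  1024  /* hypothetical FIFO capacity, in samples */

/* Hypothetical re-framing FIFO: any block size in, fixed frames out. */
typedef struct {
    short buf[FIFO_CAP];
    size_t fill;        /* number of valid samples currently buffered */
} Reframer;

/* Append n samples; returns 0 on success, -1 if the FIFO would overflow. */
int reframer_push(Reframer *r, const short *in, size_t n) {
    if (r->fill + n > FIFO_CAP) return -1;
    memcpy(r->buf + r->fill, in, n * sizeof(short));
    r->fill += n;
    return 0;
}

/* Copy the oldest AEC_FRAME samples into out and drop them from the FIFO.
   Returns 1 if a full frame was available, 0 otherwise. */
int reframer_pop(Reframer *r, short *out) {
    if (r->fill < AEC_FRAME) return 0;
    memcpy(out, r->buf, AEC_FRAME * sizeof(short));
    memmove(r->buf, r->buf + AEC_FRAME, (r->fill - AEC_FRAME) * sizeof(short));
    r->fill -= AEC_FRAME;
    return 1;
}
```

In this sketch you would keep one Reframer for the capture side and one for the playback side, and whenever both can produce a frame, hand the pair to speex_echo_cancellation(). Samples are never reordered or dropped, so the fixed echo delay is preserved.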
> I THINK that looking at this from a high level, the echo canceler IS
> guaranteed to get an input frame for every output frame, as long as
> it doesn't look at the frame's timestamp. Perhaps internally it has
> a queue that can save up frames until it has both an input and an
> output frame. In that case, it needs to stop writing warnings about
> extra or missing frames to the console, which seems to happen every
> time I run.
One of those warnings is OK. If you get many, something's wrong.
> But if the echo canceler IS using each frame's timestamp when it's
> trying to converge, it's almost guaranteed to fail on most operating
> systems, because the timestamp has such a high variability between
> frames, and can even sometimes be 0 for the output buffer in this
> example.
I don't know what you mean about timestamps. The AEC doesn't use or need timestamps.
But it does require that you send the audio in the same order you capture/play it.
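To illustrate pairing by order rather than by timestamp, here is a minimal sketch of a queue that holds played frames until a captured frame arrives, then pairs the capture with the oldest pending playback frame. It assumes 160-sample frames and a hypothetical limit of four queued frames; the names are made up for illustration. Speex's own speex_echo_playback()/speex_echo_capture() calls do their own internal buffering, but this code is not taken from them.

```c
#include <assert.h>
#include <string.h>

#define FRAME 160        /* assumed frame size, in samples */
#define MAX_QUEUED 4     /* hypothetical limit on pending playback frames */

typedef struct {
    short frames[MAX_QUEUED][FRAME];
    int head, count;     /* ring-buffer start index and number of frames */
} PlaybackQueue;

/* Call when a frame is sent to the speaker. Returns 0, or -1 if the queue
   is full and the frame was dropped (an "extra playback frame" case). */
int playback_push(PlaybackQueue *q, const short *frame) {
    if (q->count == MAX_QUEUED) return -1;
    int tail = (q->head + q->count) % MAX_QUEUED;
    memcpy(q->frames[tail], frame, sizeof q->frames[tail]);
    q->count++;
    return 0;
}

/* Call when a frame is captured: copies the oldest pending playback frame
   into play_out. Only arrival order matters; no timestamps are involved.
   Returns 1 if a frame was paired, 0 if none was available. */
int pair_with_capture(PlaybackQueue *q, short *play_out) {
    if (q->count == 0) return 0;
    memcpy(play_out, q->frames[q->head], sizeof q->frames[q->head]);
    q->head = (q->head + 1) % MAX_QUEUED;
    q->count--;
    return 1;
}
```

An occasional -1 or 0 return (for example at startup) is harmless here. If they keep happening, capture and playback are not running at the same rate.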
> Also, I think that many machines have separate input/output hardware
> that can suffer from clock drift. I'd really like to see an echo
> canceler that can work even when input/output frames are fed in with
> a large random time delta. I should be able to skip the first few
> input or output frames, and the echo canceller should be able to find
> out what the time delta is, and know from that point on, it will be
> relatively constant between any given pair of input/output frames.
This is a lot harder than you may think. Estimating the drift accurately enough
is highly non-trivial. It's much easier to make sure the clocks are in sync
(e.g. tell the user to use the same card for both).
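Some rough arithmetic shows why drift can't just be ignored. Two clocks with the same nominal rate f_s that differ by ε parts per million slip apart by f_s · ε · 10^-6 · t samples after t seconds. With a hypothetical 50 ppm mismatch at 8 kHz that is 0.4 samples per second, or 240 samples (30 ms) after ten minutes, so the echo delay seen by the AEC keeps moving. The helper below is only a sketch of that formula.

```c
#include <assert.h>

/* Accumulated offset, in samples, between two clocks whose nominal rate
   is fs_hz but which differ by ppm parts per million, after t_sec seconds. */
double drift_samples(double fs_hz, double ppm, double t_sec) {
    return fs_hz * ppm * 1e-6 * t_sec;
}
```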
> The easiest way to do this might be to look at the maximum of the
> covariance of the input/output, or find the phase offset of the input
> and output FFTs. Maybe it already does this, and someone can say if so?
If you think it's easy, then I guess I'll be waiting for your patch...
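For reference, the quoted idea amounts to picking the lag that maximizes the cross-correlation between the playback and capture signals. A brute-force sketch of that follows; the function name and parameters are hypothetical. It finds only a single constant lag over one buffer. With drifting clocks the best lag keeps changing, so it would have to be re-estimated continuously and with sub-sample precision, and that is where the real difficulty lies. An FFT-based correlation would be cheaper than this O(n · max_lag) loop.

```c
#include <assert.h>
#include <stddef.h>

/* Return the lag in [0, max_lag] (in samples) that maximizes the
   cross-correlation sum over i of play[i] * cap[i + lag]. */
int estimate_delay(const float *play, const float *cap, size_t n, int max_lag) {
    int best_lag = 0;
    double best = -1e300;
    for (int lag = 0; lag <= max_lag; lag++) {
        double acc = 0.0;
        for (size_t i = 0; i + (size_t)lag < n; i++)
            acc += (double)play[i] * cap[i + lag];
        if (acc > best) { best = acc; best_lag = lag; }
    }
    return best_lag;
}
```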
> P.S. The above situation is almost exactly what happens on my Mac,
> and would be exacerbated by people with third party sound cards.
You mean Apple can't ship a soundcard that records and plays at the same rate? I
have a hard time believing that.
Jean-Marc