[Speex-dev] Backup Echo Suppression
zmorris at mac.com
Mon Jul 2 19:41:20 PDT 2007
On Jul 2, 2007, at 7:34 PM, Jean-Marc Valin wrote:
> Quoting "Coffey, Michael" <mcoffey at avistar.com>:
>> Believe me; I've "played with" priorities and buffering.
>
> Then either you haven't played well enough or you're using a
> braindead OS.
This is sort of what I was talking about with nibbling. Imagine you
have a microphone that delivers 128 samples at a time, filling a
256-byte buffer (16-bit samples), and a player that writes 256 samples
at a time, or 512 bytes. You have to nibble off a speex frame every
160 samples, so you get the picture below, where each digit represents
32 samples (so 00000 is 160 samples):
       0   1   2   3       4   5   6   7 <- timestamp when each speex frame is read
0000111122223333444455556666777788889999 <- input frame
0000011111222223333344444555556666677777 <- speex frame
00000
     11111
          22222
               33333
                    44444
                         55555
                              66666
                                   77777

0000011111222223333344444555556666677777 <- speex frame
0000000011111111222222223333333344444444 <- output frame
0       2       4       5       7        <- timestamp when each speex frame is written
1       3               6
00000
     11111
          22222
               33333
                    44444
                         55555
                              66666
                                   77777
I've shown the points in time when an input buffer can be passed into
a speex frame, or a speex frame can be passed into an output buffer.
The echo canceler can't assume that each input/output pair is going
to arrive perfectly synced and at the same time. Due to threading
delays and other issues, it could easily get 2 inputs and 1 output
briefly, or vice versa.
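
To make the jitter concrete, here is a small Python sketch of the nibbling
schedule in the diagram above. The timing model is my assumption, not
anything speex does: the capture side hands over a buffer when it has just
been filled, and the player asks for a buffer at the moment it starts
playing it. The function names are made up for illustration.

```python
def capture_ready_times(device_frame, codec_frame, total_samples):
    """Sample time at which each codec frame can first be nibbled from the input."""
    ready = []
    delivered = 0  # samples handed over by the capture device so far
    consumed = 0   # samples already nibbled into codec frames
    while delivered < total_samples:
        delivered += device_frame
        # Nibble as many whole codec frames as the buffered input allows.
        while delivered - consumed >= codec_frame:
            consumed += codec_frame
            ready.append(delivered)
    return ready


def playback_pull_times(device_frame, codec_frame, total_samples):
    """Sample time at which each codec frame is pulled to fill an output buffer."""
    pulled = []
    produced = 0   # samples supplied by codec frames so far
    requested = 0  # samples the player has asked for so far
    now = 0        # time of the current request, in samples
    while requested < total_samples:
        requested += device_frame
        # Pull codec frames until the requested output buffer is covered.
        while produced < requested:
            produced += codec_frame
            pulled.append(now)
        now += device_frame
    return pulled


def gaps(times):
    """Time difference between consecutive frames."""
    return [b - a for a, b in zip(times, times[1:])]


print(gaps(capture_ready_times(128, 160, 1280)))
# [128, 128, 128, 256, 128, 128, 128]
print(gaps(playback_pull_times(256, 160, 1280)))
# [0, 256, 0, 256, 256, 0, 256]
```

Every speex frame covers 160 samples, yet under this model the gap between
consecutive frames is never 160 on either side, and on the output side it
is 0 whenever two frames are pulled for the same buffer.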
I THINK that looking at this from a high level, the echo canceler IS
guaranteed to get an input frame for every output frame, as long as
it doesn't look at the frame's timestamp. Perhaps internally it has
a queue that can save up frames until it has both an input and an
output frame. In that case, it needs to stop writing warnings about
extra or missing frames to the console, which seems to happen every
time I run it.
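
The kind of queue I have in mind could look something like the sketch
below. This is only a guess at the idea, not how speex works internally;
FramePairer and its methods are hypothetical names. Frames are matched
strictly in arrival order, with no timestamps involved, so a brief run of
2 inputs and 1 output just leaves one input waiting.

```python
from collections import deque


class FramePairer:
    """Hold capture and playback frames until one of each is available."""

    def __init__(self):
        self.mic = deque()  # capture frames waiting for a playback frame
        self.spk = deque()  # playback frames waiting for a capture frame

    def push_mic(self, frame):
        self.mic.append(frame)
        return self._drain()

    def push_spk(self, frame):
        self.spk.append(frame)
        return self._drain()

    def _drain(self):
        # Emit (capture, playback) pairs in arrival order for as long as
        # both queues are non-empty.
        pairs = []
        while self.mic and self.spk:
            pairs.append((self.mic.popleft(), self.spk.popleft()))
        return pairs


pairer = FramePairer()
print(pairer.push_mic("mic0"))  # nothing to pair with yet
print(pairer.push_mic("mic1"))  # still waiting
print(pairer.push_spk("spk0"))  # pairs with the oldest waiting capture frame
```

A real version would also need some bound on how many frames can pile up
on one side, which this sketch leaves out.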
But if the echo canceler IS using each frame's timestamp when it's
trying to converge, it's almost guaranteed to fail on most operating
systems, because the timestamp has such a high variability between
frames, and can even sometimes be 0 for the output buffer in this
example.
Also, I think that many machines have separate input/output hardware
that can suffer from clock drift. I'd really like to see an echo
canceler that can work even when input/output frames are fed in with
a large random time delta. I should be able to skip the first few
input or output frames, and the echo canceler should be able to find
out what the time delta is, and know from that point on, it will be
relatively constant between any given pair of input/output frames.
The easiest way to do this might be to find the lag that maximizes the
cross-covariance of the input and output, or to find the phase offset
between the input and output FFTs. Maybe it already does this; can
someone say if so?
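
For what it's worth, here is a brute-force Python sketch of the covariance
idea. It assumes the captured signal is just a delayed, attenuated copy of
what was played, and it only searches non-negative lags; estimate_delay is
a made-up name, and I'm not claiming the speex canceler does anything like
this. An FFT-based version would compute the same thing faster.

```python
import random


def estimate_delay(played, captured, max_lag):
    """Return the lag in [0, max_lag] that maximizes the cross-covariance."""
    n = min(len(played), len(captured)) - max_lag  # samples compared per lag
    ref = played[:n]
    mean_ref = sum(ref) / n
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        seg = captured[lag:lag + n]
        mean_seg = sum(seg) / n
        # Unnormalized covariance between the played signal and the
        # captured signal shifted back by `lag` samples.
        score = sum((p - mean_ref) * (c - mean_seg) for p, c in zip(ref, seg))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic check: white noise played out, echoed back 37 samples later
# at half amplitude.
rng = random.Random(0)
played = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_delay = 37
captured = ([0.0] * true_delay + [0.5 * s for s in played])[:len(played)]
print(estimate_delay(played, captured, max_lag=100))
```

Once the lag is known, the first few frames on one side could be skipped
so that the remaining input/output pairs line up.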
P.S. The above situation is almost exactly what happens on my Mac,
and would be exacerbated by people with third party sound cards.
------------------------------------------------------------------------
Zack Morris Z Sculpt Entertainment This Space
zmorris at zsculpt.com http://www.zsculpt.com For Rent
------------------------------------------------------------------------
If the doors of perception were cleansed, everything would appear to man
as it is, infinite. -William Blake, The Marriage of Heaven and Hell