[Speex-dev] mdf -- better adaption of W?
Thorvald Natvig
speex at natvig.com
Mon Dec 12 20:52:16 PST 2005
>> Generate a test signal (10+x sine waves per frame), where x increases by
>> one for each iteration, and wraps around at 100.
>
> Testing with sine waves is usually not a good idea. If you intend on
> cancelling speech, then test with speech.
Ok, I tested more extensively with both music and two-way speech. More on
this below.
>> However, when peeking at the values, it seems that the weights for
>> frame 0 (newest) are very low.
>
> Peeking at the value tells you nothing unless you do the inverse FFT and
> all so you can see them in the time domain. Even then, it's not that
> useful.
Actually, computing the "power spectrum" for each frame of W shows
how large an amount of the original signal at time offset j the
echo canceller thinks should be removed from the current input frame.
If you compute W*X for each j and ifft, you'll get the
original signal with each frequency component scaled and time-shifted
according to what W was (for that j).
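
A rough sketch of that reading, in plain Python rather than the actual
mdf code: the sizes, the signal, and the example weights (a pure echo at
half amplitude, 3 samples late) are all made up for illustration. A plain
DFT makes the shift circular, which a real block-based canceller has to
avoid.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def block_power(W):
    """Total |W|^2 for each block j: how much signal that block would remove."""
    return [sum(abs(w) ** 2 for w in block) for block in W]

def apply_block(W_j, x):
    """Compute ifft(W_j * X): the part of x that block j predicts as echo."""
    X = dft(x)
    return [y.real for y in idft([w * xk for w, xk in zip(W_j, X)])]

N = 16                 # hypothetical transform size
delay, gain = 3, 0.5   # hypothetical echo path: half amplitude, 3 samples late

# Weights for one block that model exactly that echo path, and an idle block.
W_echo = [gain * cmath.exp(-2j * math.pi * k * delay / N) for k in range(N)]
W_idle = [0j] * N

x = [math.sin(2 * math.pi * 2 * t / N) + 0.3 * math.cos(2 * math.pi * 5 * t / N)
     for t in range(N)]

y = apply_block(W_echo, x)               # x scaled by gain, shifted by delay
powers = block_power([W_idle, W_echo])   # idle block has none, echo block has some
```

Here y is x scaled by 0.5 and (circularly) delayed by 3 samples, and only
the second block shows any power.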
Anyway, I did some proper testing. I took my headset and bent the microphone
arm so it rests inside the .. uh.. whatever you call that large
muffler thing that goes around your ear. This is an important test case, as
a lot of our users have complained about hearing echo that is propagated
at the remote end, either directly through the air from the "speaker" to the
microphone (common with open headsets) or mechanically down the arm of the
microphone (which we see with closed headsets).
Playing regular pop music (Garbage: Push It), things work out well, and
the canceller ends up with semi-stable weights, almost entirely in the
(j==M-1) bin (0-20ms delay, which is quite natural). It's the same with
normal speech as long as it's spoken reasonably fast.
I see some "banding" of the output; it seems there's more output signal
(and more to cancel) in the 1-3 kHz and 5-6 kHz areas, but I blame that on
the headphones; they're cheap.
However, when switching to AC/DC: Big Gun, we see and hear a large
residual echo from the opening electric guitar. This seems to be a result of a
semi-stable sound that lasts more than 20 ms; the canceller finds a
correlation in 4-5 time bins instead of just one. We could reproduce the
same result by playing a human voice saying "aaaaaaaaaa" without variation
in pitch; the weights for those frequency bins would increase for all the
time slots in W.
Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play
music that has a few "long" sounds, and saying "aaaaanyway" is enough to
trigger this.
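
One way to see why a sustained sound could do this (a sketch under my own
assumptions, not a trace of what the canceller computes): for a steady
tone, a frequency bin in the current block is almost perfectly correlated
with the same bin 1, 2, 3 or 4 blocks earlier, so every time slot looks
like an equally good explanation for the echo. For noise-like input the
older blocks are nearly uncorrelated. The block size, block count, tone
frequency and the coherence measure below are all hypothetical choices.

```python
import cmath
import math
import random

L = 32   # samples per block (hypothetical)
T = 64   # number of blocks analysed (hypothetical)

def dft_bin(frame, k):
    """Single DFT bin k of one block."""
    n = len(frame)
    return sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(frame))

def coherence(signal, k, j):
    """Magnitude coherence of bin k between each block and the block j earlier."""
    frames = [signal[t * L:(t + 1) * L] for t in range(len(signal) // L)]
    spec = [dft_bin(f, k) for f in frames]
    pairs = [(spec[t], spec[t - j]) for t in range(j, len(spec))]
    num = abs(sum(a * b.conjugate() for a, b in pairs))
    den = math.sqrt(sum(abs(a) ** 2 for a, _ in pairs) *
                    sum(abs(b) ** 2 for _, b in pairs))
    return num / den

rng = random.Random(0)  # fixed seed so the run is repeatable

# A steady "aaaaaa"-like tone (4.25 cycles per block) and white noise.
tone = [math.sin(2 * math.pi * 4.25 * i / L) for i in range(L * T)]
noise = [rng.gauss(0.0, 1.0) for _ in range(L * T)]

k = 4  # the bin closest to the tone
tone_coh = [coherence(tone, k, j) for j in range(1, 5)]    # close to 1 for every j
noise_coh = [coherence(noise, k, j) for j in range(1, 5)]  # small for every j
```

With input like the tone, nothing in the far-end signal itself says which
time slot the echo belongs to, which would fit the weights growing in all
of them.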
Next test: what happens if the user has an external (physical) on-off
switch? Same setup, playing Big Gun as loud as it gets. Apart from the
problems with the opening guitar, everything is good; we see the
weights set as they should be, and things are cancelled out.
So, I switch the mic off externally with the switch. Input becomes
practically zero, so the weights readjust to zero as well. Turn the
microphone back on and the echo canceller doesn't adapt.
That is, no echo cancellation, and the weights all stay at their zero
values.
This can happen quite frequently, so it would be nice if the echo
canceller could deal with this situation without a complete reset.
Now, when trying to visualize the weights to see a bit of what was going
on, I also computed the phase for each frequency bin. When looking just at
the phase, I can see a very clear and distinct pattern of going from -pi
to +pi in the areas where I know there is echo (specifically, the lower
7 kHz of j==M-1), and what looks like random noise for the rest. Do you
have any idea where this pattern comes from, and more importantly,
could it be used as additional conditioning of W? (i.e., if the phase
doesn't match the pattern, reduce the amplitude, as it's a false match.)
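
For reference, one plausible origin of such a pattern: a pure delay of d
samples has the frequency response exp(-2*pi*i*k*d/N), so its phase falls
linearly with the bin index k and wraps around each time it passes -pi.
Plotted per bin, that is a sawtooth between -pi and +pi. Whether this is
what I am actually seeing is an assumption; the transform size and delay
below are made up.

```python
import cmath
import math

def delay_weights(fft_size, delay, gain=1.0):
    """Weights for a pure delay of `delay` samples (bins below Nyquist only)."""
    return [gain * cmath.exp(-2j * math.pi * k * delay / fft_size)
            for k in range(fft_size // 2)]

def wrapped_diff(a, b):
    """Phase step from a to b, folded back into [-pi, pi)."""
    return (b - a + math.pi) % (2 * math.pi) - math.pi

FFT_SIZE, DELAY = 128, 7   # hypothetical values

phases = [cmath.phase(w) for w in delay_weights(FFT_SIZE, DELAY)]

# Count the upward jumps where the phase wraps from near -pi to near +pi.
wraps = sum(1 for a, b in zip(phases, phases[1:]) if b - a > math.pi)

# The slope of the unwrapped phase gives the delay back.
steps = [wrapped_diff(a, b) for a, b in zip(phases, phases[1:])]
est_delay = -(sum(steps) / len(steps)) * FFT_SIZE / (2 * math.pi)
```

If that is the cause, bins with real echo should all share the same phase
step from one bin to the next, while false matches would show steps that
scatter randomly.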