[Speex-dev] mdf -- better adaption of W?

Mon Dec 12 20:52:16 PST 2005

>> Generate a test signal (10+x sine waves per frame), where x increases by
>> one for each iteration, and wraps around at 100.
>
> Testing with sine waves is usually not a good idea. If you intend on
> cancelling speech, then test with speech.

Ok, I tested more extensively with both music and two-way speech.  More on 
this below.

>> However, when peeking at the values, it seems that the weights for
>> frame 0 (newest) are very low.
>
> Peeking at the value tells you nothing unless you do the inverse FFT and
> all so you can see them in the time domain. Even then, it's not that
> useful.

Actually, computing the "power spectrum" of each frame of W shows how much 
of the original signal at time offset j the echo canceller thinks should 
be removed from the current input frame.
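To make the "power spectrum per frame of W" idea concrete, here is a minimal C99 sketch. It assumes a hypothetical, simplified layout in which W holds M blocks of N complex bins, with block j stored at W[j*N .. j*N+N-1]. This is not the packed real-FFT layout mdf.c actually uses, and the function names are made up for illustration.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* ASSUMPTION: hypothetical layout, M blocks of N complex bins, block j at
 * W[j*N .. j*N+N-1]. mdf.c stores its weights differently (packed real FFT). */

/* Fill power[j*N + k] with |W_j[k]|^2, the per-bin power of block j. */
void weight_power_spectrum(const double complex *W, size_t M, size_t N,
                           double *power)
{
    assert(W != NULL && power != NULL);
    for (size_t j = 0; j < M; j++) {
        for (size_t k = 0; k < N; k++) {
            double complex w = W[j * N + k];
            power[j * N + k] = creal(w) * creal(w) + cimag(w) * cimag(w);
        }
    }
}

/* Total power of block j summed over all bins: one number per time slot,
 * showing where (in delay) the canceller has put most of its weight. */
double block_total_power(const double complex *W, size_t N, size_t j)
{
    double sum = 0.0;
    for (size_t k = 0; k < N; k++) {
        double complex w = W[j * N + k];
        sum += creal(w) * creal(w) + cimag(w) * cimag(w);
    }
    return sum;
}
```

Plotting block_total_power against j gives a quick view of which delay slot holds the echo path, while the per-bin array shows which frequencies are being cancelled in that slot.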

If you multiply W and X bin by bin for each j and take the inverse FFT, 
you'll get the original signal with each frequency component scaled and 
time-shifted according to what W was (for that j).

Anyway, I did some proper testing. I took my headset and bent the 
microphone arm so it's resting inside the .. uh.. whatever you call that 
large muffler thing that goes around your ear. This is an important test 
case, as a lot of our users have complained about hearing echo that is 
propagated at the remote end, either directly through the air from the 
"speaker" to the microphone (common with open headsets) or, with closed 
headsets, mechanically down the arm of the microphone.

Playing regular pop music (Garbage: Push It), things work out well, and 
the canceller ends up with semi-stable weights, almost entirely in the 
(j==M-1) bin (0-20ms delay, which is quite natural). It's the same with 
normal speech as long as it's spoken reasonably fast.

I see some "banding" of the output: there seems to be more output signal 
(and more to cancel) in the 1-3 kHz and 5-6 kHz ranges, but I blame that 
on the headphones; they're cheap.

However, when switching to AC/DC: Big Gun, we see and hear a large 
residual echo from the opening electric guitar. This seems to be caused by 
a semi-stable sound that lasts more than 20 ms; the canceller finds a 
correlation in 4-5 time bins instead of just one. We could reproduce the 
same result by playing a human voice saying "aaaaaaaaaa" without variation 
in pitch; the weights for those frequency bins would increase in all the 
time slots of W.

Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play 
music that has a few "long" sounds, and saying "aaaaanyway" is enough to 
trigger this.

Next test: what happens if the user has an external (physical) on-off 
switch? Same setup, playing Big Gun as loud as it gets. Apart from the 
problems with the opening guitar, everything is good: the weights are set 
as they should be and the echo is cancelled.

So I switch the mic off with the external switch. The input becomes 
practically zero, so the weights readjust to zero as well. When I turn 
the microphone back on, the echo canceller doesn't adapt: there is no echo 
cancellation, and the weights all stay at their zero values.

This can happen quite frequently, so it would be nice if the echo 
canceller could deal with this situation without a complete reset.

Now, when trying to visualize the weights to see a bit of what was going 
on, I also computed the phase for each frequency bin. Looking at the phase 
alone, I can see a very clear and distinct pattern going from -pi to +pi 
in the areas where I know there is echo (specifically, the lower 7 kHz of 
j==M-1), and what looks like random noise everywhere else. Do you have any 
idea where this pattern originates, and more importantly, could it be used 
as additional conditioning of W? (i.e., if the phase doesn't match the 
pattern, reduce the amplitude, as it's a false match.)