[Speex-dev] mdf -- better adaption of W?

Mon Dec 12 21:32:45 PST 2005

> Actually, computing the "power spectrum" for each frame of W shows 
> how large an amount of the original signal at time offset j the 
> echo canceller thinks should be removed from the current input frame.

Careful when looking at W because of how the real and imaginary parts
are packed in the array.
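
As a rough illustration of the packing issue, here is a minimal Python
sketch. It assumes an FFTPACK-style packed real-FFT layout, where an
even-length frame of N floats holds [re0, re1, im1, re2, im2, ...,
re(N/2)]. That layout and the function name power_spectrum are
assumptions made for illustration; check them against the FFT wrapper
actually in use before relying on this.

```python
def power_spectrum(X):
    """Power per frequency bin for one packed real-FFT frame.

    Assumed layout (FFTPACK style), for even N = len(X):
        X = [re0, re1, im1, re2, im2, ..., re(N/2)]
    Returns N/2 + 1 values, one per bin from DC to Nyquist.
    """
    n = len(X)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length packed frame")
    ps = [X[0] * X[0]]                     # DC bin has no imaginary part
    for i in range(1, n - 1, 2):           # walk the (re, im) pairs
        ps.append(X[i] * X[i] + X[i + 1] * X[i + 1])
    ps.append(X[n - 1] * X[n - 1])         # Nyquist bin has no imaginary part
    return ps
```

Squaring the array element by element and reading the result as "one
value per bin" would split every bin except DC and Nyquist into two
unrelated numbers, which is the mistake to watch for.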

> If you compute W*X for each j and ifft, you'll get the 
> original signal with each frequency component scaled and time-shifted 
> according to what W was (for that j).

Yes, that's the Y/y signal in the code.
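
For reference, the W*X product can be sketched in a few lines of
Python. This is a simplified illustration, not the mdf code: it uses
plain complex spectra and a naive DFT, sums over all blocks j, and
ignores the overlap-save bookkeeping (so the result is a circular
convolution). The helper names dft, idft and filter_output are
hypothetical.

```python
import cmath

def dft(x):
    """Naive forward DFT of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def filter_output(W, X):
    """Return (Y, y) where Y[k] = sum over j of W[j][k] * X[j][k].

    W[j] and X[j] are the spectra of weight block j and of the matching
    past input frame. y is the real part of the inverse transform of Y.
    """
    nbins = len(X[0])
    Y = [sum(W[j][k] * X[j][k] for j in range(len(W))) for k in range(nbins)]
    y = [v.real for v in idft(Y)]
    return Y, y
```

With a single block whose weights are the spectrum of a 2-sample delay,
an impulse at sample 0 comes out as an impulse at sample 2, i.e. the
input scaled and shifted by whatever that block of W encodes.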

> Anyway, I did some proper testing. I took my headset, bent the microphone 
> arm so it's resting inside the .. uh.. whatever you call that large 
> muffler thing that goes around your ear. This is an important testcase, as 
> a lot of our users have complained about hearing echo that is propagated 
> at the remote end either directly through the air from the "speaker" to the 
> microphone (common with open headsets), and with closed headsets we see 
> echo propagated mechanically down the arm of the microphone.

If you hold that in your hand, you're probably making it harder than a
real scenario, because any movement causes the echo path to change.

> Playing regular pop music (Garbage: Push It), things work out well, and 
> the canceller ends up with semi-stable weights, almost entirely in the 
> (j==M-1) bin (0-20ms delay, which is quite natural). It's the same with 
> normal speech as long as it's spoken reasonably fast.

Fine.

> I see some "banding" of the output, it seems there's more output signal 
> (and more to cancel) in the 1-3khz and 5-6 khz area, but I blame that on 
> the headphones; they're cheap.

Not sure what you mean but it doesn't seem to be a problem.

> However, when switching to AC DC: Big Gun, we see and hear a large 
> residual echo from the opening el-guitar. This seems to be a result of a 
> semi-stable sound that lasts more than 20 ms; the canceller finds a 
> correlation in 4-5 timebins instead of just one. We could reproduce the 
> same result by playing a human voice saying "aaaaaaaaaa" without variation 
> in pitch; the weights for those frequency bins would increase for all the 
> timeslots in W.
> 
> Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play 
> music that has a few "long" sounds, and saying "aaaaanyway" is enough to 
> trigger this.

Can you send a pair of files so I can run testecho on them?

> Next test, what happens if the user has an external (physical) on-off 
> switch? Same setup, playing Big Gun as loud as it gets. Apart from the 
> problems with the opening guitar everything is good, and we see the 
> weights set as they should be and things are cancelled out.
> 
> So, I switch the mic off externally with the switch. Input becomes 
> practically zero, so the weights readjust to zero as well. Turn the 
> microphone back on and the echo canceller doesn't adapt.
> That is, no echo cancellation, and the weights all stay at their zero 
> values.
> 
> This can happen quite frequently, so it would be nice if the echo 
> canceller could deal with this situation without a complete reset.

That can be predicted from the code. It's sort of hard to fix without
hurting accuracy for the general case. I'll have to think about it.

> Now, when trying to visualize the weights to see a bit of what was going 
> on, I also computed the phase for each frequency bin. When looking just at 
> the phase, I can see a very clear and distinct pattern of going from -pi 
> to +pi in the areas where I know there is echo (specifically, the lower 
> 7khz of j==M-1), 

What you see is a "linear phase", which is the frequency equivalent of a
delay in the time domain. So basically, the phase you see is just the
representation of where the "main impulse" is in the time domain version
of W (i.e. the time offset between the two signals you sent to the AEC).
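
To make the delay/phase relation concrete, here is a small Python
sketch. It assumes an idealized filter that is a single impulse at
sample d, for which the DFT is exp(-2*pi*i*k*d/N): unit magnitude and a
phase that falls linearly with k, wrapping between -pi and +pi. The
helper names dft and delay_from_phase are hypothetical, and the
estimate only holds while d < N/2.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def delay_from_phase(h):
    """Estimate the delay (in samples) of h from its phase slope.

    The phase step between neighbouring bins is measured as the angle of
    H[k+1] * conj(H[k]), which avoids explicit phase unwrapping.
    """
    n = len(h)
    H = dft(h)
    steps = [cmath.phase(H[k + 1] * H[k].conjugate()) for k in range(n // 2)]
    slope = sum(steps) / len(steps)        # average radians per bin
    return -slope * n / (2 * math.pi)      # a delay of d gives slope -2*pi*d/N
```

In bins where the weights are essentially noise, the per-bin steps no
longer agree with each other, which matches the random-looking phase
outside the region with echo.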

> and what looks like random noise for the rest. Do you 
> have any idea where this pattern originates from, and more importantly, 
> could it be used as additional conditioning of W? (ie: if the phase 
> doesn't match the pattern, reduce the amplitude as it's a false match).

A random phase is expected. I don't see much useful info you can get
from that.

	Jean-Marc