[Speex-dev] mdf -- better adaption of W?
jean-marc.valin at usherbrooke.ca
Mon Dec 12 21:32:45 PST 2005
> Actually, computing the "power spectrum" for each frame of W shows
> how large an amount of the original signal at time offset j the
> echo canceller thinks should be removed from the current input frame.
Careful when looking at W because of how the real and imaginary parts
are packed in the array.
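For illustration, here is a minimal sketch of computing a per-bin power spectrum from a packed real-FFT frame. The layout is an assumption, not something stated in this thread: [re0, re1, im1, ..., re(N/2-1), im(N/2-1), re(N/2)] for an even frame length N. The function name is hypothetical; check the FFT wrapper you are actually using before relying on the layout.

```python
def power_spectrum(packed):
    """Power per frequency bin for one packed real-FFT frame.

    Assumed layout (not confirmed by this thread):
    [re0, re1, im1, ..., re(N/2-1), im(N/2-1), re(N/2)], N even.
    """
    n = len(packed)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length packed frame")
    ps = [packed[0] ** 2]                  # DC bin has no imaginary part
    for i in range(1, n - 1, 2):           # walk the (re, im) pairs
        ps.append(packed[i] ** 2 + packed[i + 1] ** 2)
    ps.append(packed[n - 1] ** 2)          # Nyquist bin has no imaginary part
    return ps                              # n/2 + 1 values
```

Treating the array as N independent real values, or as N/2 plain (re, im) pairs starting at index 0, would pair the wrong elements together and give a misleading picture.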
> If you compute W*X for each j and ifft, you'll get the
> original signal with each frequency component scaled and time-shifted
> according to what W was (for that j).
Yes, that's the Y/y signal in the code.
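As a rough sketch of that relationship, assuming the output spectrum is the sum over all blocks j of W[j]*X[j] and using plain complex numbers rather than the packed fixed-point arrays: the function names below are hypothetical, the inverse transform is a naive O(n^2) DFT, and the overlap-save bookkeeping a real block frequency-domain filter needs is left out.

```python
import cmath

def filter_output_spectrum(W, X):
    """Y[k] = sum over blocks j of W[j][k] * X[j][k] (complex bins)."""
    bins = len(W[0])
    Y = [0j] * bins
    for Wj, Xj in zip(W, X):
        for k in range(bins):
            Y[k] += Wj[k] * Xj[k]
    return Y

def idft(Y):
    """Naive inverse DFT, only meant for small illustrative inputs."""
    n = len(Y)
    return [
        sum(Y[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# Two blocks with flat weights, each fed the spectrum of a unit impulse.
W = [[1 + 0j] * 4, [1 + 0j] * 4]
X = [[1 + 0j] * 4, [1 + 0j] * 4]
y = idft(filter_output_spectrum(W, X))
```

Looking at a single j in isolation shows only that block's contribution; the estimated echo is the sum over all of them.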
> Anyway, I did some proper testing. I took my headset, bent the microphone
> arm so it's resting inside the .. uh.. whatever you call that large
> muffler thing that goes around your ear. This is an important testcase, as
> a lot of our users have complained about hearing echo that is propagated
> at the remote end either directly through the air from the "speaker" to the
> microphone (common with open headsets), and with closed headsets we see
> echo propagated mechanically down the arm of the microphone.
If you hold that in your hand, you're probably making it harder than
a real scenario, because any movement causes the echo path to change.
> Playing regular pop music (Garbage: Push It), things work out well, and
> the canceller ends up with semi-stable weights, almost entirely in the
> (j==M-1) bin (0-20ms delay, which is quite natural). It's the same with
> normal speech as long as it's spoken reasonably fast.
> I see some "banding" of the output, it seems there's more output signal
> (and more to cancel) in the 1-3 kHz and 5-6 kHz area, but I blame that on
> the headphones; they're cheap.
Not sure what you mean but it doesn't seem to be a problem.
> However, when switching to AC DC: Big Gun, we see and hear a large
> residual echo from the opening el-guitar. This seems to be a result of a
> semi-stable sound that lasts more than 20 ms; the canceller finds a
> correlation in 4-5 timebins instead of just one. We could reproduce the
> same result by playing a human voice saying "aaaaaaaaaa" without variation
> in pitch; the weights for those frequency bins would increase for all the
> timeslots in W.
> Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play
> music that has a few "long" sounds, and saying "aaaaanyway" is enough to
> trigger this.
Can you send a pair of files that I can run testecho on?
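For illustration only, a small sketch of why a sustained tone carries little delay information: a steady sinusoid correlates equally well with copies of itself delayed by any whole number of periods. This is not an analysis of the canceller itself; the period, frame length, and function name are hypothetical values chosen for the example.

```python
import math

def normalized_correlation(x, delay, length):
    """Normalized correlation between two `length`-sample windows of x
    whose start positions are `delay` samples apart."""
    a = x[delay:delay + length]
    b = x[:length]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den

PERIOD = 8   # samples per cycle of the test tone (hypothetical)
FRAME = 64   # window length in samples (hypothetical)
tone = [math.sin(2 * math.pi * t / PERIOD) for t in range(256)]

# Correlation at delays of 0, 1, 2 and 3 whole periods.
scores = [normalized_correlation(tone, d, FRAME)
          for d in (0, PERIOD, 2 * PERIOD, 3 * PERIOD)]
```

All four scores come out essentially identical, so from the tone alone there is nothing to distinguish one delay from another. Signals that change quickly, like fast speech or most pop music, do not have this ambiguity.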
> Next test, what happens if the user has an external (physical) on-off
> switch? Same setup, playing Big Gun as loud as it gets. Apart from the
> problems with the opening guitar everything is good, and we see the
> weights set as they should be and things are cancelled out.
> So, I switch the mic off externally with the switch. Input becomes
> practically zero, so the weights readjust to zero as well. Turn the
> microphone back on and the echo canceller doesn't adapt.
> That is, no echo cancellation, and the weights all stay at zero.
> This can happen quite frequently, so it would be nice if the echo
> canceller could deal with this situation without a complete reset.
That can be predicted from the code. It's sort of hard to fix without
hurting accuracy for the general case. I'll have to think about it.
> Now, when trying to visualize the weights to see a bit of what was going
> on, I also computed the phase for each frequency bin. When looking just at
> the phase, I can see a very clear and distinct pattern of going from -pi
> to +pi in the areas where I know there is echo (specifically, the lower
> 7 kHz of j==M-1),
What you see is a "linear phase", which is the frequency equivalent of a
delay in the time domain. So basically, the phase you see is just the
representation of where the "main impulse" is in the time domain version
of W (i.e. the time offset between the two signals you sent to the AEC).
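A minimal sketch of that equivalence, using a naive DFT and an idealized filter that is a single unit impulse (the sizes and function names are hypothetical): a delay of d samples gives a phase of -2*pi*k*d/N in bin k, which wraps between -pi and +pi as k increases, and the slope gives back the delay.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT, only meant for small illustrative inputs."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def delay_from_phase(H):
    """Estimate the delay in samples from the phase step between bins 0 and 1."""
    n = len(H)
    step = cmath.phase(H[1] / H[0])          # equals -2*pi*d/n, wrapped
    return (-step * n / (2 * math.pi)) % n

N, DELAY = 16, 3                              # hypothetical sizes
impulse = [1.0 if t == DELAY else 0.0 for t in range(N)]
H = dft(impulse)
phases = [cmath.phase(h) for h in H]          # sawtooth between -pi and +pi
estimated = delay_from_phase(H)               # close to DELAY
```

A real echo path is not a single impulse, so the measured phase will only approximately follow this sawtooth, and only in the bins where the weights have significant magnitude.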
> and what looks like random noise for the rest. Do you
> have any idea where this pattern originates from, and more importantly,
> could it be used as additional conditioning of W? (ie: if the phase
> doesn't match the pattern, reduce the amplitude as it's a false match).
A random phase is expected. I don't see much useful info you can get