[Speex-dev] mdf -- better adaption of W?

Mon Dec 12 22:29:27 PST 2005

>> Actually, computing the "power spectrum" for each frame of W shows
>> how large an amount of the original signal at time offset j the
>> echo canceller thinks should be removed from the current input frame.
>
> Careful when looking at W because of how the real and imaginary parts
> are packed in the array.

Err, OK. As I understand it, bin 0 has its amplitude in W[0], bins 1 to 
N/2-1 have their real part in W[i*2-1] and their imaginary part in W[i*2], 
and finally the Nyquist amplitude is in W[N-1].

I worked this out from how power_spectrum() computes it, so I might be off :)

>> Anyway, I did some proper testing. I took my headset, bent the microphone
>> arm so it's resting inside the .. uh.. whatever you call that large
>> muffler thing that goes around your ear. This is an important testcase, as
>> a lot of our users have complained about hearing echo that is propagated
>> at the remote end, either directly through the air from the "speaker" to
>> the microphone (common with open headsets) or, with closed headsets,
>> mechanically down the arm of the microphone.
>
> If you hold that in your hand, you're probably making it harder than a
> real scenario, because any movement causes the echo path to change.

Actually, with maximum volume (which I used to make sure the echo really 
dominated over the noise), it's quite loud, so I left it in the corner.

>> Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play
>> music that has a few "long" sounds, and saying "aaaaanyway" is enough to
>> trigger this.
>
> Can you send a pair of files so I can run testecho on them?

I'll need to add support for saving audio to my program, so I can give you 
the "actual" sampled loudspeaker and mic files, and I'll also need to get 
hold of a test person again. (I had a friend of a friend with an 
exceptionally clear voice; my own "aaaaaa" is far too muddy to cause 
this.) I'll try to get this done this week, but it might be delayed until 
after Christmas.

>> This can happen quite frequently, so it would be nice if the echo
>> canceller could deal with this situation without a complete reset.
>
> That can be predicted from the code. It's sort of hard to fix without
> hurting accuracy for the general case. I'll have to think about it.

An idea might be to let the noise cancellation "feed back" into the 
echo canceller: if, after noise cancellation, there's nothing left at 
all, stop adapting the echo canceller.
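A minimal sketch of that gate, assuming the noise-suppressed frame is available as 16-bit samples before the next adaptation step. The name should_adapt and the mean-energy threshold are hypothetical tuning choices, not part of the Speex API, and how the result would be wired into the MDF update is left open.

```c
#include <stddef.h>

/* Hypothetical gate: decide from the noise-suppressed frame whether the
   echo canceller should adapt on this frame.
   denoised   : n samples after noise cancellation
   min_energy : assumed threshold on mean energy per sample
   Returns 1 to keep adapting, 0 to freeze the filter for this frame. */
static int should_adapt(const short *denoised, size_t n, double min_energy)
{
    double energy = 0.0;

    if (n == 0)
        return 0;

    /* mean energy of the residual that survived noise cancellation */
    for (size_t i = 0; i < n; i++)
        energy += (double)denoised[i] * (double)denoised[i];

    return energy / (double)n > min_energy;
}
```

The caller would then skip the weight update for any frame where this returns 0.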

>> Now, when trying to visualize the weights to see a bit of what was going
>> on, I also computed the phase for each frequency bin. When looking just at
>> the phase, I can see a very clear and distinct pattern of going from -pi
>> to +pi in the areas where I know there is echo (specifically, the lower
>> 7 kHz of j==M-1),
>
> What you see is a "linear phase", which is the frequency equivalent of a
> delay in the time domain. So basically, the phase you see is just the
> representation of where the "main impulse" is in the time domain version
> of W (i.e. the time offset between the two signals you sent to the AEC).

Ah, yes. I'm reading up on my DFT now. Amazing how much stuff you can 
forget.
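The DFT shift theorem makes this concrete. Assuming a single pure delay of d samples and the usual forward-transform convention e^{-j2*pi*k*n/N} (the FFT actually used may differ in sign convention, which would only flip the sign of the slope):

```latex
x[n] = \delta[n-d] \;\Longrightarrow\; X[k] = e^{-j 2\pi k d / N}, \qquad 0 \le d < N

\arg X[k] \equiv -\frac{2\pi d}{N}\,k \pmod{2\pi}, \qquad
\arg X[k+1] - \arg X[k] \equiv -\frac{2\pi d}{N} \pmod{2\pi}
```

So the phase moves by the same step from one bin to the next and wraps around once every N/d bins, which gives the sawtooth between -pi and +pi; the steepness of the sawtooth encodes the delay d.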

>> and what looks like random noise for the rest. Do you
>> have any idea where this pattern originates from, and more importantly,
>> could it be used as additional conditioning of W? (i.e. if the phase
>> doesn't match the pattern, reduce the amplitude as it's a false match).
>
> A random phase is expected. I don't see much useful info you can get
> from that.

Well, from what I can see in this testcase, it's only "random" where there 
is no correlation. For example, in the 20ms-40ms timeslot, the amplitude 
can spike a bit (such as on those "aaaaaa"s), but the phase is still 
random, whereas in the 0-20ms slot, it's very regular. My thought was to 
use the "regularity" of the phase shift as an indication of a good match. 
So, if arg(W[i+1])-arg(W[i])==arg(W[i])-arg(W[i-1]) (with W[i] here meaning 
the complex value of bin i, and differences taken modulo 2*pi), the phase 
is changing in steady steps, so it's probably a good match. It's quite 
hackish, and probably has no good scientific basis, but it's an idea for 
dealing better with the specific kind of echo I see here.

Then again, it will likely fail horribly if you have two echoes: one delayed 
by 5 ms with equal amplitude, and another delayed by 15 ms with a much lower 
amplitude. I have no idea what the "phase diagram" will look like then.
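The shift-theorem view gives at least an idealized answer. A sketch, assuming two pure delays d1 < d2 samples with gains 1 and a, where 0 <= a < 1 (the earlier echo is the stronger one, as in the 5 ms / 15 ms example), and writing omega_k = 2*pi*k/N:

```latex
H[k] = e^{-j\omega_k d_1} + a\, e^{-j\omega_k d_2}
     = e^{-j\omega_k d_1}\left(1 + a\, e^{-j\omega_k (d_2 - d_1)}\right)

\arg H[k] = -\omega_k d_1 + \arg\!\left(1 + a\, e^{-j\omega_k (d_2 - d_1)}\right),
\qquad
\left|\arg\!\left(1 + a\, e^{-j\theta}\right)\right| \le \arcsin a
```

So the phase would still show the linear ramp of the stronger echo, with a bounded ripple on top that repeats every N/(d2 - d1) bins. The equal-step test would then hold only approximately; how far off it is depends on both a and the spacing d2 - d1, so it would need a tolerance rather than an exact comparison.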