[Speex-dev] mdf -- better adaption of W?

Tue Dec 13 01:32:58 PST 2005

> Err. Ok, as I got it, 'bin 0' has its amplitude in W[0], bin 1 to N-1 has 
> its real part in W[i*2-1] and its imag in W[i*2], and finally the 
> nyquist amplitude is in W[N-1]

Not quite, it's packed "real, real, imag, real, imag, ...".

> I took this from how power_spectrum() computes, so I might be off :)

But power_spectrum() handles that fine, you're right.
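
For reference, a minimal sketch of how such a packed frame can be split into
complex bins. It assumes the layout discussed above: one real value for DC,
then (real, imag) pairs, with the last element holding the real Nyquist value
as in the quoted description. unpack_spectrum() is a hypothetical helper
written for illustration, not a function from the Speex sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Speex): split a packed real-FFT frame
 * of n floats (n even) into n/2+1 complex bins.
 * Assumed layout: X[0] = DC (real only), X[1],X[2] = bin 1 (re, im), ...,
 * X[n-1] = Nyquist (real only). */
void unpack_spectrum(const float *X, size_t n, float *re, float *im)
{
    size_t i, j;

    assert(n >= 2 && n % 2 == 0);

    re[0] = X[0];            /* DC has no imaginary part */
    im[0] = 0.0f;
    for (i = 1, j = 1; i + 1 < n; i += 2, j++) {
        re[j] = X[i];        /* real part of bin j */
        im[j] = X[i + 1];    /* imaginary part of bin j */
    }
    re[j] = X[i];            /* Nyquist has no imaginary part */
    im[j] = 0.0f;
}
```

With n = 8 this yields five bins: X[0] alone, three (re, im) pairs, and X[7]
alone. The power of bin j is then re[j]*re[j] + im[j]*im[j].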

> > If you hold that in your hand, you're probably making it harder than
> > for a real scenario because any movement causes the echo path to change.
> 
> Actually, with maximum volume (which I used to make sure the echo really 
> dominated over the noise), it's quite loud, so I left it in the corner.

That's fine then... as long as the max volume doesn't cause too much
distortion (the AEC models only linear effects).

> I'll need to add support for saving audio to my program, so I can give you 
> the "actual" sampled loudspeaker and mic files, and I'll also need to get 
> hold of a test person again. (I had a friend with a friend who has an 
> exceptionally clear voice. My own "aaaaaa" is far too muddy to cause 
> this). I'll try to get this done this week, but it might be delayed 'till 
> after christmas.

Let me know if you have files that cause the problem. Otherwise, it's
pretty much impossible to debug.

> >> This can happen quite frequently, so it would be nice if the echo
> >> canceller could deal with this situation without a complete reset.
> >
> > That can be predicted from the code. It's sort of hard to fix without
> > hurting accuracy for the general case. I'll have to think about it.
> 
> An idea might be to enable the noise cancellation to "feed back" into the 
> echo canceller. If, after noise cancellation, there's nothing left at 
> all, then stop adapting the echo canceller.

There's always "something" left after the echo cancellation, if only the
input noise. And even then it wouldn't fix the problem.

> Well, from what I can see in this testcase, it's only "random" where there 
> is no correlation. For example, in the 20ms-40ms timeslot, the amplitude 
> can spike a bit (such as on those "aaaaaa"), but the phase is still 
> random, whereas in the '0-20ms' slot, it's very regular. My thought was to 
> use the "regularity" of the phase shift as an indication for a good match. 

No. All the regularity means is that you have a dominant pulse in the
transfer function. That's expected for the first section of the filter
(because of the direct sound path), but not the others (that are really
just a lot of incoherent stuff).

> So, if arg(W[i+1])-arg(W[i])==arg(W[i])-arg(W[i-1]), we know it's a steady 
> increase, so it's probably a good match. It's quite hackish, and probably 
> has no good scientific basis, but it's an idea for 
> dealing better with the specific kind of echo I see here.

What would that really tell you anyway?

> Then again, it will likely fail horribly if you have 2 echoes; one delayed 
> by 5ms with equal amplitude, and another delayed by 15ms with a much lower 
> amplitude. 

That's actually common if you have a wall (or the floor) not too far
from the mic or the speaker.

> I have no idea what the "phase diagram" will look like then.

Messy and dependent on the amplitudes too.
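
A short worked case, assuming the echo path is just two ideal pulses with
amplitudes a_1 > 0 and a_2 at delays d_1 < d_2 (in samples). These symbols
are illustrative and do not come from the code:

```latex
\begin{aligned}
H(k) &= a_1 e^{-j\theta_k d_1} + a_2 e^{-j\theta_k d_2},
        \qquad \theta_k = \frac{2\pi k}{N},\\
     &= a_1 e^{-j\theta_k d_1}\left(1 + r\,e^{-j\theta_k \Delta}\right),
        \qquad r = \frac{a_2}{a_1},\quad \Delta = d_2 - d_1,\\
\arg H(k) &= -\theta_k d_1
        - \operatorname{atan2}\!\bigl(r\sin(\theta_k\Delta),\; 1 + r\cos(\theta_k\Delta)\bigr)
        \pmod{2\pi}.
\end{aligned}
```

For r well below 1 the second term is a ripple of at most arcsin(r) on top of
the linear phase of the first pulse. As r approaches 1 the ripple grows, and
at r = 1 the phase follows the mean delay (d_1 + d_2)/2 with jumps of pi at
the spectral nulls, so the shape depends directly on the amplitude ratio.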

	Jean-Marc