[Speex-dev] mdf -- better adaption of W?
Jean-Marc.Valin at USherbrooke.ca
Tue Dec 13 01:32:58 PST 2005
> Err. Ok, as I got it, 'bin 0' has its amplitude in W, bin 1 to N-1 has
> its real part in W[i*2-1] and its imag in W[i*2], and finally the
> Nyquist amplitude is in W[N-1]
Not quite, it's packed "real, real, imag, real, imag, ...".
> I took this from how power_spectrum() computes, so I might be off :)
But power_spectrum() handles that fine, you're right.
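
To make the layout concrete, here is a minimal sketch of how a power
spectrum can be computed from a frame packed that way. It assumes an even
frame length N and that the last entry is the real-only Nyquist bin. The
name packed_power_spectrum() is made up for this illustration and is not
the actual Speex code.

```python
def packed_power_spectrum(X):
    """Power per frequency bin for a packed real-FFT frame.

    Assumes X has even length N and is laid out as
    [re0, re1, im1, re2, im2, ..., re(N/2)].
    Hypothetical helper for illustration, not the Speex function.
    """
    N = len(X)
    if N < 2 or N % 2:
        raise ValueError("expected an even frame length >= 2")
    ps = [X[0] * X[0]]                      # DC bin: real only
    for i in range(1, N - 1, 2):            # interior bins: re^2 + im^2
        ps.append(X[i] * X[i] + X[i + 1] * X[i + 1])
    ps.append(X[N - 1] * X[N - 1])          # Nyquist bin: real only
    return ps


# N = 8: DC=1, bin1=(3,4), bin2=(0,2), bin3=(1,1), Nyquist=5
print(packed_power_spectrum([1, 3, 4, 0, 2, 1, 1, 5]))
```

With this packing, bin i (for 0 < i < N/2) has its real part at X[2*i-1]
and its imaginary part at X[2*i], so a loop that pairs up entries starting
at index 1 gives the right result.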
> > If you hold that in your hand, you're probably making it harder than
> > in a real scenario because any movement causes the echo path to change.
> Actually, with maximum volume (which I used to make sure the echo really
> dominated over the noise), it's quite loud, so I left it in the corner.
That's fine then... as long as the max volume doesn't cause too much
distortion (the AEC models only linear effects).
> I'll need to add support for saving audio to my program, so I can give you
> the "actual" sampled loudspeaker and mic files, and I'll also need to get
> hold of a test person again. (I had a friend with a friend who has an
> exceptionally clear voice. My own "aaaaaa" is far too muddy to cause
> this). I'll try to get this done this week, but it might be delayed until
> after Christmas.
Let me know if you have files that cause the problem. Otherwise, it's
pretty much impossible to debug.
> >> This can happen quite frequently, so it would be nice if the echo
> >> canceller could deal with this situation without a complete reset.
> > That can be predicted from the code. It's sort of hard to fix without
> > hurting accuracy for the general case. I'll have to think about it.
> An idea might be to enable the noise cancellation to "feed back" into the
> echo canceller. If, after noise cancellation, there's nothing left at
> all, then stop adapting the echo canceller.
There's always "something" left after the echo cancellation, if only the
input noise. And even then it wouldn't fix the problem.
> Well, from what I can see in this testcase, it's only "random" where there
> is no correlation. For example, in the 20ms-40ms timeslot, the amplitude
> can spike a bit (such as on those "aaaaaa"), but the phase is still
> random, whereas in the '0-20ms' slot, it's very regular. My thought was to
> use the "regularity" of the phase shift as an indication for a good match.
No. All the regularity means is that you have a dominant pulse in the
transfer function. That's expected for the first section of the filter
(because of the direct sound path), but not the others (that are really
just a lot of incoherent stuff).
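
As a small illustration of why a single dominant pulse gives a regular
phase: the DFT of one tap at delay d is exp(-2*pi*j*k*d/N), so the phase
drops by the same amount from each bin to the next. The sketch below uses
made-up values (N = 64, d = 5) and a plain DFT, not the packed format or
anything from the Speex code.

```python
import cmath
import math


def phase_steps(h):
    """Wrapped phase difference between adjacent DFT bins of h (bins 0..N/2)."""
    N = len(h)
    H = [sum(h[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
         for k in range(N // 2 + 1)]
    # arg(H[k+1] * conj(H[k])) is the phase step, wrapped to (-pi, pi]
    return [cmath.phase(H[k + 1] * H[k].conjugate()) for k in range(N // 2)]


N, d = 64, 5                    # illustrative length and delay, in samples
h = [0.0] * N
h[d] = 1.0                      # one dominant pulse at delay d
steps = phase_steps(h)
print(min(steps), max(steps))   # both close to -2*pi*d/N
```

So a constant phase step only says "there is one strong tap here", which
is what the direct path already guarantees for the first block.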
> So, if arg(W[i+1])-arg(W[i])==arg(W[i])-arg(W[i-1]), we know it's a steady
> increase, so it's probably a good match. It's quite hackish, and probably
> has no sound scientific basis, but it's an idea for
> dealing better with the specific kind of echo I see here.
What would that really tell you anyway?
> Then again, it will likely fail horribly if you have 2 echoes; one delayed
> by 5ms with equal amplitude, and another delayed by 15ms with a much lower
That's actually common if you have a wall (or the floor) not too far
from the mic or the speaker.
> I have no idea what the "phase diagram" will look like then.
Messy and dependent on the amplitudes too.
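
A rough sketch of that, under assumptions that are mine and not from the
thread: an 8 kHz sampling rate (so 5 ms and 15 ms become 40 and 120
samples), a 512-sample filter, and second-echo amplitudes of 0.2 and 0.8
chosen arbitrarily. It compares how much the bin-to-bin phase step varies
for one echo versus two.

```python
import cmath
import math


def phase_steps(taps, N):
    """Wrapped phase step between adjacent bins for a sparse echo path.

    taps is a list of (delay_in_samples, amplitude) pairs.
    """
    H = [sum(a * cmath.exp(-2j * math.pi * k * d / N) for d, a in taps)
         for k in range(N // 2 + 1)]
    return [cmath.phase(H[k + 1] * H[k].conjugate()) for k in range(N // 2)]


def spread(steps):
    """Difference between the largest and smallest phase step."""
    return max(steps) - min(steps)


N = 512                       # assumed filter length in samples
d1, d2 = 40, 120              # 5 ms and 15 ms at an assumed 8 kHz rate

single = phase_steps([(d1, 1.0)], N)
weak = phase_steps([(d1, 1.0), (d2, 0.2)], N)      # arbitrary amplitude
strong = phase_steps([(d1, 1.0), (d2, 0.8)], N)    # arbitrary amplitude

for name, s in (("single", single), ("weak 2nd", weak), ("strong 2nd", strong)):
    print(f"{name:10s} spread of phase steps = {spread(s):.3f} rad")
```

With one echo the step is constant, so the spread is essentially zero.
Adding the second echo makes the step wander from bin to bin, and the
amount of wander changes with the second echo's amplitude.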