[Speex-dev] Anyone knows how microsoft AEC can deal with mismatches between clocks of capture and render streams?

Fri Apr 15 21:05:29 PDT 2011

On 04/16/2011 01:04 AM, Steve Underwood wrote:
> On 04/14/2011 07:26 PM, LiMaoquan2000 wrote:
>> Hi All,
>> Many thanks to Underwood for his excellent review of the big problem
>> that prevents LMS-based AEC algorithms from being used on most computers.
>> Maybe it can be summarized as follows:
>> 1. A mismatch between the capture and render sample rates does exist in
>> most low-cost soundcards (in my experiments with more than 20 soundcards,
>> the difference ranged from 0.5Hz to more than 50Hz when the sample rate
>> was set to 8000Hz). Maybe this is caused entirely by the hardware and
>> can't be solved by software settings.
>> 2. A static measurement of the difference between the sample rates is far
>> from enough. A more accurate measurement requires a longer recording of
>> the echo signal, so that a more accurate frequency shift can be read from
>> the spectral structure. However, the accuracy of the measurement is still
>> limited and not enough for long-running AEC. For example, in my
>> experiment, I recorded 2^18/8000 = about 32.8 seconds of echo signal, and
>> the frequency resolution is 8000Hz/(2^17) = about 0.061Hz. With a precise
>> resampler (sinc interpolation), the speex AEC performed much better
>> than before. But there are still audible residual echoes after AEC.
>> A frequency resolution of about 0.061Hz is still far from enough.
>> There is still delay drift between the near-end and far-end voice, caused
>> by the different sample rates, even though it is largely eliminated by
>> the resampling. Moreover, the residual difference will eventually cause
>> overflow or underflow of the buffer, which is a disaster for the
>> echo canceller.
>> Maybe this paper (Pawig, M., Enzner, G., and Vary, P., Adaptive
>> Sampling Rate Correction for Acoustic Echo Control in Voice-over-IP,
>> IEEE Transactions on Signal Processing, Vol. 58, No. 1, January 2010)
>> points in the right direction. In this paper, the far-end signal is
>> resampled before being sent to the AEC. It estimates the delay drift
>> between the acoustic echo and the estimated echo and adjusts the
>> sampling-time step accordingly. When the delay drift is zero, the far-end
>> signal after resampling will have the same sample rate as the near-end
>> signal. It seems perfect, but it still has some weaknesses:
>> 1. It relies on a coarse initial convergence of the LMS filter to
>> estimate the delay drift between the acoustic echo and the estimated echo.
>> If there is a big difference, such as 50Hz, no initial convergence can be
>> established.
>> 2. It is too slow to reach equilibrium. According to its experiment,
>> it takes about 35-40 seconds to reduce the frequency difference
>> to 0Hz, and ERLE increases only when the frequency difference is very
>> close to 0Hz. These results were obtained without double talk.
>> Do GIPS and Microsoft have some secret, highly efficient method? Their
>> AECs converge very quickly; I could hardly hear any echo in the process.
>> How do they do it?
>> Maoquan
>>
> I don't know if this has only recently been put on line, but I never
> noticed it until today -
> www.iwaenc.org/proceedings/*2008*/contents/papers/9044.pdf
>
> That paper is from people at MS describing, in some detail, what the
> Windows kernel echo canceller does to handle synchronisation issues. It
> tracks both time varying sample clock drift and hiccups in the sample
> streams. It seems to handle the drift in a fairly similar manner to the
> several other papers on the topic from the past 10 years.
>
> Steve
>
That URL somehow got a couple of "*" characters in it. Try

http://www.iwaenc.org/proceedings/2008/contents/papers/9044.pdf

If you go to www.iwaenc.org and browse around, they have some interesting stuff on multi-channel EC, getting reverb out of a signal, background noise reduction, and so on.

Steve
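
For anyone following the numbers in Maoquan's message, here is a rough sketch of the arithmetic behind them: how long a 2^18-sample recording is at 8000Hz, what 8000Hz/2^17 works out to, and how quickly a constant rate offset turns into delay drift. The function names are made up for illustration and are not part of Speex or any AEC, and the model assumes the offset between the two clocks stays constant over time, which real soundcards need not obey.

```python
# Hypothetical helpers for illustration only; not taken from Speex or any AEC.
# Assumption: the offset between capture and render clocks is constant.

def record_seconds(num_samples, sample_rate_hz):
    """Duration of a recording of num_samples at sample_rate_hz."""
    return num_samples / sample_rate_hz


def bin_spacing_hz(sample_rate_hz, divisor):
    """Frequency step obtained by dividing the sample rate by divisor."""
    return sample_rate_hz / divisor


def drift_stats(nominal_hz, offset_hz):
    """Drift between two streams whose rates differ by offset_hz."""
    return {
        # Relative clock error in parts per million.
        "ppm": offset_hz / nominal_hz * 1e6,
        # One stream gains offset_hz samples on the other every second.
        "samples_per_minute": offset_hz * 60.0,
        # The same drift expressed as time at the nominal rate.
        "ms_per_minute": offset_hz * 60.0 / nominal_hz * 1000.0,
    }


if __name__ == "__main__":
    fs = 8000.0
    print("record length (s):", record_seconds(2 ** 18, fs))
    print("8000/2^17 (Hz):", bin_spacing_hz(fs, 2 ** 17))
    # Smallest and largest offsets reported, plus the residual after
    # correcting with a measurement that is only good to 8000/2^17 Hz.
    for offset in (0.5, 50.0, bin_spacing_hz(fs, 2 ** 17)):
        print(offset, drift_stats(fs, offset))
```

Under that constant-offset assumption, a residual error of 8000/2^17 Hz still lets the streams slide apart by roughly 3.7 samples per minute, or on the order of 27 to 28 ms per hour, which is consistent with the observation that a one-off static measurement is not enough for a long-running canceller.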