[Speex-dev] Does anyone know how Microsoft AEC deals with mismatches between the clocks of capture and render streams?

Fri Apr 15 10:04:51 PDT 2011

On 04/14/2011 07:26 PM, LiMaoquan2000 wrote:
> Hi All,
> Many thanks to Underwood for the excellent review of the big problem
> that prevents LMS-based AEC algorithms from being used on most
> computers. It can perhaps be summarized as follows:
> 1. A difference between the capture and render sample rates does exist
> on most low-cost soundcards. (In my experiments with more than 20
> soundcards, the difference ranged from 0.5 Hz to more than 50 Hz at a
> nominal sample rate of 8000 Hz.) This is probably caused entirely by
> the hardware and cannot be fixed through software settings.
> 2. A static measurement of the difference between the sample rates is
> far from enough. A more accurate measurement requires a longer
> recording of the echo signal, so that a more accurate frequency shift
> can be read from the spectral structure. Even so, the accuracy of the
> measurement is limited and is not enough for long-running AEC. For
> example, in my experiment I recorded 2^18/8000 = about 32.8 seconds of
> echo signal, which gives a frequency resolution of
> 8000 Hz/(2^17) = about 0.061 Hz. With a precise resampler (sinc
> interpolation), the Speex AEC performed much better than before, but
> there was still audible residual echo after the AEC. A frequency
> resolution of about 0.061 Hz is still far from enough: a delay drift
> between the near-end and far-end signals, caused by the different
> sample rates, remains even though resampling largely eliminates it.
> Moreover, over a long period the residual difference will cause the
> buffer to overflow or underflow, which is a disaster for the echo
> canceller.
> Maybe this paper (Pawig, M., Enzner, G., and Vary, P., Adaptive 
> Sampling Rate Correction for Acoustic Echo Control in Voice-over-IP, 
> IEEE Transactions on Signal Processing, Vol. 58, No. 1, January 2010) 
> points in the right direction. In this paper, the far-end signal is
> resampled before it is sent to the AEC. The method estimates the delay
> drift between the acoustic echo and the estimated echo and adjusts the
> sampling time step accordingly. When the delay drift is zero, the
> resampled far-end signal has the same sample rate as the near-end
> signal. It seems perfect, but it still has some weaknesses:
> 1. It relies on a coarse initial convergence of the LMS filter to
> estimate the delay drift between the acoustic echo and the estimated
> echo. If the difference is large, such as 50 Hz, no initial
> convergence can be established.
> 2. It is too slow to reach equilibrium. According to its experiments,
> it takes about 35-40 seconds to reduce the frequency difference to
> 0 Hz, and ERLE increases only when the frequency difference is very
> close to 0 Hz. These results were obtained without double talk.
> Do GIPS and Microsoft have some secret, highly efficient method? Their
> AECs converge very quickly; I could hardly hear any echo in the
> process. How do they do it?
> Maoquan
>
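
For reference, the arithmetic in the quoted message can be checked with
a short sketch. It assumes only that a rate difference of d Hz between
two streams means they diverge by d samples every second. The function
names and the 160-sample buffer headroom are hypothetical, chosen for
illustration; they do not come from Speex or from either paper.

```python
NOMINAL_RATE_HZ = 8000  # nominal sample rate used in the quoted experiments


def record_seconds(num_samples, rate_hz=NOMINAL_RATE_HZ):
    """Duration of a recording of num_samples at rate_hz."""
    return num_samples / rate_hz


def drift_samples(rate_diff_hz, seconds):
    """Samples by which two streams diverge when their clocks differ
    by rate_diff_hz (samples per second) for the given time."""
    return rate_diff_hz * seconds


def seconds_until_slip(rate_diff_hz, headroom_samples):
    """Time until a buffer with headroom_samples of slack (hypothetical
    figure) overflows or underflows at a constant rate difference."""
    return headroom_samples / abs(rate_diff_hz)


if __name__ == "__main__":
    print("record length (s):", record_seconds(2 ** 18))
    print("resolution quoted as 8000/2^17 (Hz):", NOMINAL_RATE_HZ / 2 ** 17)
    for diff_hz in (0.5, 50.0):  # range reported in the quoted message
        print(diff_hz, "Hz ->", drift_samples(diff_hz, 60.0),
              "samples of drift per minute")
    # Assumed residual mismatch of 0.06 Hz and 160 samples of headroom.
    print("seconds until slip:", seconds_until_slip(0.06, 160))
```

Even a small residual mismatch therefore leads to a buffer slip
eventually, which is why a one-off static measurement is not enough.
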
I don't know if this has only recently been put online, but I never
noticed it until today -
www.iwaenc.org/proceedings/2008/contents/papers/9044.pdf

That paper is from people at MS describing, in some detail, what the 
Windows kernel echo canceller does to handle synchronisation issues. It 
tracks both time-varying sample clock drift and hiccups in the sample 
streams. It seems to handle the drift in a fairly similar manner to the 
several other papers on the topic from the past 10 years.
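
The general idea discussed in this thread (estimate the residual delay
drift, then adjust the resampling step until the drift is zero) can be
illustrated with a toy feedback loop. This is only a sketch under
strong assumptions and is not the method of either paper: the residual
drift is computed exactly from a known true ratio, whereas a real
system has to estimate it from noisy signals. The function name, the
gain, and the block length are all hypothetical.

```python
def track_rate_ratio(true_ratio, blocks=50, block_len=160, gain=0.5):
    """Toy integral controller for a resampling ratio.

    true_ratio is render_rate / capture_rate. The estimate starts at 1.0
    (no correction) and is nudged after every block by the drift that
    remained uncorrected. Returns the estimate after each block.
    """
    estimate = 1.0
    history = []
    for _ in range(blocks):
        # Drift (in samples) left over after resampling one block with
        # the current estimate. Assumed to be observable without noise.
        residual_drift = (true_ratio - estimate) * block_len
        # Move the estimate by a fraction of the observed error.
        estimate += gain * residual_drift / block_len
        history.append(estimate)
    return history


if __name__ == "__main__":
    ratio = 8050.0 / 8000.0  # 50 Hz mismatch at a nominal 8000 Hz
    history = track_rate_ratio(ratio)
    print("after 1 block:  ", history[0])
    print("after 50 blocks:", history[-1])
```

In this idealised form the error shrinks by a factor of (1 - gain)
per block. The slow convergence reported in the quoted message would
correspond to a noisy drift estimate that forces a much smaller gain.
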

Steve