[Speex-dev] Speex-dev Digest, Vol 83, Issue 10

Fri Apr 15 21:57:55 PDT 2011

Hi Steve,

> I don't know if this has only recently been put online, but I never
> noticed it until today:
> www.iwaenc.org/proceedings/2008/contents/papers/9044.pdf
>
> That paper is from people at MS describing, in some detail, what the
> Windows kernel echo canceller does to handle synchronisation issues. It
> tracks both time varying sample clock drift and hiccups in the sample
> streams. It seems to handle the drift in a fairly similar manner to the
> several other papers on the topic from the past 10 years.

The paper ("Challenges and Solutions for Designing Software AEC on Personal
Computers") is a good one, but some critical details were omitted. Here is my summary.

1. The paper points out two phenomena that break the synchronization/alignment
of the far-end and near-end signals: glitches (caused by lost samples) and
clock drift (caused by different clock generators).

2. It introduces the concept of the Relative Sample Offset (RSO, d[i]), which
describes the time offset between corresponding samples of the far-end and
near-end signals. Without glitches or clock drift, the RSO d[i] stays constant.
A glitch makes it change suddenly, whereas clock drift makes it change slowly
and steadily. So glitches and clock drift can be detected by monitoring how
the RSO changes.
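To make point 2 concrete, here is a minimal Python sketch of the monitoring idea. It is not the paper's formulas; sampling d[i] once per block, the jump threshold `glitch_threshold`, and using a least-squares slope as the drift estimate are all assumptions made for illustration, and `analyze_rso` is a hypothetical name.

```python
from typing import List, Tuple


def analyze_rso(d: List[float],
                glitch_threshold: float = 2.0) -> Tuple[List[Tuple[int, float]], float]:
    """Split changes of an RSO sequence d[i] into glitches and drift.

    Assumptions (illustrative, not from the paper):
      * d[i] is measured once per processing block, in samples;
      * a change larger than glitch_threshold between consecutive blocks
        is a glitch (e.g. lost samples);
      * the remaining slow trend is clock drift, estimated as the
        least-squares slope in samples per block.
    """
    if len(d) < 2:
        return [], 0.0
    glitches: List[Tuple[int, float]] = []
    corrected = [float(d[0])]
    removed = 0.0  # total size of the jumps removed so far
    for i in range(1, len(d)):
        step = d[i] - d[i - 1]
        if abs(step) > glitch_threshold:
            glitches.append((i, step))
            # Removing the whole jump also removes one block of drift,
            # which slightly biases the slope; acceptable for a sketch.
            removed += step
        corrected.append(d[i] - removed)
    # Ordinary least-squares slope of corrected[i] against i.
    n = len(corrected)
    mean_x = (n - 1) / 2.0
    mean_y = sum(corrected) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(corrected))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return glitches, num / den


if __name__ == "__main__":
    # Synthetic RSO: 0.1 sample of drift per block plus an 80-sample glitch at block 20.
    d = [0.1 * i + (80.0 if i >= 20 else 0.0) for i in range(50)]
    glitches, drift = analyze_rso(d)
    print(glitches, round(drift, 3))
```

With a block length of B samples, the drift per block divided by B and multiplied by 1e6 gives the mismatch in ppm.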

BUT:
1. In Section 3 the paper gives formulas for estimating clock drift and glitches
from d[i] (the RSO), but it does not tell us how to obtain d[i].

2. It says there is a third case that interferes with the alignment. Does anyone
know what it is talking about?

"(3) Noisy timing measurements: Modern audio hardware provides timing data in order to
synchronize m[i] and s[i]. The information is always noisy, due to limited numerical
precision, data transfer delay, multithreading, etc."
Here m[i] is the microphone signal after the ADC, and s[i] is the speaker signal before the DAC.
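One guess at how d[i] could be obtained, written as a minimal Python sketch and not taken from the paper: assume the application can read, at roughly the same instant, the cumulative number of frames captured by the microphone (`mic_pos`) and rendered to the speaker (`spk_pos`). Their difference is then a raw offset that stays constant without glitches or drift, and the "noisy timing measurements" of case (3) would show up as jitter on it, smoothed here with an exponential moving average. The inputs, `smoothed_rso`, and `alpha` are hypothetical.

```python
from typing import Iterable, List, Optional


def smoothed_rso(mic_pos: Iterable[int], spk_pos: Iterable[int],
                 alpha: float = 0.05) -> List[float]:
    """Smoothed relative offset between two cumulative frame counters.

    Hypothetical inputs: mic_pos[k] and spk_pos[k] are the total frames
    captured and rendered, read at (roughly) the same instant k. Their
    difference would be constant without glitches or drift; timing noise
    (thread scheduling, transfer delay) adds jitter, which a first-order
    low-pass filter (exponential moving average) suppresses.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    est: Optional[float] = None
    for m, s in zip(mic_pos, spk_pos):
        raw = float(m - s)
        # Initialise with the first reading, then move a fraction alpha
        # toward each new reading.
        est = raw if est is None else est + alpha * (raw - est)
        out.append(est)
    return out


if __name__ == "__main__":
    # 160-frame polling period, true offset 1000 frames, +/-3 frames of jitter.
    mic = [160 * k + 1000 + (3 if k % 2 else -3) for k in range(400)]
    spk = [160 * k for k in range(400)]
    print(round(smoothed_rso(mic, spk)[-1], 1))
```

The trade-off is that an exponential average lags a genuinely drifting offset; fitting a line to the raw offsets over a sliding window is one alternative that follows a linear drift without steady-state lag.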

Maoquan

>
> Steve
>
> On 04/14/2011 07:26 PM, LiMaoquan2000 wrote:
> > Hi All,
> > Many thanks to Underwood for her excellent review of the big problem
> > that prevents LMS-based AEC algorithms from being used on most computers.
> > It can perhaps be summarized as follows:
> > 1. A mismatch between the capture and playback sample rates does exist
> > in most low-cost sound cards (in my experiments with more than 20 sound
> > cards, the difference ranged from 0.5 Hz to more than 50 Hz with the
> > sample rate set to 8000 Hz). This is probably caused entirely by the
> > hardware and cannot be fixed through software settings.
> > 2. A static measurement of the difference between the sample rates is
> > far from enough. A more accurate measurement requires recording the echo
> > signal for longer, in order to get a more accurate frequency shift from
> > the spectral structure. However, the accuracy of the measurement is still
> > limited and not enough for long-term AEC operation. For example, in my
> > experiment I recorded 2^18/8000 = 32.768 seconds of echo signal, and the
> > frequency resolution was 8000 Hz/2^17, about 0.061 Hz. With a precise
> > resampler (sinc interpolation), the Speex AEC performed much better than
> > before, but there was still audible residual echo after AEC. A frequency
> > resolution of about 0.061 Hz is still far from enough. There is still
> > delay drift between the near-end and far-end voice caused by the
> > different sample rates, even though the resampling largely eliminates
> > it. Moreover, the residual difference will eventually cause buffer
> > overflow or underflow, which is a disaster for the echo canceller.
> > This paper (Pawig, M., Enzner, G., and Vary, P., Adaptive
> > Sampling Rate Correction for Acoustic Echo Control in Voice-over-IP,
> > IEEE Transactions on Signal Processing, Vol. 58, No. 1, January 2010)
> > may point in the right direction. In it, the far-end signal is
> > resampled before being sent to the AEC. It estimates the delay drift
> > between the acoustic echo and the estimated echo and adjusts the
> > sampling-time step accordingly. When the delay drift reaches zero, the
> > resampled far-end signal has the same sample rate as the near-end
> > signal. It seems perfect, but it still has some weaknesses:
> > 1. It relies on a coarse initial convergence of the LMS filter to
> > estimate the delay drift between the acoustic echo and the estimated
> > echo. If the rate difference is large, such as 50 Hz, no initial
> > convergence can be established.
> > 2. It is too slow to reach equilibrium. According to its experiments,
> > it takes about 35-40 seconds to reduce the frequency difference to
> > 0 Hz, and ERLE increases only when the frequency difference is very
> > close to 0 Hz. These results were obtained without double talk.
> > Do GIPS and Microsoft have some secret, highly efficient method? Their
> > AECs converge very quickly; I could hardly hear any echo in the process.
> > How do they do it?
> > Maoquan
> >
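For reference, the figures in the original message can be related to drift with a few lines of Python. The helper names and the 160-sample (20 ms at 8000 Hz) buffer in the example are illustrative assumptions; the only inputs taken from the message are the 8000 Hz rate, the 0.5 Hz and 50 Hz mismatches, and the 2^18-sample recording with 8000/2^17 resolution.

```python
NOMINAL_RATE_HZ = 8000.0


def ppm(delta_hz: float, rate_hz: float = NOMINAL_RATE_HZ) -> float:
    """Rate mismatch in parts per million of the nominal rate."""
    return delta_hz * 1e6 / rate_hz


def seconds_to_slip(samples: float, delta_hz: float) -> float:
    """Time for the two streams to drift apart by `samples` samples.

    A mismatch of delta_hz means one stream produces delta_hz more samples
    per second than the other, so the offset grows by delta_hz samples/s.
    """
    return samples / abs(delta_hz)


def fft_bin_spacing(n_points: int, rate_hz: float = NOMINAL_RATE_HZ) -> float:
    """Frequency spacing between FFT bins for an n_points transform."""
    return rate_hz / n_points


if __name__ == "__main__":
    for delta in (0.5, 50.0):
        print(f"{delta} Hz: {ppm(delta)} ppm, one-sample slip every {seconds_to_slip(1, delta)} s")
    res = fft_bin_spacing(2 ** 17)
    print(f"2^18 samples last {2 ** 18 / NOMINAL_RATE_HZ} s; 8000/2^17 = {res} Hz")
    # Hypothetical 160-sample (20 ms) buffer with a residual error of one bin:
    print(f"160-sample buffer slips after {seconds_to_slip(160, res) / 60:.1f} min")
```

Even a residual error of one such bin (about 0.061 Hz) slips the streams by a full 160-sample buffer in roughly 44 minutes, which is consistent with the concern above about eventual buffer overflow or underflow.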