<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">


<HTML><HEAD>


<STYLE type=text/css>p{margin:0px;padding:0px;}</STYLE>


<META content="text/html; charset=utf-8" http-equiv=Content-Type>


<STYLE>BLOCKQUOTE{margin-Top: 0px; margin-Bottom: 0px; margin-Left: 2em}; </STYLE>


<META name=GENERATOR content="MSHTML 9.00.8112.16421"><BASE target=_blank></HEAD>


<BODY 


style="BORDER-RIGHT-WIDTH: 0px; MARGIN: 12px; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" 


topMargin=10 marginwidth="0" marginheight="0"><STATIONERY>


<DIV>


<DIV>Hi All,</DIV>


<DIV>&nbsp;</DIV>


<DIV>Many thanks to Underwood for the excellent review of the big problem that prevents LMS-based AEC algorithms from being used on most computers. It can be summarized as follows:</DIV>


<DIV>&nbsp;</DIV>


<DIV>1. A difference between the capture and render sample rates does exist in most low-cost soundcards. In my experiments on more than 20 soundcards, the differences ranged from 0.5Hz to more than 50Hz with the sample rate set to 8000Hz. This is probably caused entirely by the hardware, in which case it cannot be fixed by software settings.</DIV>


<DIV>&nbsp;</DIV>


<DIV>2. A static, one-off measurement of the difference between the sample rates is far from enough. A more accurate measurement requires a longer recording of the echo signal, so that a more accurate frequency shift can be read from the spectral structure. Even then, the accuracy of the measurement is limited and not sufficient for long-running AEC. For example, in my experiment I recorded 2^18 samples at 8000Hz (about 32.8 seconds) of echo signal, and the frequency resolution is 8000Hz/(2^17), about 0.061Hz. With a precise resampler (sinc interpolation), the speex AEC performed much better than before, but there was still audible residual echo after AEC. A frequency resolution of about 0.061Hz is still far from enough: the delay between the near-end and far-end voice keeps drifting because of the remaining sample rate difference, even though resampling removes most of it. Moreover, over a long run the residual difference will eventually overflow or underflow the buffer, which is a disaster for the echo canceller.</DIV>
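<DIV>&nbsp;</DIV>
<DIV>To put rough numbers on the two points above, here is a small Python sketch. It only does arithmetic: how fast a given rate difference makes the delay drift, how long a residual difference takes to slip one whole sample, and how long a buffer lasts before it overflows or underflows. The function names, the assumption that the residual difference is as large as one frequency bin, and the 160-sample buffer slack are illustrative choices, not measurements.</DIV>
<PRE>
```python
def record_seconds(num_samples, sample_rate_hz):
    """Duration of a recording of num_samples taken at sample_rate_hz."""
    return num_samples / sample_rate_hz


def drift_ms_per_second(rate_diff_hz, sample_rate_hz):
    """Delay drift, in ms of audio per second, caused by a rate mismatch."""
    return 1000.0 * rate_diff_hz / sample_rate_hz


def seconds_per_sample_slip(residual_diff_hz):
    """Time for the two streams to slip against each other by one sample."""
    return 1.0 / residual_diff_hz


def seconds_until_buffer_fault(slack_samples, residual_diff_hz):
    """Time until a buffer with slack_samples of headroom over/underflows."""
    return slack_samples / residual_diff_hz


FS = 8000.0                      # nominal sample rate from the post
RESOLUTION = FS / 2 ** 17        # frequency resolution quoted in the post
SLACK = 160                      # assumed buffer headroom (20 ms at 8000 Hz)

print("recording length (s):", record_seconds(2 ** 18, FS))
print("frequency resolution (Hz):", RESOLUTION)
print("drift at 50 Hz mismatch (ms/s):", drift_ms_per_second(50.0, FS))
# Assumption: the residual mismatch after resampling is one resolution bin.
print("one-sample slip every (s):", seconds_per_sample_slip(RESOLUTION))
print("buffer fault after (s):", seconds_until_buffer_fault(SLACK, RESOLUTION))
```
</PRE>
<DIV>Under these assumptions, a 50Hz difference at 8000Hz drifts by 6.25ms every second, a residual of one frequency bin still slips a whole sample roughly every 16 seconds, and a 160-sample buffer runs out after about 44 minutes.</DIV>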


<DIV>&nbsp;</DIV>


<DIV>Maybe this paper (Pawig, M., Enzner, G., and Vary, P., Adaptive Sampling Rate Correction for Acoustic Echo Control in Voice-over-IP, IEEE Transactions on Signal Processing, Vol. 58, No. 1, January 2010) points in the right direction. In the paper, the far-end signal is resampled before it is sent to the AEC. The method estimates the delay drift between the acoustic echo and the estimated echo and adjusts the sampling time step accordingly. When the delay drift reaches zero, the resampled far-end signal has the same sample rate as the near-end signal. It seems perfect, but it still has some weaknesses:</DIV>


<DIV>&nbsp;</DIV>


<DIV>1. It relies on a coarse initial convergence of the LMS filter to estimate the delay drift between the acoustic echo and the estimated echo. If the difference is large, such as 50Hz, no initial convergence can be established.</DIV>


<DIV>2. It is too slow to settle. According to the paper's experiment, it takes about 35-40 seconds to bring the frequency difference down to 0Hz, and ERLE increases only once the frequency difference is very close to 0Hz. These results were obtained without double talk.</DIV>
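<DIV>&nbsp;</DIV>
<DIV>The general idea of steering a resampler from a drift measurement can be sketched as a toy feedback loop in Python. This is not the algorithm from the paper: the drift is assumed to be measured perfectly on every block (in practice it has to come from a partly converged adaptive filter, which is exactly weakness 1), and the block length and the gains kp and ki are hypothetical values chosen so that this toy loop is critically damped. Its settling time therefore says nothing about the 35-40 seconds reported in the paper.</DIV>
<PRE>
```python
def track_rate_offset(true_offset_hz, blocks, block_s=0.02, kp=10.0, ki=0.5):
    """Toy proportional-plus-integral loop that tracks a sample rate offset.

    true_offset_hz: actual capture/render rate difference, unknown to the loop
    blocks:         number of processing blocks to simulate
    block_s:        block duration in seconds (160 samples at 8000 Hz)
    kp, ki:         hypothetical loop gains, not taken from any paper

    Returns (estimated offset in Hz, remaining delay drift in samples).
    """
    drift = 0.0   # delay between echo and estimated echo, in samples
    integ = 0.0   # integrator: long-term estimate of the rate offset, in Hz
    for _ in range(blocks):
        # Rate correction the resampler applies during this block.
        correction = integ + kp * drift
        # The integrator slowly absorbs the steady part of the offset.
        integ += ki * drift
        # Whatever the correction misses keeps accumulating as delay.
        drift += block_s * (true_offset_hz - correction)
    return integ, drift


def resampler_step(nominal_rate_hz, offset_hz):
    """Input samples consumed per output sample for a given rate offset."""
    return (nominal_rate_hz + offset_hz) / nominal_rate_hz


# Simulate 10 seconds (500 blocks of 20 ms) with a 50 Hz mismatch at 8000 Hz.
estimate, remaining = track_rate_offset(50.0, blocks=500)
print("estimated offset (Hz):", estimate)
print("remaining drift (samples):", remaining)
print("resampler step:", resampler_step(8000.0, estimate))
```
</PRE>
<DIV>When the remaining drift is zero, the integrator holds the full rate offset and the resampler step stays constant, which corresponds to the balanced state described above.</DIV>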


<DIV>&nbsp;</DIV>


<DIV>Do GIPS and Microsoft have some secret, highly efficient method? Their AECs converge very quickly, and I can hardly hear any echo in the process. How do they do it?</DIV>


<DIV>&nbsp;</DIV>


<DIV>Maoquan</DIV>


<DIV>&nbsp;</DIV>


<DIV>&gt;<BR>&gt; 


---------------------------------------------------------------------- <BR>&gt; 


<BR>&gt; On 04/13/2011 02:58 AM, Shridhar, Vasant wrote: <BR>&gt;&nbsp; I am 


doing this right now with no problem.&nbsp; I am not using speex for this at the 


moment though.&nbsp; Group delay is the biggest problem.&nbsp; I implemented a 


version where the input and output sample rates are known up front.&nbsp; The 


routine then interpolates between the jitter.&nbsp; This should solve the 


problem.&nbsp; The crystals used to clock the input and output have very fine 


tolerances on most standard audio cards. <BR>&gt; <BR>&gt; Do you mean the group 


delay of your interpolation filter? I don't see <BR>&gt; why that is an issue. 


At the echo cancellation point it just looks like <BR>&gt; a bit more echo 


delay. I also don't know why you use the word jitter in <BR>&gt; relation to 


interpolation. The jitter you have is in the reception time <BR>&gt; of blocks 


of samples, which makes the assessment of sampling rates hard, <BR>&gt; but 


doesn't affect the actual interpolation. <BR>&gt; <BR>&gt; We are talking about 


two clocks, which are not synchronised, and which <BR>&gt; may drift in 


frequency significantly over fairly short periods of time. <BR>&gt; The issue is 


accurately assessing the sampling rate difference, to phase <BR>&gt; locked 


levels of accuracy, so the resampling is precise. You can find <BR>&gt; sampling 


rates like 8000/s and 8100/s, which is a disaster for most echo <BR>&gt; 


cancellers. If the clock rate difference is assessed to 0.1Hz accuracy, <BR>&gt; 


and the 8100/s sampled signal is resampled to 8000.1/s, you would still <BR>&gt; 


need to totally readapt the canceller every 10s, including periods of <BR>&gt; 


double talk. That is too fast for the canceller to ever be working well. 


<BR>&gt; You really need a very accurate assessment of the sampling rate 


<BR>&gt; difference, so you can essentially eliminate all difference between the 


<BR>&gt; two rates. <BR>&gt; <BR>&gt; Assessing the sampling rate difference 


accurately is not hard, if you <BR>&gt; have plenty of time. Doing it in a 


shorter period is where the challenge <BR>&gt; lies. You are decoupled from a 


precise real time view of the sampling <BR>&gt; process. All you can base your 


sampling rate assessment on is long term <BR>&gt; assessments of sample rates, 


or an analysis of how the echo is drifting <BR>&gt; through the samples. From 


the last 10 years you will find a number of <BR>&gt; papers published in IEEE 


and other journals about this problem, as it <BR>&gt; pertains to echo 


cancelling in conferencing, and other distributed <BR>&gt; setups. In these 


systems, synchronisation of various echo laden signals <BR>&gt; is impractical. 


All the papers I've seen come down to doing basically <BR>&gt; the same thing - 


resampling based on a best assessment of echo drift <BR>&gt; rates. It seems 


like it's still a research topic, and it seems like <BR>&gt; existing solutions 


have their problems. Fraunhofer have recently <BR>&gt; released a conferencing 


echo handler with a vague description of how it <BR>&gt; works, but a clear 


indication that it isn't even trying to cancel the <BR>&gt; echo. It is juggling 


gains, and performing other tricks, to make the <BR>&gt; echo perceptually 


tolerable - an approach which has historically worked <BR>&gt; pretty well (e.g. 


the DSP Group solution from the 90s). At least one <BR>&gt; person reported, on 


this list, that their solution is the best around. <BR>&gt; &gt; Vas <BR>&gt; 


&gt; ________________________________________ <BR>&gt; &gt; From: Li Maoquan 


[limaoquan2000@126.com] <BR>&gt; &gt; Sent: Tuesday, April 12, 2011 2:48 PM 


<BR>&gt; &gt; To: Shridhar, Vasant <BR>&gt; &gt; Cc: speex-dev <BR>&gt; &gt; 


Subject: Re:RE: [Speex-dev] Anyone knows how microsoft AEC can deal with 


mismatches between clocks of capture and render streams? 


<BR>&gt; &gt; <BR>&gt; &gt; Hi Shridhar, <BR>&gt; &gt; <BR>&gt; &gt; Sample rate 


conversion alone is not enough to solve this problem. I tried this method several months <BR>&gt; &gt; ago. The first step is to measure the difference between the capture and render sample rates. The next is to <BR>&gt; &gt; resample one signal (with the "sinc interpolation" you mentioned) to eliminate the difference. The frequency <BR>&gt; &gt; step in my experiment is less than 0.1Hz. I tried the speex AEC after resampling, and much more echo is <BR>&gt; &gt; cancelled than without resampling, but some echo can still be heard. <BR>&gt; &gt; After all, the frequency step of sample rate conversion is limited, so a mismatch still exists after resampling. <BR>&gt; &gt; Someone told me that the capture and render codecs have different clock generators which drift independently. <BR>&gt; &gt; And the LMS algorithm is very sensitive to the difference between sample rates. 


<BR>&gt; &gt; <BR>&gt; &gt; Sincerely <BR>&gt; &gt; Maoquan <BR>&gt; &gt; 


<BR>&gt; &gt; At 2011-04-12 21:46:26, "Shridhar, Vasant" &lt;<A 


href="mailto:vasant.shridhar@harman.com">vasant.shridhar@harman.com</A>&gt; 


wrote: <BR>&gt; &gt; I would imagine that it is handled through basic asynchronous sample rate conversion.&nbsp; There is a lot of literature out there on the different techniques to do this.&nbsp; A common method is sinc interpolation.&nbsp; This is how I have handled these types of things in the 


past. <BR>&gt; &gt; <BR>&gt; &gt; Vasant Shridhar <BR>&gt; &gt; <BR>&gt; &gt; 


From: <A href="mailto:speex-dev-bounces@xiph.org">speex-dev-bounces@xiph.org</A> [mailto:speex-dev-bounces@xiph.org] 


On Behalf Of LiMaoquan2000 <BR>&gt; &gt; Sent: Tuesday, April 12, 2011 12:36 AM 


<BR>&gt; &gt; To: speex-dev <BR>&gt; &gt; Subject: [Speex-dev] Anyone knows how 


microsoft AEC can deal with mismatches between clocks of capture and render 


streams? <BR>&gt; &gt; <BR>&gt; &gt; <BR>&gt; &gt; Hi all, <BR>&gt; &gt; 


<BR>&gt; &gt; We all know that a mismatch between the ADC clocks of the far-end and near-end voice is not allowed in a time-domain or frequency-domain LMS-based AEC system. This means that the capture and render audio streams must be synchronized to the same sample rate. However, I found that this restriction was removed from the Microsoft AEC as of Windows XP SP1. Does anyone know how the Microsoft AEC does it? This technology would be very helpful for implementing AEC on a common PC. We know that most low-cost soundcards have different capture and render sample rates, which prevents LMS-based AEC from being used on most computers. <BR>&gt; &gt; 


<BR>&gt; &gt; <A href="http://msdn.microsoft.com/en-us/library/ff536174(VS.85).aspx">http://msdn.microsoft.com/en-us/library/ff536174(VS.85).aspx</A> 


<BR>&gt; &gt; In Windows XP, the clock rate must be matched between the capture 


and render streams. The AEC system filter implements no mechanism for matching 


sample rates across devices. ............. In Windows XP SP1, Windows Server 


2003, and later, this limitation does not exist. The AEC system filter correctly 


handles mismatches between the clocks for the capture and render streams, and 


separate devices can be used for capture and rendering. <BR>&gt; &gt; <BR>&gt; 


You have posted the same thing before, but ignored replies because you <BR>&gt; 


didn't like them. The paragraph you quoted can be taken as a clear <BR>&gt; 


statement that MS precisely resample the signals. However, if you read <BR>&gt; 


the whole page it is less clear. The key thing that paragraph is talking 


<BR>&gt; about is big sampling rate changes - like taking a 48k/s signal and a 


<BR>&gt; 16k/s signal, and resampling the 48k/s one to 16k/s, so cancellation 


can <BR>&gt; work. That is the thing which seems to have been added in XP SP1. 


The <BR>&gt; paragraph seems to imply that fine resampling happens, but if you 


read <BR>&gt; the rest of the page it comes from, things are not so clear. There 


are <BR>&gt; many vague and unclear things on that page. If they had brilliantly 


<BR>&gt; solved this problem, everyone should be relying on the MS canceller for 


<BR>&gt; their Windows solutions, but that doesn't seem to be the case. It seems 


<BR>&gt; many soft-phones rely on their own echo handling solutions, and many do 


<BR>&gt; not handle echo very well. <BR>&gt; <BR>&gt; Steve 


<BR></DIV></DIV></STATIONERY></BODY></HTML>