<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=gb2312">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]--><o:SmartTagType
namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="PersonName"/>
<!--[if !mso]>
<style>
st1\:*{behavior:url(#default#ieooui) }
</style>
<![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:SimSun;
        panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:"\@SimSun";
        panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:SimSun;}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:blue;
        text-decoration:underline;}
pre
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:SimSun;}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:Arial;
        color:navy;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
        {page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=blue>
<div class=Section1>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'>Assuming a quiet room is unrealistic in
practice. Also, the impulse response may change in the middle of a call, so you
basically need to find a good initial alignment point and then track it along
the way. Making this reliable is very much the key.<o:p></o:p></span></font></p>
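<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'>A minimal sketch of the "align once, then
track" idea, written in Python for brevity rather than against the Speex C API.
The FarEndDelayLine class and its method names are hypothetical. It delays the
far-end samples by a configurable amount before they would be handed to the echo
canceller, and set_delay lets the caller re-align mid-call when a new delay
estimate arrives. How that estimate is obtained is outside this sketch.<o:p></o:p></span></font></p>
<pre>
from collections import deque


class FarEndDelayLine:
    """Delays far-end samples by a configurable number of samples."""

    def __init__(self, delay_samples):
        # Pre-fill with silence so the first outputs are delayed zeros.
        self.buf = deque([0] * delay_samples)

    def process(self, frame):
        """Push one far-end frame, pop a frame of the same length."""
        self.buf.extend(frame)
        return [self.buf.popleft() for _ in frame]

    def set_delay(self, delay_samples):
        """Re-align mid-call: pad with silence or drop the oldest samples."""
        change = delay_samples - len(self.buf)
        for _ in range(max(change, 0)):
            self.buf.appendleft(0)
        for _ in range(max(-change, 0)):
            self.buf.popleft()
</pre>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'>Between calls to process, the number of
samples held in the buffer equals the current delay, which is why set_delay only
needs to compare the requested delay against the buffer length.<o:p></o:p></span></font></p>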
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>
<div>
<div class=MsoNormal align=center style='text-align:center'><font size=3
face=SimSun><span style='font-size:12.0pt'>
<hr size=2 width="100%" align=center tabindex=-1>
</span></font></div>
<p class=MsoNormal><b><font size=2 face=Tahoma><span style='font-size:10.0pt;
font-family:Tahoma;font-weight:bold'>From:</span></font></b><font size=2
face=Tahoma><span style='font-size:10.0pt;font-family:Tahoma'>
speex-dev-bounces@xiph.org [mailto:speex-dev-bounces@xiph.org] <b><span
style='font-weight:bold'>On Behalf Of </span></b>Li Maoquan<br>
<b><span style='font-weight:bold'>Sent:</span></b> Wednesday, April 20, 2011
5:36 PM<br>
<b><span style='font-weight:bold'>To:</span></b> speex-dev@xiph.org<br>
<b><span style='font-weight:bold'>Subject:</span></b> Re: [Speex-dev] Acoustic
echo cancellation</span></font><o:p></o:p></p>
</div>
<p class=MsoNormal><font size=3 face=SimSun><span style='font-size:12.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=3 face=SimSun><span style='font-size:12.0pt'>Simply
put, in a quiet room you can play an impulse signal and then find its impulse
response in the signal captured from the microphone. For example, if the delay
between the impulse and its response ranges from 500 to 3000 cycles, you can
buffer the far-end signal by 0-300 cycles and set the filter length to 4000.
This is what is meant by aligning the far-end and near-end signals.<br>
<br>
BTW: Speex AEC is sensitive to a mismatch between the capture and rendering
sample rates, and most low-cost computer soundcards have this problem.<o:p></o:p></span></font></p>
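<p class=MsoNormal><font size=3 face=SimSun><span style='font-size:12.0pt'>The
measurement described above can be sketched as follows, in Python for brevity.
The function names, the margin parameter, and the sample values are
hypothetical. estimate_delay picks the lag at which the microphone capture
correlates most strongly with the played signal; for a clean impulse in a quiet
room this should coincide with the first echo, but that is an assumption of the
sketch. far_end_buffer_length then applies the N-M rule from the quoted message
below.<o:p></o:p></span></font></p>
<pre>
def estimate_delay(far, near, max_delay):
    """Return the lag (in samples) at which near best matches far."""
    def score(lag):
        # zip stops at the shorter sequence, so no bounds checks are needed.
        return sum(x * y for x, y in zip(far, near[lag:]))
    return max(range(max_delay + 1), key=score)


def far_end_buffer_length(delay, margin):
    """N - M from the thread: delay minus a safety margin, never negative."""
    return max(delay - margin, 0)


# Hypothetical data: an impulse played at sample 0, a direct echo at
# sample 300 and a weaker reflection at sample 420.
far = [1.0] + [0.0] * 999
near = [0.0] * 1000
near[300] = 0.5
near[420] = 0.2

delay = estimate_delay(far, near, 600)
print(delay, far_end_buffer_length(delay, 100))  # prints: 300 200
</pre>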
<p class=MsoNormal><font size=3 face=SimSun><span style='font-size:12.0pt'><o:p> </o:p></span></font></p>
<pre><font size=3 face=SimSun><span style='font-size:12.0pt'><br>
At 2011-04-21 03:00:01<span lang=ZH-CN>,</span><st1:PersonName w:st="on">speex-dev-request@xiph.org</st1:PersonName> wrote:<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> I have a scenario in a mobile VoIP app that requires echo cancellation<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> but<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> is somewhat different from what's described in the docs.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> Audio is received from and sent to the network at 8000Hz. Each packet<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> contains 160 samples worth a playback of 20ms.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> But the hardware requires aggregation for both playback and capture. So<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> for<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> playback, I coalesce 4 packets in a buffer and queue them as a larger<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> buffer<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> for playback.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> On the send side, I read a large buffer (worth 4 packets) and send them<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> out<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> over time 20ms apart.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> I tried using speex_echo_playback just when a 160-sample packet arrives<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> from<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> the network, before coalescing and speex_echo_capture just before a<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> packet<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> is sent out to the network but that doesn't seem to work properly<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> (doesn't<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> cancel any echo).<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >> The most likely reason is that you didn't align the far-end and near-end<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> samples.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >> So the filter can not converge.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> ><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >Thanks for your response. Can you please explain what you mean by<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >align samples from near-end and far-end? And how is that usually<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >accomplished?<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> You need to know the total delay caused by DAC buffer before speaker, ADC<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> buffer<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> after microphone and acoustic path between speaker and microphone. Simply<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> to say,<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> if you play an impulse signal and its first echo appears after N sample<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> cycles,<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> you can call N as the delay between y (echo in near-end signal) and x<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> (far-end<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> signal). Then you can buffer far-end signal for N-M cycles before sending<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> to AEC.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> M is a little number (such as 100) in order to avoid filter failure when<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> echo<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> path drifts.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>Thanks again. I am trying to model the delay between the near and far end<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>signals using a circular queue of length n. Every time a frame is received<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>and queued for playback, it is also entered into the queue. Each frame being<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>read from the mic is echo-cancelled ( speex_echo_cancellation ) using the<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>oldest frame in the queue if the queue is filled up, thus I am cancelling<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>the recorded frame using a playback frame that is N-frames old.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>I have played with different values of N from 2 to 50 (320 samples to 8000<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>samples), attempting to align the input and output but the cancellation<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>doesn't seem to work. The echo is steady as ever.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>Is this model correct and expected to converge with a right value of "N"? Or<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>do I need some other adaptation to account for drifts here. Right now, it's<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>a black box for me. I am not sure how to get some feedback from this system<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>to tune the AEC (and the delay parameters) correctly.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>Also, I did not follow the use of "M" in your description above and how it<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>helps with drifts. My queue stores frames (160 samples each). So a number of<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>100 samples seems too small.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>Btw, I am assuming that speex AEC API can be used even though I am not using<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>the speex encoder/decoder.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>><o:p> </o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>>> So, in this scenario above, please recommend a good place to insert<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> speex_echo_playback and speex_echo_capture. Should I be just before the<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> read<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> and write to hardware? In that case, should I use a larger "frame size"<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> of<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>> 160 samples x 4?<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >> Of course you can set frame size to 160*4. Otherwise you can feed<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> samples 4 times<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >> to the AEC if you don't want to modify the frame size.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >>><o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >> Thanks in advance,<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'>>> >> Daniel.<o:p></o:p></span></font></pre><pre><font
size=3 face=SimSun><span style='font-size:12.0pt'><o:p> </o:p></span></font></pre>
<p class=MsoNormal><font size=3 face=SimSun><span style='font-size:12.0pt'><br>
<br>
<br>
<o:p></o:p></span></font></p>
</div>
</body>
</html>