Background: I'm working on a web conferencing application hosted in Silverlight, and we desperately need an echo canceller. A while back, someone made a Java port of Speex (JSpeex), which was later ported to C# (CSpeex). However, those are based on an older version of the Speex code, and while they seem to interoperate just fine with the latest official release, they don't have certain key features, like acoustic echo cancellation. Consequently, I'm working on porting the current version of mdf.c over to C#, with a specific goal of being able to use it in our Silverlight app.<div>
<br></div><div>I've taken a fairly straightforward approach so far: I copied the code from mdf.c and the necessary supporting files, defined all the C functions as static methods within a C# class, replaced each float* pointer with a float[] array plus an integer offset, and then adjusted the C syntax wherever C# required it. It doesn't look much like C#, but I'm trying to keep it as close to the original as possible until I get it working; refactoring can wait.</div>
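<div><br></div><div>The pointer-to-array translation described above can be sketched as follows. It is written in Java because Java arrays behave much like C# arrays (no pointer arithmetic). The function is a hypothetical float inner product in the style of the small helpers in mdf.c, not code copied from mdf.c. The C version appears in a comment for comparison.</div>

```java
// C original (pointer style), shown for comparison:
//   float inner_prod(const float *x, const float *y, int len) {
//       float sum = 0;
//       while (len--) sum += (*x++) * (*y++);
//       return sum;
//   }
//   ... called as inner_prod(buf + N, w, N);

// Hypothetical helper class illustrating the (array, offset) port pattern.
final class PortIdioms {
    // Each C pointer parameter becomes an (array, offset) pair.
    static float innerProd(float[] x, int xOff, float[] y, int yOff, int len) {
        float sum = 0f;
        for (int i = 0; i < len; i++) {
            // *x++ becomes x[xOff + i]: the offset must be carried into every access.
            sum += x[xOff + i] * y[yOff + i];
        }
        return sum;
    }
}
```

<div>If a call site that passed buf + N in C is ported with offset 0, the code still runs without exceptions but reads the wrong part of the buffer. That would match a port that executes cleanly yet cancels nothing, so call sites that use pointer arithmetic are worth auditing first.</div>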
<div><br></div><div>The net result is that it compiles and appears to run correctly: when I step through the code, all the relevant paths execute and no exceptions are thrown. The "only" problem is that I'm not getting any echo cancellation. This could be the result of (1) an error I made in my port, (2) my using the canceller incorrectly, or (3) some quirk introduced by Silverlight (such as excessive latency; see below).</div>
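<div><br></div><div>One way to separate possibility (1) from (3) is to test the port offline, with no Silverlight audio in the loop. Synthesize a far-end signal, build the "microphone" signal as a delayed, attenuated copy of it, and feed both to the canceller frame by frame. Below is a minimal Java sketch. The class name, delay, gain, and noise level are arbitrary assumptions chosen for illustration.</div>

```java
// Offline test signal for an echo canceller: no sound card or Silverlight involved.
// Hypothetical helper; the delay, gain, and noise level are illustrative choices.
final class SyntheticEcho {
    /** Deterministic Gaussian white noise, standing in for the far-end (playback) signal. */
    static short[] farEnd(int n, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        short[] x = new short[n];
        for (int i = 0; i < n; i++) {
            x[i] = clamp(Math.round(rng.nextGaussian() * 3000.0));
        }
        return x;
    }

    /** Microphone signal containing only a pure linear echo: the far end delayed and scaled. */
    static short[] micFromEcho(short[] far, int delaySamples, double gain) {
        short[] mic = new short[far.length]; // zeros until the echo arrives
        for (int i = delaySamples; i < far.length; i++) {
            mic[i] = clamp(Math.round(far[i - delaySamples] * gain));
        }
        return mic;
    }

    /** Saturates to the 16-bit sample range that speex_echo_cancellation() works with. */
    static short clamp(long v) {
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, v));
    }
}
```

<div>Pass far as the playback signal and mic as the captured signal, in matching frame-sized chunks, with the delay well inside the tail (for example, 800 samples, which is 100 ms at 8 kHz). This is an idealized case, and a correct port should show substantial attenuation once it converges. If it shows none, the problem is most likely in the port rather than in Silverlight.</div>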
<div><br></div><div>I know I'll largely be on my own troubleshooting this, but I'd appreciate feedback on two questions:</div>
<div><br></div><div>(1) As of the current release (v4.0), Silverlight introduces a surprising amount of latency when playing and recording audio. The time from when I submit a sound for playback until I record that sound coming back ranges from 230 ms to 270 ms, depending on the machine. That is roughly a 12-frame delay, much larger than the 2-frame latency Jean-Marc assumed when he wrote the Speex AEC. One way to compensate would be to lengthen the tail to cover ~350 ms of sound, but I suspect that wouldn't yield much cancellation. The alternative I've been trying is to keep the tail at roughly 200 ms (within the recommended range) and buffer 10 or so playback frames in a queue, pulling out the oldest one each time I call speex_echo_cancellation(). To my untutored mind this should work, but some of the comments in the documentation make me wonder whether this is a fool's errand. One example is the warning against using two different sound cards, because their clocks wouldn't be adequately synchronized. Any thoughts on whether this approach is likely to work, or suggestions for doing it better?</div>
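<div><br></div><div>The queue described in (1) amounts to a fixed bulk delay on the playback (far-end) signal. Below is a minimal Java sketch. The class and method names are hypothetical and not part of Speex. The sketch assumes 20 ms frames and a delay that stays stable over a session. Each call stores the newest playback frame and returns the frame pushed delayFrames calls earlier, or silence until the line fills. That returned frame would be the play argument to speex_echo_cancellation(), passed alongside the current microphone frame.</div>

```java
// Hypothetical fixed delay line for the far-end (playback) reference signal.
final class FarEndDelayLine {
    private final short[][] ring; // delayFrames + 1 slots, initially all silence
    private int head = 0;         // slot that receives the next pushed frame

    FarEndDelayLine(int delayFrames, int frameSize) {
        ring = new short[delayFrames + 1][frameSize];
    }

    /** Stores a copy of the newest playback frame; returns the frame pushed delayFrames calls ago. */
    short[] push(short[] frame) {
        System.arraycopy(frame, 0, ring[head], 0, frame.length);
        head = (head + 1) % ring.length;
        // After advancing, head indexes the oldest stored frame. Clone it, because
        // the next push will overwrite this slot.
        return ring[head].clone();
    }

    /** Largest whole-frame delay that does not exceed (shortest measured delay - safety margin). */
    static int safeDelayFrames(int minMeasuredDelayMs, int frameMs, int marginMs) {
        int usableMs = minMeasuredDelayMs - marginMs;
        return usableMs <= 0 ? 0 : usableMs / frameMs;
    }
}
```

<div>The bulk delay is rounded down deliberately. A reference that arrives later than the echo it is supposed to predict cannot be cancelled, whereas one that arrives a little early only uses up part of the tail. With 20 ms frames, safeDelayFrames(230, 20, 20) gives 10 frames (200 ms), leaving a residual 30 to 70 ms that fits easily inside a 200 ms tail. A fixed delay does nothing about clock drift or a delay that wanders during a call, which is the concern behind the two-sound-card warning.</div>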
<div><br></div><div>(2) Within the echo canceller itself, what values should I check to see whether echo cancellation is actually happening? Obviously I can listen to the output, but to figure out what's going wrong I likely need to inspect the code while it's actually trying to cancel the echo. I see some "sanity checks" in there, as well as some other checks for reduction, but I'm honestly not sure what else I should be looking for. I've read all the Speex docs, every relevant post I could find, and the various technical articles Valin has written or referenced (I'm a bit out of my depth with these), but so far I haven't managed to wrap my head around the Speex AEC internals. So any pointers, shortcuts, or troubleshooting hints would be appreciated.</div>
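<div><br></div><div>One externally observable number that answers "is it cancelling at all?" is the echo return loss enhancement (ERLE). ERLE is the ratio, in dB, of microphone-input energy to canceller-output energy over the same frame, measured while only the far end is talking. It needs no access to the canceller's internals, so it can be logged right next to each speex_echo_cancellation() call. A minimal Java sketch follows. The class name and the small energy floor are assumptions.</div>

```java
// Hypothetical diagnostic: ERLE = 10 * log10(energy(mic input) / energy(canceller output)).
final class Erle {
    /** Sum of squared samples, accumulated in double to avoid integer overflow. */
    static double energy(short[] frame) {
        double e = 0.0;
        for (short s : frame) {
            e += (double) s * s;
        }
        return e;
    }

    /** ERLE in dB for one frame; a tiny floor avoids division by zero on silent frames. */
    static double erleDb(short[] mic, short[] out) {
        final double floor = 1e-9;
        return 10.0 * Math.log10((energy(mic) + floor) / (energy(out) + floor));
    }
}
```

<div>Values near 0 dB mean the output is as loud as the input, i.e., no cancellation. Per-frame ERLE is noisy and not meaningful during double-talk or far-end silence. Averaging the energies over a second or so of far-end-only audio gives a steadier reading. Measuring it first on a synthetic echo shows what a working port should look like before trying live Silverlight audio.</div>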
<div><br></div><div>Thanks in advance.</div><div><div><br>Ken Smith<br>Cell: 425-443-2359<br>Email: <a href="mailto:ken@alanta.com" target="_blank">ken@alanta.com</a><br>
Blog: <a href="http://blog.wouldbetheologian.com/" target="_blank">http://blog.wouldbetheologian.com/</a><br>
</div></div>