[speex-dev] Using speex.

Mon May 12 11:09:06 PDT 2003

J.K. Lin (jk at pageshare.com) wrote:
> 
> Hi:
> 
>    I am new to speex and I am evaluating the possibility of using
> Speex for web conferencing (pretty big scale).  It looks very promising.

It's worked really well for me.  I think you'll find it to be a 
great codec.

>    I have some questions, maybe very naive, but please help me:
> 
> 
>    1) Is there any sample implementation using Speex in web conferencing
>       in voice?  To be more specific, in Windows platforms?  (ActiveX? 
>       Java applet implementation?)

I created a program like this for Windows, but the source is not 
[yet?] available to the public - it's still very much a work in 
progress.  I'm willing to share information and code snippets 
though, given specific questions.

>    2) What is the recommended bandwidth?  4kbs (in one way)?  2 kbs?
>       (The shown samples sound pretty good at 4kbs.)

I recommend the wideband mode (16kHz) for really nice quality speech.  
Telling the codec to use 10-15kbps seems to work well for CBR.  VBR 
with the quality set around 6.0 is also nice, consuming roughly 
4-23kbps, although the average would be pretty low most of the time...

>    3) What is the recommended buffer?  1 second or 2 seconds?

That's way too large for an interactive conversation.  I've been 
experimenting with different buffer sizes and my current favorite is 
40ms.  Sending 40ms of audio over the wire results in a delay of 
roughly 40ms+transmission delay+playback latency+codec latency.  
Some typical numbers that I'm experiencing so far would be around:

 40ms packetization delay (packet rate of 25 packets per second)
 30ms transmission delay (typical broadband-to-broadband 1-way time)
 60ms playback latency (not too sure about this one, might be lower)
 34ms codec latency (does this overlap with packetization delay...?)
-----
164ms total latency

I'm not an expert at this stuff so take these numbers with a grain of 
salt, and if anyone has comments on them please let me know.
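
As a rough sketch of the arithmetic above (Python, with illustrative 
names that are not part of any Speex API), the budget can be added up 
like this.  It assumes the four components do not overlap, which is 
exactly the open question about codec latency versus packetization 
delay:

```python
# Rough one-way latency budget, using the figures quoted above (all in ms).
# The dictionary keys are illustrative labels, not Speex terminology.
LATENCY_MS = {
    "packetization": 40,  # one 40 ms packet, i.e. 25 packets per second
    "transmission": 30,   # typical broadband-to-broadband one-way time
    "playback": 60,       # playback buffering (uncertain, might be lower)
    "codec": 34,          # codec latency as quoted above
}

def total_latency_ms(budget):
    """Sum the components, assuming none of them overlap."""
    return sum(budget.values())

def packets_per_second(packet_ms):
    """Packet rate implied by a given packet duration in milliseconds."""
    return 1000 / packet_ms

print(total_latency_ms(LATENCY_MS))  # 164
print(packets_per_second(40))        # 25.0
```

If part of the codec latency does turn out to overlap with the 
packetization delay, the real total would be somewhat lower than 
this sum.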

>    4) What would happen if sound packets are dropped (time shift
>       in different computer clock speeds)?  What if some
>       packet holes have to be filled?  (repeating the previous packet?)

I'm not sure about this "time shift in different computer clock 
speeds" thing you're talking about.  Your program should use a 
timing mechanism that operates independently of the computer's 
clock speed.

But, in the event of packet loss or delay, you can use the packet 
loss concealment feature of Speex as Jean-Marc suggested.  However, 
if you have Speex make up for a packet you don't have, you should 
probably be careful to avoid subsequently decoding that packet if 
it arrives late (as it probably will)...
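
One way to avoid decoding a late packet after its slot has already 
been concealed is to number the packets and remember which number is 
due next.  The following is only a sketch under assumptions: the 
PlayoutTracker class and the decode/conceal callables are 
hypothetical, and sequence-number wrap-around is ignored.  (In the C 
library, as far as I know, concealment is requested by passing NULL 
as the bits argument to speex_decode().)

```python
class PlayoutTracker:
    """Tracks which sequence numbers have already been played or concealed."""

    def __init__(self):
        self.next_seq = 0   # next sequence number due for playback
        self.pending = {}   # seq -> encoded payload, received but not yet played

    def on_packet(self, seq, payload):
        """Store an arriving packet; return False if it is too late to use."""
        if seq < self.next_seq:
            return False    # its slot was already played or concealed: discard
        self.pending[seq] = payload
        return True

    def next_frame(self, decode, conceal):
        """Call once per playback tick; always returns audio for that tick."""
        payload = self.pending.pop(self.next_seq, None)
        self.next_seq += 1
        if payload is None:
            return conceal()        # packet missing: let the codec fill the gap
        return decode(payload)      # packet present: decode it normally
```

Here decode would wrap the normal decoder call and conceal would 
wrap the codec's loss-concealment call.  A late packet is rejected by 
on_packet() and never reaches the decoder.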

>    5) Any other issues that I should pay attention to?

Use UDP.  Don't use TCP.  You get less packet overhead (which can 
be really important at high packet rates) and you get better 
performance.  Actually, there's also RTP, but I don't know much 
about that yet.  I'm pretty sure it's layered on top of UDP and 
you'd have to get an RTP library from somewhere to use it (or 
maybe it's simple enough to be implemented without too much 
work...?)
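
To illustrate sending audio over plain UDP, here is a minimal Python 
sketch.  The 6-byte header (sequence number plus timestamp) is a 
made-up format for illustration only; it is NOT RTP, and the function 
names are hypothetical.  A sequence number of some kind is useful 
because UDP itself does not tell the receiver about lost or reordered 
packets.

```python
import socket
import struct

# Hypothetical 6-byte header: 16-bit sequence number + 32-bit timestamp (ms),
# both in network byte order. This is NOT RTP, just an illustration.
HEADER = struct.Struct("!HI")

def pack_packet(seq, timestamp_ms, payload):
    """Prefix an encoded audio payload with the illustrative header."""
    return HEADER.pack(seq & 0xFFFF, timestamp_ms & 0xFFFFFFFF) + payload

def unpack_packet(data):
    """Split a received datagram into (seq, timestamp_ms, payload)."""
    seq, timestamp_ms = HEADER.unpack_from(data)
    return seq, timestamp_ms, data[HEADER.size:]

def loopback_demo():
    """Send one packet to ourselves over UDP on the loopback interface."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        rx.settimeout(1.0)          # UDP gives no delivery guarantee
        tx.sendto(pack_packet(7, 280, b"encoded-frame"), rx.getsockname())
        data, _addr = rx.recvfrom(2048)
        return unpack_packet(data)
    finally:
        rx.close()
        tx.close()
```

Calling loopback_demo() should return the sequence number, timestamp 
and payload that were sent, provided the datagram is not dropped.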

Packet overhead.  As you increase your packet rate, you decrease 
one of the latency factors (packetization delay).  However, you 
also increase bandwidth wasted by packet overhead.  The IP headers 
contain 20 bytes, and then UDP uses an additional 8 bytes.  If you 
have a user on a dialup modem, the PPP headers will use an 
additional 5-7 bytes.  That's a total of 28-35 bytes PER PACKET.  
At a rate of 25 packets per second, that's 700-875 bytes per second, 
or 5.6-7kbps, which gets significant for a dialup modem user.
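
The overhead arithmetic can be checked with a few lines of Python.  
This is a sketch using only the header sizes quoted above; the 
function names are illustrative, and the 12kbps codec rate is just an 
example value from the CBR range mentioned earlier.

```python
IP_HEADER = 20   # bytes, as quoted above
UDP_HEADER = 8   # bytes
PPP_MAX = 7      # bytes, upper end of the dialup figure quoted above

def overhead_kbps(header_bytes, packets_per_second):
    """Bandwidth spent on headers alone, in kilobits per second."""
    return header_bytes * packets_per_second * 8 / 1000

def payload_bytes(codec_kbps, packet_ms):
    """Encoded audio bytes carried in one packet at a constant bitrate."""
    return codec_kbps * 1000 / 8 * packet_ms / 1000

print(overhead_kbps(IP_HEADER + UDP_HEADER, 25))            # 5.6
print(overhead_kbps(IP_HEADER + UDP_HEADER + PPP_MAX, 25))  # 7.0
print(payload_bytes(12, 40))                                # 60.0
```

So at 12kbps with 40ms packets, each packet carries about 60 bytes of 
audio behind 28-35 bytes of headers.  Halving the packet duration 
doubles the packet rate and therefore doubles the header overhead.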

Communications protocol.  This handles call setup, teardown, 
audio transmission, format negotiation, and whatever else you may 
like.  The question is, which protocol should you use?  It seems 
that the two popular ones are H.323 and SIP.  They are large and 
complicated standards, but if you want your program to be 
interoperable with other programs, you should use one (or both?).  
Personally, I just made my own [simple] proprietary protocol.  
Maybe some day I'll go for interoperability but I'm not there yet.

Preprocessing.  Sometimes there's a strong bass signal present 
in a recording from a mic.  It can be caused by vibrations or air 
flow or just ambient noise.  It's really helpful to remove this 
bass before encoding the audio.  It makes the codec's job easier 
(I think?) and, more importantly, is much easier on the ears of 
someone listening with headphones...  To remove bass from a signal, 
you can run it through a high-pass filter using convolution.  It's 
not as hard as it might sound.  There's an excellent book on DSP 
techniques available online for free at:

http://www.analog.com/Analog_Root/static/technology/dsp/training/materials/dsp_book_index.html
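
As an illustration of the idea, here is a small pure-Python sketch of 
one common approach: build a windowed-sinc low-pass kernel, turn it 
into a high-pass kernel by spectral inversion, and convolve the audio 
with it.  The function names, the 101-tap length and the Hamming 
window are assumptions chosen for the example, not necessarily what 
any particular program uses, and a real-time program would want a 
faster convolution than this.

```python
import math

def highpass_kernel(cutoff_hz, sample_rate, taps=101):
    """Windowed-sinc high-pass kernel built by spectral inversion."""
    if taps % 2 == 0:
        raise ValueError("taps must be odd")
    fc = cutoff_hz / sample_rate          # cutoff as a fraction of sample rate
    mid = taps // 2
    kernel = []
    for i in range(taps):
        n = i - mid
        # Ideal low-pass (sinc) response...
        h = 2 * fc if n == 0 else math.sin(2 * math.pi * fc * n) / (math.pi * n)
        # ...tapered with a Hamming window.
        h *= 0.54 - 0.46 * math.cos(2 * math.pi * i / (taps - 1))
        kernel.append(h)
    total = sum(kernel)
    kernel = [h / total for h in kernel]  # unity gain at DC for the low-pass
    kernel = [-h for h in kernel]         # spectral inversion: negate...
    kernel[mid] += 1.0                    # ...and add a unit impulse
    return kernel

def convolve(signal, kernel):
    """Direct convolution; the output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(kernel):
            if 0 <= i - j < len(signal):
                acc += h * signal[i - j]
        out.append(acc)
    return out

# Example: remove content below roughly 100 Hz from 16 kHz audio samples.
kernel = highpass_kernel(100, 16000)
```

The filtered audio is then convolve(samples, kernel), delayed by 
(taps - 1) / 2 samples relative to the input.  A longer kernel gives 
a sharper cutoff at the cost of more computation and more delay.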

Echo cancellation.  If someone in the conversation is using 
speakers, the other person (or people) will hear an echo of their 
own voice(s) as the sound travels from the speakers back into a 
microphone.  This is really annoying.  I usually try to get people 
to wear headphones.  It's possible to do echo cancellation in 
software, but it's really hard.  This would really be a killer 
feature for the Speex codec to provide, if that's possible... :)

>    6) Anybody did it and I can learn from?

Sure.  I'm not sure if this sort of thing is on-topic for this 
list though...?

Tom