[Speex-dev] Discontinuous encoding and VBR tradeoffs

Fri Oct 10 13:49:18 PDT 2008

I'm writing a voice communication application, and I've got a few
issues that I'd like to get ironed out, but I don't know enough about
the Speex implementation.

First of all, this application is mainly used for conferencing - many
people are in a room and only 1-2 are ever talking at a time.  So,
always encoding and transmitting everyone's audio stream would be
rather wasteful.  I also do not want to be creating and destroying
encoder and decoder objects every time someone starts or stops talking.

Ideally, what I'd like to do is record/encode when someone pushes the
talk key, then stop when they release it, AND reuse the same
encoder/decoder objects every time this happens.  However, if the
person lets go of the talk key while there is still audible sound,
that sound will carry over to the next time they start transmitting.
I know why this happens, but I don't know the proper way to prevent
it.  Right now, I encode a few frames of silence and send them over
the network in order to "reset" the encoder state, but this is not
ideal bandwidth-wise.
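
To make the current workaround concrete, here is a minimal sketch of
the push-to-talk logic in C.  Everything in it is illustrative rather
than my actual code: the names (ptt_state, ptt_key_up, encode_send_fn,
count_frames), the frame size of 160 samples, and the count of 3
silent frames are all assumptions.  The encode-and-send step is hidden
behind a hypothetical callback; in the real application it would wrap
speex_encode_int() and the network send.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 160 samples per frame (20 ms of narrowband audio at 8 kHz). */
#define FRAME_SIZE 160
/* Assumption: "a few" silent frames; 3 is an arbitrary placeholder. */
#define FLUSH_FRAMES 3

/* Hypothetical hook that encodes one frame and sends it.  In the real
 * application this would wrap speex_encode_int() plus the network send. */
typedef void (*encode_send_fn)(void *user, const short *pcm, size_t nsamples);

typedef struct {
    int talking;                /* nonzero while the talk key is held */
    encode_send_fn encode_send; /* called once per transmitted frame */
    void *user;                 /* opaque pointer handed back to the hook */
} ptt_state;

void ptt_init(ptt_state *ptt, encode_send_fn fn, void *user)
{
    assert(ptt != NULL && fn != NULL);
    ptt->talking = 0;
    ptt->encode_send = fn;
    ptt->user = user;
}

void ptt_key_down(ptt_state *ptt)
{
    assert(ptt != NULL);
    ptt->talking = 1;
}

/* Called for every captured frame; transmits only while the key is held. */
void ptt_on_audio(ptt_state *ptt, const short *pcm)
{
    assert(ptt != NULL && pcm != NULL);
    if (ptt->talking)
        ptt->encode_send(ptt->user, pcm, FRAME_SIZE);
}

/* On release, push FLUSH_FRAMES frames of digital silence through the
 * same encoder, which is the "reset" workaround described above. */
void ptt_key_up(ptt_state *ptt)
{
    short silence[FRAME_SIZE];
    int i;

    assert(ptt != NULL);
    if (!ptt->talking)
        return;                 /* key was not down; nothing to flush */
    ptt->talking = 0;
    memset(silence, 0, sizeof silence);
    for (i = 0; i < FLUSH_FRAMES; i++)
        ptt->encode_send(ptt->user, silence, FRAME_SIZE);
}

/* Stand-in hook for trying the logic without a codec: counts frames. */
void count_frames(void *user, const short *pcm, size_t nsamples)
{
    (void)pcm;
    (void)nsamples;
    ++*(int *)user;
}
```

With count_frames as the hook, pressing the key, feeding two frames,
and releasing the key yields 2 + FLUSH_FRAMES transmitted frames.
Those extra FLUSH_FRAMES packets per release are the overhead I would
like to avoid.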

Is there a better way to do this?  I know that DTX (discontinuous
transmission) handles a similar problem, but I'm not sure whether it
would help here.

Secondly, I'm curious to know if VBR really has any drawbacks in terms
of quality.  Specifically, can I be guaranteed to achieve the same
quality with AT MOST the same bandwidth as with CBR, or can a
VBR-encoded frame actually be significantly bigger than a CBR-encoded
frame of the same quality? I also noticed that VBR has its own quality
setting - does this override the main quality setting when VBR is
enabled?  Is there any noticeable difference in quality (audibly or
mathematically) between a CBR and VBR stream with the same quality
setting?
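
One way I could check the frame-size part of this myself is to log the
size of every VBR-encoded frame and compare it against the fixed CBR
frame size.  The sketch below is only a bookkeeping helper under that
assumption; vbr_stats and vbr_compare are hypothetical names, and the
sizes would have to come from real encoder output (for example the
value of speex_bits_nbytes() after each encoded frame).

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t total_bytes;     /* sum of all VBR frame sizes */
    size_t max_bytes;       /* largest single VBR frame */
    size_t frames_over_cbr; /* frames strictly larger than the CBR size */
} vbr_stats;

/* Summarize n logged VBR frame sizes against a fixed CBR frame size. */
vbr_stats vbr_compare(const size_t *vbr_sizes, size_t n, size_t cbr_size)
{
    vbr_stats s = {0, 0, 0};
    size_t i;

    assert(vbr_sizes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        s.total_bytes += vbr_sizes[i];
        if (vbr_sizes[i] > s.max_bytes)
            s.max_bytes = vbr_sizes[i];
        if (vbr_sizes[i] > cbr_size)
            s.frames_over_cbr++;
    }
    return s;
}
```

Comparing total_bytes with n * cbr_size gives the overall bandwidth
difference, while max_bytes and frames_over_cbr show whether individual
VBR frames ever exceed the CBR frame size.  This says nothing about
perceived quality, which is the part I can't measure this way.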

Basically, I'd like to know if there is any reason NOT to have VBR
enabled for my application.  It's meant to run over broadband
internet, but I know that whoever is hosting would like to conserve
their bandwidth.  However, I also want to maximize voice quality for
the users.

-Chris Weiland