[Speex-dev] Discontinuous encoding and VBR tradeoffs

Fri Oct 10 15:08:13 PDT 2008

I don't know why you're getting sound carrying over to the next time you 
encode - that doesn't sound normal to me.  Have you tried saving and 
examining the raw audio you're feeding to the encoder?  Have you tried 
encoding and decoding that using speexenc/speexdec?

I use Speex in VBR mode for a VoIP app.  I'm always recording and running 
the audio through the preprocessor (denoise, AGC) while a session is 
established.  I only encode and transmit while speech is detected (if the 
user has chosen VAD mode) or while a button is held down (if the user has 
chosen PTT mode).  (These VAD/PTT modes are application-level constructs - I'm 
not using Speex VAD or DTX.)  No sound is carried over from the end of one 
burst to the beginning of the next.  I use the same encoder/decoder objects 
for the life of the session and don't do anything to reset state between 
bursts.
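
A minimal sketch in C of that kind of application-level gate might look like 
the following.  It is an illustration only, not the code from my app: the 
names (tx_gate, gate_should_send), the energy-threshold test, and the 
hangover count are all assumptions, and a real VAD would likely be more 
elaborate.  The caller keeps recording and preprocessing every frame, and 
only passes a frame to the (long-lived) encoder when the gate returns true.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_SAMPLES 160   /* 20 ms at 8 kHz; an assumed frame size */

typedef enum { MODE_PTT, MODE_VAD } gate_mode;

/* Hypothetical gate state; none of this is part of the Speex API. */
typedef struct {
    gate_mode mode;
    int64_t   energy_threshold; /* mean-square level treated as speech (assumed) */
    int       hangover_frames;  /* frames to keep sending after speech ends */
    int       hangover_left;    /* remaining hangover for the current burst */
} tx_gate;

/* Mean of the squared samples in one frame. */
static int64_t frame_energy(const int16_t *pcm, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int64_t)pcm[i] * pcm[i];
    return n ? sum / (int64_t)n : 0;
}

/* Returns true if this frame should be encoded and transmitted. */
static bool gate_should_send(tx_gate *g, const int16_t *pcm, size_t n,
                             bool ptt_down)
{
    if (g->mode == MODE_PTT)
        return ptt_down;            /* PTT: send only while the key is held */

    if (frame_energy(pcm, n) >= g->energy_threshold) {
        g->hangover_left = g->hangover_frames;  /* speech: re-arm hangover */
        return true;
    }
    if (g->hangover_left > 0) {     /* trailing frames after speech ends */
        g->hangover_left--;
        return true;
    }
    return false;                   /* silence: skip encode and transmit */
}
```

The hangover keeps a burst from being cut off in the middle of a word's 
tail; the encoder and decoder objects themselves are untouched between 
bursts.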

As for VBR quality, VBR mode is designed to target a specified level of 
quality without guaranteeing how much bandwidth might be used at any 
particular moment.  So, it's ideal when you want the best tradeoff between 
quality and bandwidth while not requiring a strict constraint on bandwidth.  
Try watching a graph of bandwidth utilization as you're talking and 
transmitting using VBR at a particular VBR quality setting.  Try varying 
the VBR quality while listening for the difference in audio quality and 
watching the difference in the bandwidth graph.  It behaves as one would 
expect - I doubt you'll be surprised by the results.
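
If you don't already have a bandwidth graph, a rough sketch in C of the 
numbers to plot is below.  It assumes you record the size in bytes of each 
encoded frame (for example, the byte count your code gets back when it 
writes out the encoded bits) and that each frame covers 20 ms of audio.  
The function names are hypothetical, and packet/header overhead is ignored.

```c
#include <stddef.h>

#define FRAME_MS 20   /* assumed duration of one encoded frame */

/* Average payload bitrate, in bits per second, over n encoded frames. */
static double average_bps(const int *frame_bytes, size_t n)
{
    if (n == 0)
        return 0.0;

    long total_bytes = 0;
    for (size_t i = 0; i < n; i++)
        total_bytes += frame_bytes[i];

    /* bits / seconds, where seconds = n * FRAME_MS / 1000 */
    return (total_bytes * 8.0 * 1000.0) / ((double)n * FRAME_MS);
}

/* Largest single frame, in bytes; useful for seeing VBR peaks. */
static int peak_frame_bytes(const int *frame_bytes, size_t n)
{
    int peak = 0;
    for (size_t i = 0; i < n; i++)
        if (frame_bytes[i] > peak)
            peak = frame_bytes[i];
    return peak;
}
```

Comparing the average and the peak at a few VBR quality settings shows both 
what VBR saves on average and how far a single frame can exceed that 
average.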

Tom

"Chris Weiland" <hobbiticus at gmail.com> wrote:
> 
> I'm writing a voice communication application, and I've got a few
> issues that I'd like to get ironed out, but I don't know enough about
> the speex implementation.
> 
> First of all, this application is mainly used for conferencing - many
> people are in a room and only 1-2 are ever talking at a time.  So,
> always encoding and transmitting everyone's audio stream would be
> rather wasteful.  I also do not want to be creating and destroying
> encoder and decoder objects every time someone starts/stops talking.
> 
> Ideally, what I'd like to do is record/encode when someone pushes the
> talk key, then stop when they release it, AND reuse the same
> encoder/decoder objects every time this happens.  However, if the
> person lets go of the talk key while there is still audible sound,
> that sound will carry over to the next time they start transmitting.
> I know why this happens, but I don't know the proper way to prevent
> this from happening.  Right now, I encode a few frames of silence and
> send them over the network in order to "reset" the encoder state, but
> this is not ideal bandwidth-wise.
> 
> Is there a better way to do this?  I know that DTX handles a similar
> problem, but I'm not sure if it would do any good.
> 
> Secondly, I'm curious to know if VBR really has any drawbacks in terms
> of quality.  Specifically, can I be guaranteed to achieve the same
> quality with AT MOST the same bandwidth as with CBR, or can a VBR
> encoded frame actually be significantly bigger than a CBR encoded
> frame of the same quality? I also noticed that VBR has its own quality
> setting - does this override the main quality setting when VBR is
> enabled?  Is there any noticeable difference in quality (audibly or
> mathematically) between a CBR and VBR stream with the same quality
> setting?
> 
> Basically, I'd like to know if there is any reason to NOT have VBR
> enabled for my application.  It's meant to run over broadband
> internet, but I know that whoever is hosting would like to conserve
> their bandwidth.  However, I also want to maximize voice quality for
> the users.
> 
> 
> -Chris Weiland