[Speex-dev] Questions relating RTP packetisation

Sun Sep 30 03:26:54 PDT 2012

Hello.

I am working on implementing RFC 5574 (RTP Payload Format for the Speex
Codec) in ffmpeg and have a question about it.
It would be nice if somebody could answer it.

* Chapter 4.1.1 Registration of Media Type Audio/Speex, subpart
"Optional parameters", lists these optional SDP parameters:

vbr: variable bit-rate - either 'on', 'off', or 'vad' (defaults
     to 'off').  If 'on', variable bit-rate is enabled.  If 'off',
     disabled.  If set to 'vad', then constant bit-rate is used, but
     silence will be encoded with special short frames to indicate a
     lack of voice for that period.  This parameter is a preference
     to the encoder.

cng: comfort noise generation - either 'on' or 'off' (defaults to
     'off').  If 'off', then silence frames will be silent; if 'on',
     then those frames will be filled with comfort noise.  This
     parameter is a preference to the encoder.

The Speex documentation here
(http://www.speex.org/docs/manual/speex-manual/node4.html#SECTION00417000000000000000)
states this:

When enabled, voice activity detection detects whether the audio being
encoded is speech or silence/background noise. VAD is always implicitly
activated when encoding in VBR, so the option is only useful in non-VBR
operation. In this case, Speex detects non-speech periods and encode
them with just enough bits to reproduce the background noise. This is
called ``comfort noise generation'' (CNG).

So I am a little lost: the Speex documentation says that VAD in CBR mode
goes together with CNG during silence periods, but the RFC treats the two
as separate parameters.

My question is: what behaviour is expected in these cases:
* vbr=vad + cng=off
* vbr=off + cng=on
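
For reference, here is a minimal sketch of how I would read the two parameters from an fmtp string before deciding what to do with them. It only covers parsing and the defaults quoted above ('off' for both). The function name parse_speex_fmtp is hypothetical (it is not an ffmpeg or libspeex function), and treating the values as case-insensitive is my own assumption. How each combination should map to encoder settings is exactly the open question, so the sketch does not attempt that.

```python
# Hypothetical helper for illustration only; not part of ffmpeg or libspeex.
VBR_VALUES = {"on", "off", "vad"}
CNG_VALUES = {"on", "off"}


def parse_speex_fmtp(fmtp: str) -> dict:
    """Extract 'vbr' and 'cng' from the parameter part of an a=fmtp line,
    e.g. 'vbr=vad;cng=on'. Other parameters are ignored."""
    # Defaults as quoted from the RFC text above.
    params = {"vbr": "off", "cng": "off"}
    for item in fmtp.split(";"):
        item = item.strip()
        if "=" not in item:
            continue  # skip empty or malformed entries
        key, value = item.split("=", 1)
        # Assumption: compare keys and values case-insensitively.
        key, value = key.strip().lower(), value.strip().lower()
        if key == "vbr" and value in VBR_VALUES:
            params["vbr"] = value
        elif key == "cng" and value in CNG_VALUES:
            params["cng"] = value
        # Unknown keys or values leave the defaults untouched.
    return params


# The two combinations asked about above:
case_a = parse_speex_fmtp("vbr=vad;cng=off")
case_b = parse_speex_fmtp("vbr=off;cng=on")
```

Both cases parse without trouble; what the encoder should then do with them is what I cannot work out from the two documents.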

P.S. I'm aware of DTX, and I think it would make sense if the "cng" parameter
actually controlled DTX (its description in the RFC reads much like it).

Thanks.