[Speex-dev] draft-ietf-avt-rtp-speex-01.txt

Aymeric Moizard jack at atosc.org
Tue May 15 15:51:45 PDT 2007

Here are my comments:

Page 3:

    To be compliant with this specification, implementations MUST support
    8 kHz sampling rate (narrowband)" and SHOULD support 8 kbps bitrate.
    The sampling rate MUST be 8, 16 or 32 kHz.

There is a typo above: after (narrowband) there is an extra " character.

I don't understand the motivation for specifying "SHOULD support 8
kbps bitrate".

Page 8:

    Optional parameters:

       ptime: see RFC 4566.  SHOULD be a multiple of 20 msec.

       maxptime: see RFC 4566.  SHOULD be a multiple of 20 msec.

In the real world, many SIP applications use either 20 or 30 ms, so this
ptime parameter is really not reliable for negotiation... One possible
way to handle a value that is not a multiple of 20 ms would be to round up
to the next multiple: if 30 ms is specified, then recommend using 40 ms
for speex.
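
As a rough sketch of that rounding (the function name speex_ptime is
hypothetical, and the 20 ms step is taken from the draft's "multiple of
20 msec" wording):

```python
SPEEX_FRAME_MS = 20  # step assumed from the draft's "multiple of 20 msec"


def speex_ptime(requested_ms: int) -> int:
    """Round a requested ptime up to the next multiple of 20 ms."""
    if requested_ms <= 0:
        raise ValueError("ptime must be greater than zero")
    frames = -(-requested_ms // SPEEX_FRAME_MS)  # ceiling division
    return frames * SPEEX_FRAME_MS
```

With this rule, a peer asking for 30 ms gets 40 ms packets, and a value
that is already a multiple of 20 ms is left unchanged.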

Page 10:

    The value of the sampling frequency is typically 8000 for narrow band
    operation, 16000 for wide band operation, and 32000 for ultra-wide
    band operation.

The word "typically" suggests to me that it could be something other than
8000, 16000 or 32000. I would recommend making it clear:

    The value of the sampling frequency MUST be either 8000 for narrow band
    operation, 16000 for wide band operation, or 32000 for ultra-wide
    band operation.

       ptime: duration of each packet in milliseconds.

http://www.ietf.org/rfc/rfc4566.txt says in the ptime definition that
"it is intended as a recommendation for the encoding/packetisation of
audio". Thus, I would recommend using the same text as RFC 3264
for the SDP offer/answer model:

    "If the ptime attribute is present for a stream, it indicates the
    desired packetization interval that the offerer would like to
    receive.  The ptime attribute MUST be greater than zero."

It might also be a good idea to say that even if an offerer would like
to receive 20 ms, the sender MAY use a different packetization interval...
This is the origin of numerous interop issues with speex in SIP.

       sr: actual sample rate in Hz.

       ebw: encoding bandwidth - either 'narrow' or 'wide' or 'ultra'
       (corresponds to nominal 8000, 16000, and 32000 Hz sampling rates).

Both "sr" and "ebw" conflict with the speex/XXXX rtpmap. I really
recommend removing both of those definitions so that applications will
configure themselves using either speex/8000, speex/16000 or speex/32000.
Having 3 ways to specify the sampling rate is a nightmare for interop.
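
To illustrate configuring from the rtpmap alone, here is a minimal sketch
(the function name and the VALID_RATES table are my own, not from the
draft) that reads the sampling rate from the rtpmap line and rejects
anything other than the three rates above:

```python
# Hypothetical helper: the rtpmap clock rate is the only source of truth.
VALID_RATES = {8000: "narrow", 16000: "wide", 32000: "ultra"}


def speex_rate_from_rtpmap(line: str) -> int:
    """Extract the clock rate from a line such as 'a=rtpmap:97 speex/8000'."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    _payload_type, _, encoding = line[len(prefix):].partition(" ")
    name, _, rest = encoding.strip().partition("/")
    if name.lower() != "speex":
        raise ValueError("not a Speex payload")
    rate = int(rest.split("/")[0])  # ignore an optional channel count
    if rate not in VALID_RATES:
        raise ValueError(f"unsupported Speex sampling rate: {rate}")
    return rate
```

A receiver built this way never needs "sr" or "ebw" to pick between
narrowband, wideband and ultra-wideband.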

Page 11:

       mode: Speex encoding mode.  Can be {1,2,3,4,5,6,any} defaults to 3
       in narrowband, 6 in wide and ultra-wide.

I have always asked for a "table" in the specification here providing the
link between "mode" and "bitrate". Otherwise, you get those mails:


If I get it right, the table is there:
    Table 4: Quality versus bit-rate

Also, this table exists for narrowband, but not yet for wideband
or ultra-wideband: it would be nice to get those as well. I was really
lost implementing this in my SIP application.


       m=audio 8008 RTP/AVP 97
       a=rtpmap:97 speex/8000
       a=fmtp:97 mode=4

    This examples illustrate an offerer that wishes to receive a Speex
    stream at 8000Hz, but only using speex mode 4.

Is it a recommendation or a MUST? For me, and to allow better
interoperability, an application sends "mode=4" because it
wishes to receive "mode=4". But in case the remote application
can only send "mode=3", the receiver MUST be prepared to receive
ANY mode. We can't get interoperability without this, and I would
recommend specifying that such a use case will often happen in the real
world and that it MUST be supported.
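
A sketch of what treating "mode" as a wish rather than a constraint could
look like on the sending side (choose_send_mode is a hypothetical helper;
the fallback of 3 is the narrowband default quoted from the draft):

```python
def choose_send_mode(requested, supported, default=3):
    """Pick the mode to send, honoring the peer's wish only when possible.

    requested: value from the peer's fmtp (an int, "any", or None)
    supported: set of modes the local encoder can actually produce
    """
    if not supported:
        raise ValueError("encoder supports no modes")
    if requested in supported:
        return requested
    if default in supported:
        return default
    return min(supported)  # last resort: lowest mode the encoder has
```

The peer that asked for mode 4 may therefore get mode 3, which is exactly
why its decoder has to accept any mode.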

    Several Speex specific parameters can be given in a single a=fmtp
    line provided that they are separated by a semi-colon:

       a=fmtp:97 mode=any;mode=1

No error here: just curious why you want to allow this. Wouldn't it
be nice to specify that the order of the mode parameters is significant?
I guess this is what you want? (In that case, "mode=1,mode=any" might
be more meaningful?)
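
If order were significant, a parser would have to keep repeated
parameters in sequence instead of collapsing them into a dictionary. A
minimal sketch, with hypothetical function names, assuming the
semi-colon separator from the draft:

```python
def parse_speex_fmtp(line: str):
    """Return (name, value) pairs from an a=fmtp line, in the order given."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    _payload_type, _, params = line[len(prefix):].partition(" ")
    pairs = []
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing or doubled semi-colon
        name, _, value = item.partition("=")
        pairs.append((name.strip().lower(), value.strip()))
    return pairs


def mode_preferences(pairs):
    """List the 'mode' values in the order the offerer wrote them."""
    return [value for name, value in pairs if name == "mode"]
```

For the example line above, mode_preferences would yield "any" first and
"1" second, so the specification needs to say which one wins.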

More generally, I would really like to have a line specifying that
whatever you proposed (ptime, mode, vbr, cng), the sender could
use a different encoder configuration for any reason (bandwidth
or a lazy developer): a speex decoder doesn't have to be configured before
decoding, so an application MUST be able to decode any speex stream
it receives, provided that the sample rate was correctly negotiated.

Today, many speex applications I've seen are broken on the receiver side
because they configure decoders using the SDP negotiation "wish" or a
"static configuration": providing information about this would be valuable.

amsip - http://www.antisip.com
osip2 - http://www.osip.org
eXosip2 - http://savannah.nongnu.org/projects/exosip/

On Tue, 15 May 2007, Alfred E. Heggestad wrote:

> Hi all
> We are about to send an updated version of the internet draft
> "RTP Payload Format for the Speex Codec" to the IETF AVT working group.
> Before submitting we would like your input, if you have any comments
> or input please send them to the mailing list.
> If we don't get any comments in 1 week (by 22. May 2007) we will go ahead
> and submit it to the IETF. Of course you can comment on it also after it
> has been submitted, but we would like to get the input from the Speex
> community first..
> The Internet Draft is attached.
> /alfred
