[Speex-dev] CELT 0.5.0 is out

Thu Oct 16 06:39:14 PDT 2008

On Thu, 16 Oct 2008, Jean-Marc Valin wrote:

> Aymeric Moizard wrote:
>>> None of that is defined yet, though I'm open to suggestions on how to do
>>> the mapping.
>>
>> CELT/44100 and CELT/48000
>> a=fmtp:105 stereo=on
>>
>> probably a latency value?
>
> It would definitely need a frame_size value
>
>> "CELT" doesn't seem to be used for any other existing audio-related
>> stuff, right?
>
> Not yet. It's very new (bit-stream not frozen yet), so at this point,
> there are a few early adopters, but that's it.
>
>> I didn't read fully the doc...
>>
>> One question for you Jean-Marc: can you confirm that the decoder
>> will receive enough information to autoconfigure itself when receiving
>> RTP streams? One documentation sentence seems to not go in that
>> direction while it's a requirement for VoIP and mainly for SIP
>> negotiation.
>
> No. Unlike Speex, CELT will not be able to decode anything with no prior
> information on the stream. To decode a stream of packets, CELT needs the
> following information
> 1) Sampling rate. It's not just for setting the soundcard's rate, if you
> use the wrong rate, you get garbage.
> 2) Mono/Stereo
> 3) Frame size in samples
> 4) If there's more than one frame in a packet, you need to tell it where
> the boundary is. However, once that's done, CELT knows the bit-rate used
> just from the packet size, so there's no need to signal a fixed
> communication rate.
>
> The main reason CELT can't do like Speex (I wish it could) is that in
> Speex, the overhead of transmitting the mode info is 5 bits for
> narrowband and 9 bits for wideband. With 20 ms frames, that's just 250
> and 450 bps. With CELT, there would be a bit more data needed and the
> frame size can be as small as 2 ms, so we could end up with several kbps
> of mode signalling. In the current code, there's no signalling at all.
> The good thing is that after a few frames, the decoder should at least
> realise it's decoding garbage.
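
To check the numbers in the paragraph above for myself, here is a small
Python sketch of the arithmetic. The 250 and 450 bps figures match 5 and
9 bits per 20 ms frame, which is the Speex frame duration. The 8-bit
CELT mode header is purely my assumption for illustration (the mail only
says "a bit more data"), and the function names are made up.

```python
def signalling_overhead_bps(mode_bits: int, frame_ms: float) -> float:
    """Bit-rate spent on in-band mode signalling, one header per frame."""
    frames_per_second = 1000.0 / frame_ms
    return mode_bits * frames_per_second


def implied_bitrate_bps(frame_bytes: int, frame_ms: float) -> float:
    """Bit-rate a decoder can infer from the size of one coded frame."""
    return frame_bytes * 8 * 1000.0 / frame_ms


print(signalling_overhead_bps(5, 20))  # Speex narrowband: 250.0
print(signalling_overhead_bps(9, 20))  # Speex wideband:   450.0
# Hypothetical 8-bit CELT mode header on 2 ms frames: 4000.0
print(signalling_overhead_bps(8, 2))
```

So with 2 ms frames even a one-byte header would cost about 4 kbps,
which is consistent with the "several kbps" estimate.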

I understand, but CELT would be useless for SIP if the receiver can't
read or correctly guess the decoder configuration from the RTP data.

One possible way to cope with this would be to define several CELT
payload types for use in SIP signalling. This is usually not well
accepted, as it would remove flexibility and increase both the size of
the SIP negotiation and the risk of errors within it.
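
To show how quickly that grows, here is a rough Python sketch that
lists one dynamic payload type per configuration. The "stereo" and
"frame_size" parameter names come from this thread only; nothing like
this is defined for CELT, and the exact fmtp syntax, the frame sizes
and the function name are my assumptions.

```python
from itertools import product


def enumerate_payloads(rates, stereo_modes, frame_sizes, first_pt=96):
    """Build one rtpmap/fmtp pair per (rate, stereo, frame size) combination.

    Payload types start at 96, the beginning of the RTP dynamic range.
    The fmtp parameters are hypothetical, not a defined CELT mapping.
    """
    lines = []
    pt = first_pt
    for rate, stereo, frame_size in product(rates, stereo_modes, frame_sizes):
        lines.append(f"a=rtpmap:{pt} CELT/{rate}")
        flag = "on" if stereo else "off"
        lines.append(f"a=fmtp:{pt} stereo={flag}; frame_size={frame_size}")
        pt += 1
    return lines


sdp = enumerate_payloads([44100, 48000], [False, True], [64, 128, 256])
print(len(sdp) // 2, "payload types")
print("\n".join(sdp[:2]))
```

Two rates, mono/stereo and only three frame sizes already need twelve
payload types in the offer.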

I don't think this requirement is only for SIP: any device receiving
data would reasonably want to be sure it decodes it correctly, no
matter the overhead.

One approach which is very acceptable to me would be to have something
like SPS/PPS in H.264: a special marker (one bit?) would flag each
packet as either audio data or decoder configuration data.

The first packet sent carries the "decoding information" and the other
packets are "real data". With RTP, you would retransmit this packet
regularly to cope with packet loss or delayed initiation (initial packets
are often lost at the beginning on one side of the conversation).
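
Here is a minimal Python sketch of what I mean, not a proposal for the
actual bit-stream. Everything in it is hypothetical: the one-byte
header with the top bit as the "configuration" flag, the field layout
(sampling rate, channels, frame size), the resend interval and all the
names. The audio payloads are just placeholder bytes.

```python
import struct

CONFIG_FLAG = 0x80        # hypothetical: top bit set = decoder configuration
_CONFIG_FMT = ">BIBH"     # header, sampling rate (Hz), channels, frame size


def pack_config(sample_rate, channels, frame_size):
    """Build a configuration packet carrying what the decoder needs."""
    return struct.pack(_CONFIG_FMT, CONFIG_FLAG, sample_rate, channels, frame_size)


def pack_audio(payload):
    """Prefix coded audio with a header byte whose flag bit is clear."""
    return bytes([0x00]) + payload


def sender_stream(frames, config, resend_every=50):
    """Yield packets, repeating the configuration every `resend_every` frames."""
    for i, frame in enumerate(frames):
        if i % resend_every == 0:
            yield pack_config(*config)
        yield pack_audio(frame)


class Receiver:
    """Drops audio until a configuration packet has been seen."""

    def __init__(self):
        self.config = None
        self.decoded = []  # stand-in for frames handed to the real decoder

    def on_packet(self, packet):
        if packet[0] & CONFIG_FLAG:
            _, rate, channels, frame_size = struct.unpack(_CONFIG_FMT, packet)
            self.config = (rate, channels, frame_size)
            return "config"
        if self.config is None:
            return "dropped"
        self.decoded.append(packet[1:])
        return "audio"


# Simulate losing the very first packet (the initial configuration).
packets = list(sender_stream([b"a", b"b", b"c", b"d"], (48000, 2, 256), resend_every=2))
rx = Receiver()
print([rx.on_packet(p) for p in packets[1:]])
```

The receiver throws away audio until the next configuration packet
arrives, so the resend interval bounds how long a late joiner stays
silent. A full header byte per audio packet is only there to keep the
sketch simple; a single bit would do.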

I think this approach fits your need to keep CELT's overhead as low as
possible while still providing what is mandatory for VoIP.

tks,
Aymeric MOIZARD / ANTISIP
amsip - http://www.antisip.com
osip2 - http://www.osip.org
eXosip2 - http://savannah.nongnu.org/projects/exosip/