[Vorbis-dev] Vorbis for digital radio at low bitrates

Tue Mar 23 07:37:40 PDT 2010

On Tue, Mar 23, 2010 at 9:53 AM, Feilen, Michael <michael.feilen at tum.de> wrote:
> Hello Gregory,
>
> Thank you! CELT seems to be an interesting alternative! Especially the "packet loss concealment" and "bit error robustness" features make this codec suitable for digital radio applications.
>
> The simple reason why the OFDM framing can't be changed is because then it's not DRM anymore and the existing encoders and decoders won't work ;)

Ah, so there are encoders/decoders that you could change the codec on,
but not anything else?

> A non-integer audio framing requires an audio frame index signaling and - you're right - although I don't have to wait for all 75 audio frames to be received it doesn't seem to be a good solution.
> I couldn't find a nice divisor for an integer number of audio frames using mixed frame sizes. Furthermore, mixed frame sizes would also require signaling if they don't repeat in a periodic pattern (which is certainly a bad way to go).

Though you're only looking at an overhead of one byte per 200 ms OFDM
frame (a pointer to the first packed Vorbis frame) plus one byte per
Vorbis frame (the encoded length). Vorbis frames already code their
temporal duration internally. Not good for error robustness, though.
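As a rough illustration of how small that overhead is, here is a
minimal sketch. It assumes a 200 ms OFDM period, exactly one pointer
byte per period and one length byte per audio frame; the function and
parameter names are hypothetical and not part of any DRM or Vorbis API.

```python
from fractions import Fraction

def signalling_overhead_bps(sample_rate_hz, samples_per_frame,
                            ofdm_period=Fraction(1, 5)):
    """Bits per second spent on signalling.

    Assumes one pointer byte per OFDM period (200 ms by default)
    plus one length byte per audio frame.
    """
    frames_per_second = Fraction(sample_rate_hz, samples_per_frame)
    pointer_bytes_per_second = 1 / ofdm_period
    return 8 * (pointer_bytes_per_second + frames_per_second)

# 48 kHz audio with 1024 samples per frame
print(float(signalling_overhead_bps(48000, 1024)))  # 415.0
```

Under these assumptions the signalling costs a few hundred bits per
second. Note that a single length byte can only describe frames of up
to 255 bytes.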

> So finally, I think there are two solutions:
>
> 1) Using CELT
> 2) Using Vorbis with CBR and a sample rate of 46.080 kHz, a transform length of 1024 samples per frame and putting a 48 kHz to 46.080 kHz audio resampler before the encoder and a 46.080 kHz to 48 kHz audio resampler behind the decoder. (Is it possible to run Vorbis at a sample rate of 46.080 kHz?).

Sure. It's not tuned for that sample rate, but it's close enough.
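The arithmetic behind the 46.080 kHz choice can be checked with a short
sketch. It assumes a 200 ms OFDM period and 1024 samples per audio
frame, as discussed in this thread; the helper name is hypothetical.

```python
from fractions import Fraction

OFDM_PERIOD = Fraction(1, 5)  # 200 ms, assumed from this thread

def frames_per_ofdm_period(sample_rate_hz, samples_per_frame):
    """Number of audio frames that span one OFDM period."""
    return Fraction(sample_rate_hz, samples_per_frame) * OFDM_PERIOD

print(frames_per_ofdm_period(48000, 1024))  # 75/8
print(frames_per_ofdm_period(46080, 1024))  # 9
print(Fraction(46080, 48000))               # 24/25
```

At 48 kHz the frames only line up every 8 OFDM periods (75 audio
frames), whereas at 46.080 kHz exactly 9 frames fit into each period.
The resampling ratio between the two rates is 24/25.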

> What solution do you think is best in terms of:
> A) Quality

Vorbis gives better quality for a given bitrate. But when I say
Vorbis here, I mean full Vorbis, not Vorbis that has been
handicapped by constraining it to a single, smaller frame size
(normally Vorbis uses 2048-sample frames) and a constant bitrate.

If Vorbis loses the ability to do block switching, I wouldn't be too
surprised if CELT sounded better at medium bitrates (64-96 kbit/s). At
lower rates I'd still expect Vorbis to sound better. This is something
that would really need to be tested. Quality-wise, I think you'd be
best off preserving the normal Vorbis behaviour and taking the few
bytes of overhead needed to add the required signalling.

CELT came out of the realization that some of the codec parts that had
been researched for Ghost (a Vorbis successor in very early
development) could be used to make a very low latency codec for
high-quality, moderate-bitrate audio, something which hardly exists
even in the commercial world (there is basically ULD and APT-X in this
space, and maybe the Bluetooth codec; CELT blows these away from a
quality/bitrate perspective). The very low latency design has certain
costs, chief among them reduced efficiency.

> B) Compatibility (playing received audio with a standard decoder library)

The CELT bit-stream isn't frozen. (We've delayed the completion of
CELT in order to make it part of the IETF codec working group).

But otherwise, I think what you're doing would be compatible with
normal decoder libraries.

Vorbis has some non-trivial configuration headers (a couple of
kbytes) that you'd need to bake into the receivers, since sending
them over the air would not be reasonable.

> C) Computational complexity

A CELT decoder and a Vorbis decoder are in the same general order of
magnitude. But Vorbis has considerable requirements for fast RAM,
while CELT has much smaller memory requirements. On some platforms
this can give CELT a considerable performance advantage.

The CELT encoder is much faster than the Vorbis encoder, since it has
to do much less work. There is also a fixed-point CELT encoder.

> D) Resource requirements for a hardware implementation

I would expect a hardware implementation of CELT to be smaller, both
because of the aforementioned memory requirements and because many of
the remaining tables used in the reference CELT implementation could
instead be derived on the fly.