[Vorbis-dev] Vorbis for digital radio at low bitrates

Tue Mar 23 06:53:58 PDT 2010

Hello Gregory,

Thank you! CELT seems to be an interesting alternative! Especially the "packet loss concealment" and "bit error robustness" features make this codec suitable for digital radio applications.

The simple reason the OFDM framing can't be changed is that it then wouldn't be DRM anymore, and the existing encoders and decoders wouldn't work ;)

Non-integer audio framing requires signaling an audio frame index and, you're right, although I don't have to wait for all 75 audio frames to be received, it doesn't seem to be a good solution.

I couldn't find a nice divisor for an integer number of audio frames using mixed frame sizes. Furthermore, mixed frame sizes would also require signaling if they don't repeat in a periodic pattern (which is certainly a bad way to go).

So finally, I think there are two solutions:

1) Using CELT
2) Using Vorbis with CBR at a sample rate of 46.080 kHz and a transform length of 1024 samples per frame, with a 48 kHz to 46.080 kHz audio resampler before the encoder and a 46.080 kHz to 48 kHz audio resampler after the decoder. (Is it possible to run Vorbis at a sample rate of 46.080 kHz?)
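
The framing arithmetic behind these options can be checked with a minimal Python sketch. It uses only the figures quoted in this thread (ASF durations of 200 ms and 400 ms, frame lengths of 960 and 1024 samples, sample rates of 48 kHz and 46.080 kHz). The function names are hypothetical and chosen for illustration only; nothing here is part of any DRM or Vorbis API.

```python
from fractions import Fraction

def frames_per_asf(sample_rate_hz, asf_ms, frame_len):
    """Exact number of audio frames per audio super frame (ASF)."""
    samples_per_asf = Fraction(sample_rate_hz * asf_ms, 1000)
    return samples_per_asf / frame_len

def asfs_until_aligned(sample_rate_hz, asf_ms, frame_len):
    """Smallest number of ASFs that holds a whole number of frames."""
    return frames_per_asf(sample_rate_hz, asf_ms, frame_len).denominator

# Candidate configurations mentioned in the thread.
for rate, frame_len in [(48000, 960), (48000, 1024), (46080, 1024)]:
    for asf_ms in (200, 400):
        n = frames_per_asf(rate, asf_ms, frame_len)
        k = asfs_until_aligned(rate, asf_ms, frame_len)
        print(f"{rate} Hz, {frame_len} samples/frame, {asf_ms} ms ASF: "
              f"{n} frames/ASF, aligned every {k} ASF(s)")

# Ratio the resamplers in option 2 would have to implement.
resample_ratio = Fraction(46080, 48000)
print("48 kHz -> 46.080 kHz ratio:", resample_ratio)
```

For the 200 ms mode this gives 10 frames per ASF at 48 kHz with 960-sample frames, 75/8 frames per ASF at 48 kHz with 1024-sample frames (aligned only every 8 ASFs, i.e. every 75 frames), and 9 frames per ASF at 46.080 kHz with 1024-sample frames. The resampling ratio reduces to 24/25.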

What solution do you think is best in terms of:
A) Quality
B) Compatibility (playing received audio with a standard decoder library)
C) Computational complexity
D) Resource requirements for a hardware implementation

Thank you very much!

Michael

-----Original Message-----
From: Gregory Maxwell [mailto:gmaxwell at gmail.com]
Sent: Tuesday, 23 March 2010 14:18
To: Feilen, Michael
Cc: Ralph Giles; vorbis-dev at xiph.org
Subject: Re: [Vorbis-dev] Vorbis for digital radio at low bitrates

On Tue, Mar 23, 2010 at 5:59 AM, Feilen, Michael <michael.feilen at tum.de> wrote:
> A transform length of 960 samples per frame is important for storing an integer number of audio frames per audio super frame (ASF). One ASF corresponds to 400 ms of audio in one mode and 200 ms of audio in the other mode. The 400/200 ms are fixed and cannot be changed, since then the whole OFDM framing of DRM would have to be changed.
> Let's do an example for an ASF duration of 200 ms:
> At a sample rate of 48 kHz one ASF carries 48 kHz * 200 ms = 9600 samples of audio per channel. Now, 960 samples per frame gives 9600/960 = 10 audio frames per ASF which is a nice integer number.
> I think this requirement is somewhat crucial!
>
> Other (bad) solutions:
> 1) Assuming there is some space for additional ASF index signaling and using a transform length of 1024 samples per frame: one would have to receive 8 ASFs, containing an integer number of 75 audio frames with a transform length of 1024 samples, before audio decoding could start: (9600 / 1024) * 8 = 75 audio frames corresponding to 200 ms * 8 = 1.6 seconds. This is hardly acceptable.
>
> 2) One could use an audio sample rate of 46.080 kHz. 46080 S/s * 0.2 sec per ASF / 1024 samples per audio frame = 9 audio frames per ASF
>
> Any other ideas from you guys?

Another option would be to use CELT instead.  It can be run at 960
samples/frame, 480 samples/frame, 240 samples/frame, etc. (and then
pack two or four respectively).  Although CELT's performance isn't as
high as Vorbis, it has been specifically designed with these sorts of
constraints in mind and offers very low latency (although I suppose
that's mooted by the enormous ASF sizes!)

But I'm not sure why you can't change the OFDM framing— after all, any
DRM speaking vorbis isn't going to be compatible.  Input/output the
OFDM at a different sampling rate, or change the duration.

Or, spill across multiple frames— as you suggest ... but I see no
reason why you'd have to wait 75 audio frames to decode.  The
transmitter just packs data as it comes into frames, the receivers
unpack and buffer. Some frames will spill into a second ASF, so the RX
will have to wait for two ASFs to decode rather than one. The fact
that a frame won't start on an ASF boundary for another 75 frames
isn't relevant, the decoder should never need to buffer more than two
ASFs.  Obviously this kind of frame splitting has a cost of reduced
loss robustness, since a failure to decode one ASF will kill any
spanned audio frames (possibly two more).
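
The spill-across packing described above can be illustrated with a small Python sketch. It assumes CBR coding, so that each frame's share of an ASF is proportional to its duration and positions can be counted in samples (9600 samples per 200 ms ASF at 48 kHz, 1024 samples per frame, as in the quoted message). The function name is hypothetical and not part of any real library.

```python
ASF_SAMPLES = 9600     # 48 kHz * 200 ms, figure taken from the thread
FRAME_SAMPLES = 1024   # frame length under discussion

def asfs_spanned(frame_index, frame_len=FRAME_SAMPLES, asf_len=ASF_SAMPLES):
    """Indices of the ASFs that carry some part of the given frame."""
    start = frame_index * frame_len      # first sample position of the frame
    end = start + frame_len - 1          # last sample position of the frame
    return list(range(start // asf_len, end // asf_len + 1))

# One full alignment period: 75 frames fill exactly 8 ASFs.
spans = [asfs_spanned(k) for k in range(75)]
max_span = max(len(s) for s in spans)
straddling = sum(1 for s in spans if len(s) == 2)

print("most ASFs touched by a single frame:", max_span)
print("frames split across two ASFs per 75 frames:", straddling)
```

Under these assumptions no frame ever touches more than two ASFs, and 7 of every 75 frames are split across an ASF boundary. Those split frames are the ones lost in addition when a neighbouring ASF fails to decode.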

You're still left with the fact that the mixed frame sizes in Vorbis
aren't required to add up to a common size (e.g. it can encode
2048,2048,256,2048,2048,2048,256,256,...). Of course, you could force
the encoder to always encode Long_block_size/Short_block_size shorts
in a run, but the more restrictions you impose (you'll also need to
make Vorbis CBR) the closer the performance will get to CELT's.

If you happen to be using GNURadio for any of your DRM stuff,  I
tossed up a CELT block here:
http://git.xiph.org/?p=users/greg/gnuradio.git;a=summary