[Vorbis-dev] Vorbis for digital radio at low bitrates

Gregory Maxwell gmaxwell at gmail.com
Tue Mar 23 06:18:14 PDT 2010


On Tue, Mar 23, 2010 at 5:59 AM, Feilen, Michael <michael.feilen at tum.de> wrote:
> A transform length of 960 samples per frame is important in order to store an integer number of audio frames per audio super frame (ASF). One ASF corresponds to 400 ms of audio in one mode and 200 ms in the other. The 400/200 ms durations are fixed and cannot be changed, since that would require changing the whole OFDM framing of DRM.
> Let's do an example for an ASF duration of 200 ms:
> At a sample rate of 48 kHz one ASF carries 48 kHz * 200 ms = 9600 samples of audio per channel. Now, 960 samples per frame gives 9600/960 = 10 audio frames per ASF which is a nice integer number.
> I think this requirement is somewhat crucial!
>
> Other (bad) solutions:
> 1) Given the assumption that there is some space for additional ASF index signaling and using a transform length of 1024 samples per frame: one would then have to receive 8 ASFs, containing a total of (9600 / 1024) * 8 = 75 audio frames of 1024 samples each, before audio decoding could start. That corresponds to 8 * 200 ms = 1.6 seconds, which is hardly acceptable.
>
> 2) One could use an audio sample rate of 46080 Hz: 46080 samples/s * 0.2 s per ASF / 1024 samples per audio frame = 9 audio frames per ASF.
>
> Any other ideas from you guys?
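
Just to sanity-check the arithmetic above (a throwaway sketch, nothing
DRM-specific in it; it only reproduces the numbers from your mail):

#include <stdio.h>

/* Frames per ASF for a given sample rate, transform length and ASF duration. */
static void frames_per_asf(int rate_hz, int frame_len, double asf_ms)
{
    double samples = rate_hz * asf_ms / 1000.0;   /* samples per ASF, per channel */
    printf("%5d Hz, %4d-sample frames, %3.0f ms ASF: %.4f frames/ASF\n",
           rate_hz, frame_len, asf_ms, samples / frame_len);
}

int main(void)
{
    frames_per_asf(48000,  960, 200.0);  /* 10.0000 -> integer, works           */
    frames_per_asf(48000, 1024, 200.0);  /*  9.3750 -> only aligns every 8 ASFs */
    frames_per_asf(46080, 1024, 200.0);  /*  9.0000 -> integer, but odd rate    */
    return 0;
}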

Another option would be to use CELT instead.  It can be run at 960
samples/frame, 480 samples/frame, 240 samples/frame, etc. (packing
two or four of the shorter frames per 960-sample slot, respectively).
Although CELT's quality isn't as high as Vorbis's, it has been
specifically designed with these sorts of constraints in mind and
offers very low latency (although I suppose that's mooted by the
enormous ASF sizes!)
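
Roughly what I mean by packing (just byte-counting, no libcelt calls;
the 24 kbit/s figure is only an example, and since you'd run CELT CBR
with the packet size picked by the caller, fixed-size frames can
simply be concatenated back to back):

#include <stdio.h>

int main(void)
{
    const int rate_hz        = 48000;
    const int slot_samples   = 960;               /* one 20 ms slot at 48 kHz     */
    const int bytes_per_slot = 60;                /* ~24 kbit/s: 60 bytes / 20 ms */
    const int frame_sizes[]  = { 960, 480, 240 }; /* CELT frame lengths           */

    double slot_ms = 1000.0 * slot_samples / rate_hz;
    printf("slot: %d samples = %.1f ms, %d bytes\n",
           slot_samples, slot_ms, bytes_per_slot);

    for (int i = 0; i < 3; i++) {
        int per_slot = slot_samples / frame_sizes[i];
        printf("%4d-sample frames: %d per slot, %d bytes each\n",
               frame_sizes[i], per_slot, bytes_per_slot / per_slot);
    }
    return 0;
}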

But I'm not sure why you can't change the OFDM framing; after all,
any DRM system speaking Vorbis isn't going to be compatible anyway.
Input/output the OFDM at a different sampling rate, or change the
ASF duration.
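
For example (just arithmetic, I'm not claiming the existing DRM
framing actually permits it): stretching the ASF to 320 ms at 48 kHz
gives 48000 * 0.32 = 15360 = 15 * 1024 samples, i.e. exactly fifteen
1024-sample frames per ASF.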

Or, spill audio frames across ASF boundaries, as you suggest ... but
I see no reason why you'd have to wait 75 audio frames to decode.
The transmitter just packs data into ASFs as it comes, and the
receivers unpack and buffer. Some frames will spill into a second
ASF, so the RX will have to wait for two ASFs to decode rather than
one. The fact that a frame won't start exactly on an ASF boundary
again for another 75 frames isn't relevant; the decoder should never
need to buffer more than two ASFs.  Obviously this kind of frame
splitting costs some loss robustness, since a failure to decode one
ASF will also kill any audio frames that span its boundaries
(possibly two more).
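
Something like this on the receive side (a strawman sketch: the
16-bit length prefix is my own invention for illustration, not
anything from DRM or Vorbis; the only point is that the reassembly
buffer never needs to hold much more than two ASFs):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BUF_MAX (2 * 2048)        /* roomy: roughly two ASF payloads */

static uint8_t buf[BUF_MAX];
static size_t  buf_len = 0;

/* TX packs frames back to back into ASF payloads and lets them spill;
   RX just appends each received ASF payload to a reassembly buffer ... */
static void rx_push_asf(const uint8_t *payload, size_t len)
{
    if (buf_len + len > BUF_MAX) { buf_len = 0; return; }  /* shouldn't happen */
    memcpy(buf + buf_len, payload, len);
    buf_len += len;
}

/* ... and pops complete audio frames as they become available.
   Returns the frame length, or 0 if we have to wait for the next ASF. */
static size_t rx_pop_frame(uint8_t *frame, size_t frame_max)
{
    if (buf_len < 2)
        return 0;
    size_t need = ((size_t)buf[0] << 8) | buf[1];   /* strawman length prefix */
    if (need > frame_max || buf_len < 2 + need)
        return 0;
    memcpy(frame, buf + 2, need);
    memmove(buf, buf + 2 + need, buf_len - 2 - need);
    buf_len -= 2 + need;
    return need;
}

int main(void)
{
    /* A 5-byte "audio frame" split across two 4-byte "ASF payloads". */
    const uint8_t asf1[4] = { 0x00, 0x05, 'a', 'b' };
    const uint8_t asf2[4] = { 'c', 'd', 'e', 0x00 };
    uint8_t frame[16];

    rx_push_asf(asf1, sizeof(asf1));
    printf("after ASF 1: %zu bytes ready\n", rx_pop_frame(frame, sizeof(frame)));
    rx_push_asf(asf2, sizeof(asf2));
    printf("after ASF 2: %zu bytes ready\n", rx_pop_frame(frame, sizeof(frame)));
    return 0;
}

A frame whose tail lands in the next ASF just stays pending until
that ASF shows up, which is where the worst-case two-ASF buffering
comes from.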

You're still left with the fact that the mixed frame sizes in Vorbis
aren't required to add up to a common total (e.g. it can encode
2048, 2048, 256, 2048, 2048, 2048, 256, 256, ...).  Of course, you
could force the encoder to always emit
Long_block_size/Short_block_size short blocks in a run, but the more
restrictions you impose (you'll also need to make Vorbis CBR) the
closer its performance will get to CELT's.
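
To see why (a quick sketch; samples per packet follow the usual
Vorbis blocksize[prev]/4 + blocksize[cur]/4 overlap rule, and I'm
ignoring the stream-start special case):

#include <stdio.h>

int main(void)
{
    /* The example block sequence from above. */
    const int seq[] = { 2048, 2048, 256, 2048, 2048, 2048, 256, 256 };
    const int n = sizeof(seq) / sizeof(seq[0]);
    int prev = 2048, total = 0;

    for (int i = 0; i < n; i++) {
        int out = prev / 4 + seq[i] / 4;   /* samples finished by this packet */
        total += out;
        printf("packet %d (block %4d): +%4d samples, total %5d, total %% 960 = %3d\n",
               i, seq[i], out, total, total % 960);
        prev = seq[i];
    }
    return 0;
}

The running total wanders relative to a 960-sample grid unless you
pin down the block pattern.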

If you happen to be using GNURadio for any of your DRM stuff,  I
tossed up a CELT block here:
http://git.xiph.org/?p=users/greg/gnuradio.git;a=summary

