[Vorbis-dev] Object Audio - Misusing Vorbis?
Richard Furse
richard at muse440.com
Fri Oct 31 06:50:30 PDT 2014
Hi there
I'm trying to use Vorbis in a slightly unusual way, and I'm a little unclear
if I'm going to be able to do what I want to do nicely, or at all. Many
apologies if I'm on the wrong list, or if this is a FAQ, or off-topic. I'd
be very happy to be re-routed!
I'm doing a little experimental work on a block-based bitstream format for
object audio streaming. This format allows any number of audio objects to
start or stop on block boundaries, and each audio object might be mono or
multichannel. Blocks may vary in length within the stream, from 1 to 65535
samples, and each block is encoded as a single "chunk" of data containing
all audio for all audio objects for that block of time. There's a bunch of
other data in the chunk, for instance spatialisation information. The
bitstream format is already (experimentally) operational using a very simple
codec that is lossless for all this data, including the audio.
What I was hoping to do next is a lossy version, and thought Vorbis might be
suitable for the underlying audio coding, but I'm struggling a little to
understand exactly what I could/should be using. I thought initially I could
use just the vorbis/vorbisenc calls to encode the audio for each audio
object each as an embedded bitstream, and then write the bytes returned in
ogg_packet structures from vorbis_analysis_headerout() and
vorbis_analysis() within my packets, but I'm not sure if this is supported
behaviour - and
for decoding, I'm not sure how I should populate the reconstructed
ogg_packet structures correctly. For instance, I can guess what should be in
b_o_s, but I've clearly stepped outside the documented API by now, so I'm
not sure what might or might not be supported in the future...
I could use the whole Ogg binding, which would clearly take me back into
standard API space. Then, for each block of time, my bitstream chunk would
contain, for each audio object, a number of Ogg packets, which would in turn
contain Vorbis data. There's clearly quite a bit of redundancy here.
(There's clearly redundancy in the Vorbis headers themselves too in this
context, but one thing at a time ;-) That said, I'd still expect the
bitrate to be much lower than it is currently! Note that here, Ogg is *within* my
chunk. If my chunks were cut into packets and embedded in an Ogg stream, Ogg
would be in use at two different levels of the embedding hierarchy.
One of the things I'm unclear about is whether or not I'm guaranteed to get
the right data into my chunk if I do this. If I submit, say, 4800 samples of
audio, and make all the relevant vorbis/ogg calls to fully flush the
resulting bytes, am I guaranteed to get exactly the ogg/vorbis packets that
are required to resynthesize those 4800 samples of audio? During
resynthesis, no ogg/vorbis packets from the previous or next block/chunk
will be available (except any internal state the vorbis decoders might be
keeping).
Another thing I'm not too sure about is how I should handle lost chunks.
Because the ogg/vorbis streams for new audio objects can start up at any
point in time, it's possible that the new ogg/vorbis header packets are
within the chunk that is lost. Is there then a way I can recover this audio
object? I periodically send "sync" packets which are designed to allow
reconstruction of all state, so some more header data could go in there, but
I'm not too sure what should be used. Maybe all the ogg packets relating to
vorbis_analysis_headerout? Can I reset all the decoders with this header
data and then start feeding the (much later) analysis packets? Is there a
lighter way to do this (I don't really want to be resending all the Ogg
headers every second)?
What I'd love is a simple API which exposes a single initialisation packet
(sample rate, quality, codebook?) which can be shared by a number of
encoders/decoders. I could then put this in the bitstream at the top and
repeat it once periodically for recovery purposes. I'd then use an
encoder/decoder for each audio object, where each has a potentially
different (but constant and known) channel count, and have read/write
routines which would receive the block length (which is common to all audio
objects for a particular block, and known) and convert between some
multichannel audio and bytes. Is there some natural way to use (or twist)
the code this way? (There are refinements on this idea, but it would be a
fantastic baseline.)
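To be explicit about the shape of the API I'm wishing for - entirely hypothetical, every name below is invented and none of it exists in libvorbis:

```c
/* Hypothetical interface: a single shared setup object, plus per-object
 * codecs that borrow it. Invented names throughout. */
#include <stddef.h>

typedef struct {
    long  sample_rate;
    float quality;
    /* ...shared codebooks / mode setup would live here... */
} oa_shared_setup;

typedef struct oa_object_codec oa_object_codec;   /* opaque */

/* One initialisation blob, shared by every encoder/decoder, written at
 * the top of the bitstream and repeated periodically for recovery. */
int oa_setup_serialize(const oa_shared_setup *s,
                       unsigned char *out, size_t cap, size_t *len);
int oa_setup_deserialize(oa_shared_setup *s,
                         const unsigned char *in, size_t len);

/* Per-object codecs: channel count fixed at creation; block length
 * passed per call, since it is common to all objects in a block. */
oa_object_codec *oa_encoder_create(const oa_shared_setup *s, int channels);
oa_object_codec *oa_decoder_create(const oa_shared_setup *s, int channels);
int  oa_encode_block(oa_object_codec *c, const float *const *pcm,
                     int block_len, unsigned char *out, size_t cap,
                     size_t *len);
int  oa_decode_block(oa_object_codec *c, const unsigned char *in,
                     size_t len, float **pcm, int block_len);
void oa_codec_destroy(oa_object_codec *c);
```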
Sorry if I've been unclear - this is quite a complex scenario. And it's
entirely possible I'm just approaching this from completely the wrong
direction - any clues and suggestions appreciated!
Many thanks,
--Richard
PS, some audio objects will be carrying HOA data, so will have potentially
LOTS of channels (64?). How many channels can Vorbis handle in coupled mode
these days? Or do I need to handle high channel counts by sending each
channel individually? If so, I'll have a bunch more (redundant) headers to
manage... :-/