[Vorbis-dev] Decoding for ambisonic Ogg audio

Mon Feb 26 20:24:59 PST 2007

The prospect of people actually putting B-format audio (via the panner
or directly input) into Ogg/Vorbis brings an interesting challenge:
What do we do with the audio after decoding it?

The following sane options exist:

A) Simply output the B-format audio
B) Produce a downmix
  1) Mono
  2) Stereo, Blumlein crossed pairs
  3) Stereo, UHJ
  4) Binaural
C) Produce speaker feeds
  1) Fully generalizable speaker feed decoder
      (such as http://www.kokkinizita.net/linuxaudio/adec-pict.html)
  2) G-format (fixed decode for the 5.1 layout)

(A) is pretty much a no-brainer, and minus some polish on marking up
the channel mapping we pretty much already do it today.

I think that some form of downmix support will be an essential feature
in the libraries. Most users will not have software or systems that
are equipped to play B-format Vorbis files, at least initially. Anyone
distributing such files will have a hard time if the files refuse to
play at all for people.

Mono and simulated Blumlein are the simplest downmixes and could be
added with a very minimal amount of code. They are also the least
satisfactory.  I think mono output would be especially surprising to
the user and it probably shouldn't be considered as an automated
fallback unless we have no other choice.
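
To give a feel for how little code the two simplest downmixes need,
here is a minimal Python sketch. It assumes traditional B-format with
W carried 3 dB down, X pointing forward and Y pointing left; the
function names are made up for this example and are not part of any
Xiph library.

```python
import math

SQRT2 = math.sqrt(2.0)

def bformat_to_mono(w):
    """Mono downmix: the omni W channel, with the assumed -3 dB undone."""
    return [SQRT2 * s for s in w]

def bformat_to_blumlein(x, y):
    """Simulated Blumlein pair: two virtual figure-eights at +/-45 degrees.

    A figure-eight aimed at azimuth a picks up X*cos(a) + Y*sin(a).
    """
    g = 1.0 / SQRT2  # cos(45 deg) == sin(45 deg)
    left = [g * (xs + ys) for xs, ys in zip(x, y)]
    right = [g * (xs - ys) for xs, ys in zip(x, y)]
    return left, right
```

Note that the simulated pair uses neither W nor Z, and that sounds
from behind come out polarity-inverted with left and right swapped,
which is one reason this downmix is less than satisfactory.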

UHJ, binaural, and actual speaker feeds would be preferable, but all
require some degree of filtering (for binaural, a full FIR engine and
a stack of HRTFs are needed). This raises the question of whether this
functionality belongs in the core library.  I think both a decent
G-format decoder and a decent UHJ encoder (B-format to 2-channel UHJ)
can be implemented with a fairly simple set of IIRs and some linear
combinations.
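
As a sketch of the "linear combinations" half of a UHJ downmix, the
Python below mixes B-format into two channels using the commonly
published 2-channel UHJ coefficients. It deliberately leaves out the
filtering: w_shifted and x_shifted are assumed to be copies of W and X
that some wideband 90-degree phase shifter (for example a pair of IIR
all-pass networks) has already produced. The function name and
argument layout are hypothetical.

```python
def uhj_stereo_mix(w, x, y, w_shifted, x_shifted):
    """Combine B-format channels into 2-channel UHJ (mixing step only).

    w_shifted / x_shifted: W and X advanced by 90 degrees by an external
    phase shifter (not shown). w, x and y are assumed to have gone
    through the matching reference path so all inputs stay time-aligned.
    """
    left, right = [], []
    for ws, xs, ys, wj, xj in zip(w, x, y, w_shifted, x_shifted):
        s = 0.9396926 * ws + 0.1855740 * xs                       # sum signal
        d = (-0.3420201 * wj + 0.5098604 * xj) + 0.6554516 * ys   # difference signal
        left.append(0.5 * (s + d))
        right.append(0.5 * (s - d))
    return left, right
```

The per-sample cost is a handful of multiplies; nearly all of the real
work (and the code size) would be in the phase shifter.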

A full speaker decoder as well as a binaural decoder will require a
user-interface and can't really be done automagically. So I think they
should be dropped from consideration as compatibility features for the
core libraries.

Ideally an application using the library should be able to register
its ability to receive B-format, and if it hasn't, it should receive a
downmix and be otherwise unaware that the file is a surround file.
Since non-surround capable playback software will almost certainly be
2ch only, this diminishes the usefulness of a built-in G-format
downmix.
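
One way the registration idea could look, sketched in Python rather
than C for brevity. Everything here is hypothetical: the class, the
accept_bformat() call and the decode_block callback are invented for
illustration and do not exist in libvorbis or libvorbisfile. The
fallback shown is the simulated Blumlein pair; a UHJ mix could be
substituted at the same point.

```python
import math

class SurroundAwareReader:
    """Hypothetical wrapper; not an existing libvorbis/libvorbisfile API."""

    def __init__(self, decode_block):
        # decode_block() is assumed to return (w, x, y, z) sample lists.
        self._decode_block = decode_block
        self._wants_bformat = False

    def accept_bformat(self):
        """Called by applications that can handle raw B-format themselves."""
        self._wants_bformat = True

    @property
    def channels(self):
        # An unaware application only ever sees an ordinary stereo stream.
        return 4 if self._wants_bformat else 2

    def read(self):
        w, x, y, z = self._decode_block()
        if self._wants_bformat:
            return [w, x, y, z]
        # Fallback downmix: virtual figure-eights at +/-45 degrees.
        g = 1.0 / math.sqrt(2.0)
        left = [g * (xs + ys) for xs, ys in zip(x, y)]
        right = [g * (xs - ys) for xs, ys in zip(x, y)]
        return [left, right]
```

An application that never calls accept_bformat() gets two channels and
has no way of telling that the file was a surround file.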

So we're left with Blumlein or UHJ. As I mentioned above, I think UHJ
is probably a preferable default but it will take a little more code
to implement. It would probably be worthwhile to do some A/B tests
between the two to find out which one listeners prefer with the
available recordings.

For multichannel-capable apps we will need something to do speaker
decodes. I think there is an opportunity here for an additional
library for this application. Perhaps Fons Adriaensen's decoder (I
linked to it above) might be available for conversion into a
lightweight library with a collection of speaker arrangement presets?
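
To illustrate what "a decoder plus presets" might boil down to in the
simplest case, here is a Python sketch of a basic (velocity-matching)
decode of horizontal B-format to a regular ring of speakers. It is
not taken from Fons's decoder. It assumes W is carried 3 dB down,
azimuths are measured counter-clockwise from straight ahead, and the
ring is regular with at least three speakers; the preset names and
the function are invented for this example.

```python
import math

# Illustrative presets: speaker azimuths in degrees, counter-clockwise from front.
PRESETS = {
    "square": [45.0, 135.0, 225.0, 315.0],
    "hexagon": [30.0, 90.0, 150.0, 210.0, 270.0, 330.0],
}

def decode_horizontal(w, x, y, azimuths_deg):
    """Basic decode of W, X, Y to one feed per speaker of a regular ring."""
    n = len(azimuths_deg)
    feeds = []
    for az in azimuths_deg:
        c = math.cos(math.radians(az))
        s = math.sin(math.radians(az))
        feeds.append([(math.sqrt(2.0) * ws + 2.0 * (c * xs + s * ys)) / n
                      for ws, xs, ys in zip(w, x, y)])
    return feeds
```

A real decoder would normally add frequency-dependent gains (shelf
filters), distance compensation and support for irregular layouts on
top of this, which is where the IIRs come in.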

Anything involving fancy layouts and more than 8 speakers is probably
fine going with a jackified decoder, especially since such systems
will probably want to include things like room correction filters
(http://drc-fir.sourceforge.net/).

I have one other question on my mind: Should this be solved just
for Vorbis, or is there a clear place to put a more general solution
that will cover other Xiph codecs (FLAC, OggPCM)?