[ogg-dev] Ogg audio surround-sound
arc at Xiph.org
Thu Nov 10 19:42:59 PST 2005
This came out of the OggPCM discussion, but I think it needs to be addressed on
a wider scale.
Let's start here, 5 years ago..
(I included this email, below)
I emailed David (author of that email) and asked him to join this list.
I'm thinking, as I look at the problem, that surround sound needs to be defined
_outside_ the audio codec stream. Comments are inappropriate, since we need to
know if our apps can support a stream based on packet0.
As Silvia mentioned recently, we've been lacking a multi-audio codec handling
system for a long time. Whether that be for surround sound, perhaps with
different samplerate/samplesize, or for alternative languages, or for recorded
tracks which need to be mixed.
Let's begin with the above email and continue from there. If this is to be
outside of OggPCM, then it should be separate from the OggPCM thread, besides
not encoding the data within its header.
Over the last two weeks or so, I've been thinking about how to add surround
sound to Ogg -- and more than that, to do it in the best way possible. With
this in mind, I started considering using Ambisonic surround sound. The
advantages of this format are considerable:
a) It was developed in the early to mid '70s, so the patents should
be expired by now.
b) It's scalable -- it can handle everything from mono, to stereo,
full horizontal surround, or full 360-degree spherical surround
c) It's based on a sound mathematical foundation (unlike Dolby's
5.1 system), allowing you to calculate how it should be able
d) Depending on how many channels you are willing to use (and the
source of your material) you can make the positioning of sound
in the sound field as accurate as you want. (First-order
Ambisonics requires four channels for full periphonic surround,
while second-order Ambisonics provide a larger 'sweet spot' with
nine channels.) Since an Ogg stream can have up to 255 channels,
the format itself will not be a limitation.
e) It's compatible with existing decoder technology -- if your player
does not support surround decoding, two of the channels can be
decoded as M/S stereo. (Once patent issues regarding M/S stereo
in digital audio encoding are worked out, that is... Monty thinks
that M/S stereo may have been patented by Fraunhofer for audio
Many articles further describing the format itself are available at
www.ambisonic.net, but I'll try to give a brief overview here for those of
you who are unfamiliar with it.
Ambisonics is based on the principle of encoding spherical harmonic components
of a sound field into one or more audio channels, which allow reproduction of
that sound field later on with an array of speakers in a certain configuration.
The spherical harmonics it reproduces are similar in shape to the various
electron orbitals in an atom, if any of you are familiar with that. The first
component, the 'W' signal, reproduces the pressure of the signal. (This would
be 0th-order Ambisonics, and gives you mono. This takes the shape of the 'S'
orbitals in an atom.) The next three signals are the directional components
of the signal, which give you the 2- or 3-D spatial information. These are
the X, Y, and Z signals. (These make up 1st-order Ambisonics, and have the
shape of the 'P' orbitals in an atom. If you decode only the W and Y signals
through an M/S decoder, you have stereo. If you decode WXY, you have
horizontal surround. If you decode all four, you have periphonic surround.
Anyway, you get the picture.) The next five signals describe the curvature
of the signal (I think -- I don't understand this realm terribly well yet),
and have the shape of the 'D' orbitals of an atom. These make up the signals
of 2nd-order Ambisonics, which will, if used, increase the accuracy of the
reproduced sound field considerably, and widen the 'sweet spot' by a fair
amount. Higher-order Ambisonics are possible, but little research has been
done to this date. All existing Ambisonic recordings of actual events are
in first-order Ambisonic formats.
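As a rough illustration of the first-order signals described above, here is a
small Python sketch that encodes one mono sample into W, X, Y and Z and then
decodes W and Y as M/S stereo. The function names are hypothetical. The
1/sqrt(2) weighting on W and the angle convention (azimuth measured
counter-clockwise from straight ahead, so positive Y points left) are the
traditional B-format convention as I understand it, and the unscaled M/S
decode is an assumption made for simplicity, not part of the proposal.

```python
import math

def encode_bformat(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # pressure (omnidirectional) part
    x = sample * math.cos(az) * math.cos(el)  # front-back component
    y = sample * math.sin(az) * math.cos(el)  # left-right component
    z = sample * math.sin(el)                 # up-down component
    return w, x, y, z

def decode_ms_stereo(w, y):
    """Treat W as mid and Y as side: left = M + S, right = M - S."""
    return w + y, w - y

# A source hard left (azimuth 90 degrees) ends up louder in the left channel.
w, x, y, z = encode_bformat(1.0, 90.0)
left, right = decode_ms_stereo(w, y)
```

A real player would also scale the two outputs to avoid clipping; that step
is left out here.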
Now that I've tried to describe what the Ambisonic system is (hopefully I
didn't confuse you too much -- read some of the articles on the site I gave
earlier for more in-depth information), I'll describe more specifically what
I have in mind for the Ogg project.
First of all, though there are several formats currently in use for
distributing Ambisonic material (including one -- UHJ -- which is stereo
compatible) I propose that we use Ambisonic B-format (using the harmonics
directly -- we would have W, X, Y, etc. channels in the Ogg stream), as it
is the easiest to decode (if we used another format like UHJ, the player
would have to decode it to B-format before it could do more with it), is the
oldest (it's been used since the early '70s, so any patents should have expired
YEARS ago), and it's the most flexible. What I would propose would be a field
for each track that is marked Ambisonic, of one byte. Of this byte, the first
three bits would indicate which order of Ambisonics was being used, and the
last five bits would indicate which signal in the hierarchy this stream
represented. For example: (These are big-endian)
Component   Order   Signal
W           000     00000
X           001     00000
Y           001     00001
Z           001     00010
R           010     00000
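The one-byte field described above could be packed and unpacked along these
lines. This is only a sketch: the function and constant names are
hypothetical, and it assumes "first three bits" means the three most
significant bits of the byte, which is how I read the table.

```python
ORDER_BITS = 3   # upper three bits: Ambisonic order (0-7)
SIGNAL_BITS = 5  # lower five bits: signal index within that order (0-31)

def pack_component(order, signal):
    """Pack an (order, signal) pair into the proposed one-byte field."""
    if not 0 <= order < (1 << ORDER_BITS):
        raise ValueError("order must fit in 3 bits")
    if not 0 <= signal < (1 << SIGNAL_BITS):
        raise ValueError("signal must fit in 5 bits")
    return (order << SIGNAL_BITS) | signal

def unpack_component(byte):
    """Split the one-byte field back into (order, signal)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("field must be a single byte")
    return byte >> SIGNAL_BITS, byte & ((1 << SIGNAL_BITS) - 1)

# The five rows of the table, as (order, signal) pairs.
COMPONENTS = {"W": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (1, 2), "R": (2, 0)}
```

With this layout a decoder that only understands first-order material can
skip any stream whose order value is greater than 1.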
One additional format which may become important as time goes by is G-format,
which is basically an Ambisonic signal pre-decoded for a standard DVD-type
5.1 speaker array. Once this begins to be used, we may want to incorporate
a G-format-to-B-format converter into the encoder so we're again working with
Even though no one anywhere is talking about using 7th-order Ambisonic signals,
this is a good way of future-proofing the format, should any such system be
developed. (If it was easier to implement, each of these fields could be a
Once we have an Ambisonically-encoded Ogg stream, how do we play it? There
will be two ways. The first way is the way Ambisonic material has
traditionally been played -- the B-format signals are output directly, and
fed to an outboard decoder module. This offers maximum flexibility, as this
decoder could be a 128-speaker auditorium model. The drawbacks, however,
would be showstoppers if this were the only playback method. (The biggest
drawback is the need for everyone to have a decoder -- they aren't very common,
so they're fairly expensive.)
The second way is the way I propose that most of us will play this -- software
decoding. The B-format components will be rematrixed in software into a
certain number of speaker feeds, which will then be output to your amplifier
and speaker array. You would create a configuration file (either manually or
with some sort of GUI) which would tell the decoder about your speaker array
(location, possibly frequency range as well), and then the decoder would make
the appropriate adjustments to the signals to reproduce the sound field as
accurately as possible.
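To make the rematrixing step a little more concrete, here is a minimal Python
sketch of a horizontal-only decode for a regular ring of speakers. Everything
in it is an assumption for illustration: the function name, the list of
speaker azimuths standing in for the configuration file, and the decode
weights (sqrt(2) on W, 2 on X and Y), which are one simple choice among many.
A serious decoder would also apply frequency-dependent corrections, which are
not shown.

```python
import math

def decode_horizontal(w, x, y, speaker_azimuths_deg):
    """Rematrix W, X, Y into one feed per speaker (simple projection decode).

    Azimuths are in degrees, counter-clockwise from straight ahead, and the
    speakers are assumed to be evenly spaced around the listener.
    """
    n = len(speaker_azimuths_deg)
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = math.radians(az_deg)
        directional = x * math.cos(az) + y * math.sin(az)
        feeds.append((math.sqrt(2.0) * w + 2.0 * directional) / n)
    return feeds

# Hypothetical "configuration": a square array, as four sound card outputs
# might drive it (front-left, rear-left, rear-right, front-right).
SQUARE_ARRAY = [45.0, 135.0, 225.0, 315.0]

# A unit source straight ahead: W = 1/sqrt(2), X = 1, Y = 0.
feeds = decode_horizontal(1.0 / math.sqrt(2.0), 1.0, 0.0, SQUARE_ARRAY)
```

For this front-centre source the two front speakers receive equal, larger
feeds than the two rear speakers, and the feeds sum to the original sample.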
At first, I was thinking that this could be done using the four-channel output
of a Sound Blaster Live (or similar card), feeding the 5.1 channel analog input
of a home theater receiver. If sound cards are developed which integrate AC-3
encoding (giving you virtual 5.1 outputs which are encoded by a chip into
an AC-3 or DTS stream), this could also be used. (We probably wouldn't be able
to integrate AC-3 encoding directly into the player until the patent on it
expires, which won't be for quite a while. I doubt Dolby would give us the
free license to it we would need to do so -- especially since we would be
competing with Dolby for surround mindshare!) Hopefully, we would also be
able to drive two SBLive cards in parallel eventually, giving us 8 channels
of output, but that remains to be seen. (The driver doesn't even support
4-channel output yet...)
For now, however, independent of how fast the software decoding side of this
develops, the format can at least be defined to create a framework for future
work. One thing I've been keeping in mind during this process is that we
don't necessarily have to be using the Vorbis codec for this -- once the Squish
lossless codec is available, that could be used for any or all components of an
At this point, several things need to happen.
a) The format needs to be defined, preferably in such a way that it is
as flexible as possible, and allows non-surround-capable decoders
to decode the W-Y components as M/S stereo.
b) I'm going to ask Richard Furse, who has already written a software
Ambisonic encoder and decoder, if he will relicense these tools
to us under the LGPL. If he agrees, this will make many things
much easier and faster.
c) The patent situation of Ambisonics in general, and particularly
newer Ambisonic developments (G-format pre-decoded 5.1, 2nd- and
higher-order Ambisonics in general, etc.) needs to be examined to
ensure that we know of any patent pitfalls so they can be worked
d) Work towards 4-channel output on the SBLive (and maybe Aureal
Vortex, if they ever open their specs) drivers needs to proceed.
e) Someone should look into how easy or hard it will be to output
four channels under Windows -- we'll want to be able to for a
future Winamp plugin.
f) Once b) happens, or we write our own equivalents, functions will
need to be either added to libvorbis, or possibly a standalone
library, to work with Ambisonic material. (On the encoding side,
UHJ-encoded material, such as all the CDs put out by Nimbus in
England, will be decoded into B-format before it is encoded. A
G-format-to-B-format decoder may also be useful, if DVD-Audio
discs begin to be released in G-format. On the decoding side,
we will probably want to put the software Ambisonic decoding in
its own library to reduce bloat in the main plugin. The main
plugin could output raw B-format without this extra plugin, but
would output stereo by default.)
I realize that this is a LOT to digest at once, but hopefully this will
stimulate discussion about the future of surround support in the Ogg format,
and get things off and running.
I'm really excited about the whole Ogg project, and Vorbis in particular, and
I think we can all look forward to an exciting future ahead... Watch out
Dolby and DTS, here we come!
David Carter ** dcarter at sigfs.org ** dcarter at visi.com
PGP Key 581CBE61: E07EE199C767C752 8A8B1A9F015BF2EA
Key available by finger or www.keyserver.net
The recognition of individual possibility,
to allow each to be what she and he can be,
rests inherently upon the availability of knowledge;
The perpetuation of ignorance is the beginning of slavery.
from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
by Eben Moglen, General Counsel of the Free Software Foundation