[vorbis-dev] True surround sound for Ogg -- a proposal (fwd)

Tue Jul 11 04:28:39 PDT 2000

Date: Mon, 10 Jul 2000 14:51:12 +0100 (BST)
From: DG Malham <dgm2 at york.ac.uk>
To: vorbis-dev at xiph.org
Cc: DG Malham <dgm2 at york.ac.uk>, Rob Fletcher <rpf1 at york.ac.uk>
Subject: Re: [vorbis-dev] True surround sound for Ogg -- a proposal (fwd)
In-Reply-To: <Pine.SGI.3.95L.1000710092216.9043693B-100000 at turpin.york.ac.uk>
Message-ID: <Pine.SOL.3.95L.1000710144201.18737A-100000 at mailer.york.ac.uk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Hi,
        Rob Fletcher forwarded this to me as one of the researchers most
involved in Ambisonics currently. I am, at this moment, developing a set
of new subcommands for the Snack audio extension to Tcl/Tk which handle
Ambisonic panning, soundfield manipulation, decoding and mixing. The
panning one is almost finished and the others should follow soon after. It
has proved unbelievably easy to write new stuff for this package - less
than two weeks from starting to look at how to do it to having most
functions working (and that's despite doing other things and being
horribly rusty on the C programming side). This will be open source C code
and I would be very happy for it to be incorporated in Ogg. 

Dave Malham. 

/**************************************************************************/
/* Dave Malham            "http://www.york.ac.uk/depts/music/dgm.htm"     */
/* Music Technology Group "http://www.york.ac.uk/inst/mustech/welcome.htm"*/
/* Department of Music    "http://www.york.ac.uk/depts/music/welcome.htm" */
/* The University of York  Phone 01904 432448                             */
/* Heslington              Fax   01904 432450                             */
/* York YO1 5DD                                                           */
/* UK                     'Ambisonics - Component Imaging for Audio'      */
/*          "http://www.york.ac.uk/inst/mustech/3d_audio/ambison.htm"     */
/**************************************************************************/

> 
> ---------- Forwarded message ----------
> Date: Fri, 7 Jul 2000 11:18:03 -0500
> From: David Carter <dcarter at sigfs.org>
> Reply-To: vorbis-dev at xiph.org
> To: vorbis-dev at xiph.org
> Subject: [vorbis-dev] True surround sound for Ogg -- a proposal
> 
> Hi everyone,
> 
> Over the last two weeks or so, I've been thinking about how to add surround
> sound to Ogg -- and more than that, to do it in the best way possible.  With
> this in mind, I started considering using Ambisonic surround sound.  The
> advantages of this format are considerable:
> 
> 	a) It was developed in the early to mid '70s, so the patents should
> 	   be expired by now.
> 
> 	b) It's scalable -- it can handle everything from mono, to stereo,
> 	   full horizontal surround, or full 360-degree spherical surround
> 	   (periphonic).
> 
> 	c) It's based on a sound mathematical foundation (unlike Dolby's
> 	   5.1 system), allowing you to calculate how it should be able
> 	   to perform.
> 
> 	d) Depending on how many channels you are willing to use (and the
> 	   source of your material) you can make the positioning of sound
> 	   in the sound field as accurate as you want.  (First-order
> 	   Ambisonics requires four channels for full periphonic surround, 
> 	   while second-order Ambisonics provide a larger 'sweet spot' with
> 	   nine channels.)  Since an Ogg stream can have up to 255 channels,
> 	   the format itself will not be a limitation.
> 
> 	e) It's compatible with existing decoder technology -- if your player
> 	   does not support surround decoding, two of the channels can be
> 	   decoded as M/S stereo.  (Once patent issues regarding M/S stereo
> 	   in digital audio encoding are worked out, that is...  Monty thinks
> 	   that M/S stereo may have been patented by Fraunhofer for audio
> 	   compression uses.)
> 
> Many articles further describing the format itself are available at
> www.ambisonic.net, but I'll try to give a brief overview here for those of
> you who are unfamiliar with it.
> 
> Ambisonics is based on the principle of encoding spherical harmonic components
> of a sound field into one or more audio channels, which allow reproduction of
> that sound field later on with an array of speakers in a certain configuration.
> The spherical harmonics it reproduces are similar in shape to the various
> electron orbitals in an atom, if any of you are familiar with that.  The first
> component, the 'W' signal, reproduces the pressure of the signal.  (This would
> be 0th-order Ambisonics, and gives you mono.  This takes the shape of the 'S'
> orbitals in an atom.)  The next three signals are the directional components
> of the signal, which give you the 2- or 3-D spatial information.  These are
> the X, Y, and Z signals.  (These make up 1st-order Ambisonics, and have the
> shape of the 'P' orbitals in an atom.  If you decode only the W and Y signals
> through an M/S decoder, you have stereo.  If you decode WXY, you have
> horizontal surround.  If you decode all four, you have periphonic surround.
> Anyway, you get the picture.)  The next five signals describe the curvature
> of the signal (I think -- I don't understand this realm terribly well yet),
> and have the shape of the 'D' orbitals of an atom.  These make up the signals
> of 2nd-order Ambisonics, which will, if used, increase the accuracy of the
> reproduced sound field considerably, and widen the 'sweet spot' by a fair
> amount.  Higher-order Ambisonics are possible, but little research has been
> done to this date.  All existing Ambisonic recordings of actual events are
> in first-order Ambisonic formats.
> 
> Now that I've tried to describe what the Ambisonic system is (hopefully I
> didn't confuse you too much -- read some of the articles on the site I gave
> earlier for more in-depth information), I'll describe more specifically what
> I have in mind for the Ogg project.
> 
> First of all, though there are several formats currently in use for
> distributing Ambisonic material (including one -- UHJ -- which is stereo
> compatible) I propose that we use Ambisonic B-format (using the harmonics
> directly -- we would have W, X, Y, etc. channels in the Ogg stream), as it
> is the easiest to decode (if we used another format like UHJ, the player
> would have to decode it to B-format before it could do more with it), is the
> oldest (it's been used since the early '70s, so any patents should have expired
> YEARS ago), and it's the most flexible.  What I would propose would be a field
> for each track that is marked Ambisonic, of one byte.  Of this byte, the first
> three bits would indicate which order of Ambisonics were being used, and the
> last five bits would indicate which signal in the hierarchy this stream
> represented.  For example (bits are written most significant first):
> 
> 	Component	Order	Signal
> 	W		000	00000
> 	X		001	00000
> 	Y		001	00001
> 	Z		001	00010
> 	R		010	00000
> 	etc.
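
A minimal sketch of packing and unpacking such a one-byte field follows.
It assumes that "first three bits" means the three most significant bits,
which matches the table above (Z = 001 00010 = 0x22). The function names
ambi_tag_pack, ambi_tag_order and ambi_tag_signal are hypothetical; nothing
like them exists in libvorbis.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an Ambisonic order (0..7) and a signal index within that order
 * (0..31) into one byte: order in the top three bits, signal index in
 * the low five bits.  The layout is an assumption based on the table. */
static uint8_t ambi_tag_pack(unsigned order, unsigned signal)
{
    assert(order <= 7u && signal <= 31u);
    return (uint8_t)((order << 5) | signal);
}

/* Extract the order from the top three bits. */
static unsigned ambi_tag_order(uint8_t tag)
{
    return (unsigned)tag >> 5;
}

/* Extract the signal index from the low five bits. */
static unsigned ambi_tag_signal(uint8_t tag)
{
    return (unsigned)tag & 0x1Fu;
}
```

With this layout W packs to 0x00, X to 0x20, Y to 0x21, Z to 0x22 and R to
0x40, and a decoder can ignore any stream whose order is higher than it
supports.
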
> 
> One additional format which may become important as time goes by is G-format,
> which is basically an Ambisonic signal pre-decoded for a standard DVD-type
> 5.1 speaker array.  Once this begins to be used, we may want to incorporate
> a G-format-to-B-format converter into the encoder so we're again working with
> B-format signals.
> 
> Even though no one anywhere is talking about using 7th-order Ambisonic signals,
> a three-bit order field is a good way of future-proofing the format, should any
> such system be developed.  (If it were easier to implement, each of these fields
> could be a whole byte.)
> 
> Once we have an Ambisonically-encoded Ogg stream, how do we play it?  There
> will be two ways.  The first way is the way Ambisonic material has
> traditionally been played -- the B-format signals are output directly, and
> fed to an outboard decoder module.  This offers maximum flexibility, as this
> decoder could be a 128-speaker auditorium model.  The drawbacks, however,
> would be showstoppers if this were the only playback method.  (The biggest
> drawback is the need for everyone to have a decoder -- they aren't very common,
> so they're fairly expensive.)
> 
> The second way is the way I propose that most of us will play this -- software
> decoding.  The B-format components will be rematrixed in software into a
> certain number of speaker feeds, which will then be output to your amplifier
> and speaker array.  You would create a configuration file (either manually or
> with some sort of GUI) which would tell the decoder about your speaker array
> (location, possibly frequency range as well), and then the decoder would make
> the appropriate adjustments to the signals to reproduce the sound field as
> accurately as possible.
> 
> At first, I was thinking that this could be done using the four-channel output
> of a Sound Blaster Live (or similar card), feeding the 5.1 channel analog input
> of a home theater receiver.  If sound cards are developed which integrate AC-3
> encoding (allowing you virtual 5.1 outputs which are encoded by a chip into
> an AC-3 or DTS stream), this could also be used.  (We probably wouldn't be able
> to integrate AC-3 encoding directly into the player until the patent on it
> expires, which won't be for quite a while.  I doubt Dolby would give us the
> free license to it we would need to do so -- especially since we would be
> competing with Dolby for surround mindshare!)  Hopefully, we would also be
> able to drive two SBLive cards in parallel eventually, giving us 8 channels
> of output, but that remains to be seen.  (The driver doesn't even support
> 4-channel output yet...)
> 
> For now, however, independent of how fast the software decoding side of this
> develops, the format can at least be defined to create a framework for future
> work.  One thing I've been keeping in mind during this process is that we
> don't necessarily have to be using the Vorbis codec for this -- once the Squish
> lossless codec is available, that could be used for any or all components of an
> Ambisonic signal.
> 
> At this point, several things need to happen.
> 
> 	a) The format needs to be defined, preferably in such a way that it is
> 	   as flexible as possible, and allows non-surround-capable decoders
> 	   to decode the W-Y components as M/S stereo.
> 
> 	b) I'm going to ask Richard Furse, who has already written a software
> 	   Ambisonic encoder and decoder, if he will relicense these tools
> 	   to us under the LGPL.  If he agrees, this will make many things
> 	   much easier and faster.
> 
> 	c) The patent situation of Ambisonics in general, and particularly
> 	   newer Ambisonic developments (G-format pre-decoded 5.1, 2nd- and
> 	   higher-order Ambisonics in general, etc.) needs to be examined to
> 	   ensure that we know of any patent pitfalls so they can be worked
> 	   around.
> 
> 	d) Work towards 4-channel output on the SBLive (and maybe Aureal
> 	   Vortex, if they ever open their specs) drivers needs to proceed.
> 
> 	e) Someone should look into how easy or hard it will be to output
> 	   four channels under Windows -- we'll want to be able to for a
> 	   future Winamp plugin.
> 
> 	f) Once b) happens, or we write our own equivalents, functions will
> 	   need to be either added to libvorbis, or possibly a standalone
> 	   library, to work with Ambisonic material.  (On the encoding side,
> 	   UHJ-encoded material, such as all the CDs put out by Nimbus in
> 	   England, will be decoded into B-format before it is encoded.  A
> 	   G-format-to-B-format decoder may also be useful, if DVD-Audio
> 	   discs begin to be released in G-format.  On the decoding side,
> 	   we will probably want to put the software Ambisonic decoding in
> 	   its own library to reduce bloat in the main plugin.  The main
> 	   plugin could output raw B-format without this extra plugin, but
> 	   would output stereo by default.)
> 
> I realize that this is a LOT to digest at once, but hopefully this will
> stimulate discussion about the future of surround support in the Ogg format,
> and get things off and running.
> 
> I'm really excited about the whole Ogg project, and Vorbis in particular, and
> I think we can all look forward to an exciting future ahead...  Watch out
> Dolby and DTS, here we come!
> 
> 	David
> 
> -- 
> David Carter ** dcarter at sigfs.org ** dcarter at visi.com
> PGP Key 581CBE61: E07EE199C767C752 8A8B1A9F015BF2EA
> Key available by finger or www.keyserver.net
> 
> 
> --- >8 ----
> List archives:  http://www.xiph.org/archives/
> Ogg project homepage: http://www.xiph.org/ogg/
> 
> 

------- End of Forwarded Message
