[ogg-dev] Ambisonics in Ogg Vorbis

Sun Apr 15 05:47:32 PDT 2007

Martin Leese wrote:
> On 2/28/07, Ivo Emanuel Gonçalves <justivo at gmail.com> wrote:
> 
>> On 2/28/07, Ralph Giles <giles at xiph.org> wrote:
>> > Well, there are todo pages at wiki.xiph.org, but I meant more in the
>> > community folklore sense. My point is a roadmap doesn't help much
>> > unless there are people committed to making things happen. That's
>> > been the problem with a lot of this stuff, and why it's been so nice
>> > to see the ambisonics work happening.
>>
>> The situation on Ambisonics is tricky, because it depends on someone
>> coding a whole API for the different Xiph projects AND Monty being
>> available to apply whatever changes are needed in Vorbis.
> 
> I have been giving some thought to how to
> include Ambisonics in Ogg Vorbis.  There is a
> question at the end, so please plough on.
> 
> As I understand it, all that is needed is some
> machine parseable metadata to identify the
> audio data as being Ambisonics.  The channel
> coupling won't be optimal and the phase may
> get a bit munged (Ambisonics is big on
> low-frequency phase), but it will work.  And the
> missing bits can then be worked on in Ghost
> at people's leisure.
> 
> Now, Vorbis comments aren't intended for
> machine parseable metadata, so the metadata
> will need to go in the Ogg container as a
> separate (chained) stream.  This scheme will
> not only work for Ogg Vorbis, but for Ogg
> <anything>.  There currently isn't a standard
> for a metadata stream to go into Ogg, but
> there is a draft standard at:
> http://wiki.xiph.org/index.php/Metadata
> 
> According to this draft standard, all I need to
> do is to invent some XML which includes the
> required information, and we are away.
> 
> Now for the question; how much did I get wrong?
> 

It depends what your aim is.  The mapping type
in the Vorbis setup header is meant for
this[1],[2].  Of course a nonzero mapping type will
cause a lot of players to give up (the Vorbis I
spec treats any nonzero mapping type as
undecodable), but so will including the XML
stream.  I believe this is how multi-channel was
intended to be handled.
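For illustration, the decoder-side check could look roughly like the Python
sketch below.  It assumes a reader already positioned at the start of the
mapping section of the setup header (getting there means parsing the
codebook, floor and residue configurations first, which is omitted here) and
uses the Vorbis I LSB-first bit packing.  BitReader, read_first_mapping and
vorbis_i_can_decode are names made up for this sketch, not libvorbis API.

```python
class BitReader:
    """Reads integers LSB-first, following the Vorbis I bit packing convention."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit offset into data

    def read(self, nbits: int) -> int:
        value = 0
        for i in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1  # bits are consumed from the LSB of each byte
            value |= bit << i  # the first bit read becomes the LSB of the value
            self.pos += 1
        return value


def read_first_mapping(reader: BitReader) -> tuple[int, int]:
    """Return (mapping_count, type of the first mapping).

    Hypothetical helper: assumes `reader` sits at the start of the mapping
    section.  Only the first type is read, because a type-0 mapping is
    followed by its own configuration data, which this sketch does not parse.
    """
    mapping_count = reader.read(6) + 1
    mapping_type = reader.read(16)
    return mapping_count, mapping_type


def vorbis_i_can_decode(mapping_type: int) -> bool:
    # Vorbis I defines only mapping type 0; anything else is undecodable.
    return mapping_type == 0


# A made-up 3-byte buffer: mapping count 1, first mapping type 1.
count, mtype = read_first_mapping(BitReader(bytes([0x40, 0x00, 0x00])))
print(count, mtype, vorbis_i_can_decode(mtype))
```

This is the check an Ambisonics-aware mapping would trip in existing
players: anything other than type 0 is rejected outright.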

As you say, using a separate metadata stream would
allow all codecs to use the same scheme, but the
codec would need to communicate this to the muxer
if it wanted to use knowledge of the mapping.
Vorbis and OggPCM have their own mapping information,
which also means that they can be put in containers
other than Ogg without losing the mapping.  (I think
FLAC does too.)

If you want to go the separate metadata route, there's
the choice of metadata stream.  Skeleton[3],[4] is
already implemented in some places and typically
contains metadata relevant to stream decoding.  This
is mainly temporal information, but Skeleton also
"allows for attachment of message header fields given
as name-value pairs that contain some sort of protocol
messages about the logical bitstream, e.g. the screen
size for a video bitstream or the number of channels
for an audio bitstream."[5]
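As a rough illustration of the name-value idea, the sketch below builds and
parses such header fields.  It assumes they are CRLF-terminated "Name: value"
lines in the RFC 822 style; check the Skeleton spec[3] for the exact encoding.
The X-Ambisonics-Order field name is purely hypothetical, not something
Skeleton defines.

```python
def build_message_headers(fields: list[tuple[str, str]]) -> bytes:
    """Serialise (name, value) pairs as CRLF-terminated 'Name: value' lines."""
    return b"".join(f"{name}: {value}\r\n".encode("utf-8") for name, value in fields)


def parse_message_headers(blob: bytes) -> dict[str, str]:
    """Parse 'Name: value' lines back into a dict (later duplicates win)."""
    headers: dict[str, str] = {}
    for line in blob.decode("utf-8").split("\r\n"):
        if not line:
            continue  # the trailing CRLF leaves an empty final element
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip()] = value.strip()
    return headers


blob = build_message_headers([
    ("Content-Type", "audio/vorbis"),
    ("X-Ambisonics-Order", "1"),  # hypothetical field, for illustration only
])
print(parse_message_headers(blob))
```

A real reader would probably want to match names case-insensitively, as
RFC 822-style headers usually are.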

The metadata split that seems to be emerging is that
decode-related information goes in Skeleton and other
metadata (e.g. indexing) goes into CMML or the
currently non-existent XML streams[6],[7].

Without knowing what you need the metadata to record
(I assume it can be fairly strictly defined?), I'd say
that of the two metadata approaches the Skeleton route
is the easier task here.  It avoids needing to parse XML,
and Skeleton is more strictly defined as being in the
right place for decode setup.

[1] <http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#id2510452>
[2] <http://lists.xiph.org/pipermail/vorbis-dev/2007-February/018697.html>
[3] <http://wiki.xiph.org/index.php/Ogg_Skeleton>
[4] <http://annodex.net/TR/draft-pfeiffer-annodex-02.html#anchor8>
[5] <http://annodex.net/TR/draft-pfeiffer-annodex-02.html#anchor6>
[6] And vorbiscomments for the basic TITLE,
     ARTIST, etc. stuff.
[7] This is probably because: a) work has been done on
     Skeleton, b) it's more obvious what decode related
     information is needed and how it should be used.

-- 
imalone