[ogg-dev] The use for an XML based metadata format

Ian Malone ibmalone at gmail.com
Mon Sep 10 16:24:43 PDT 2007

Silvia Pfeiffer wrote:
> What I've gotten out of this discussion so far:
> 1) we need to introduce a means in which to do captions; this could be
> done through adding a "caption" element to CMML, or in another
> time-continuous annotation format; so far I am not sure which would be
> the better way

There is also the OggWrit draft
<http://wiki.xiph.org/index.php/OggWrit>.  Certainly a content
description metadata format does not need to address this as there
are so many other places it would fit better.  As mentioned
elsethread, it could well describe the relation of the captions to
other resources, as in 3) below.

> 2) we need an XML annotation format for audio - in particular for music
> - that is more structured than vorbiscomment (and this probably
> applies to video, too)

The particular examples I pick tend to be for music because
those are the obvious ones, but I think the interesting applications
may be for non-musical resources; see the Metadata talk page where
someone asked (a long time ago) about Learning Object Metadata for
teaching resources: <http://wiki.xiph.org/index.php/Talk:Metadata>.

Embedding in Ogg is the simple bit; the only point of contention
is whether you use a magic number to label it as metadata or
just package XML and let the parser sort it out.  (With a bit
more experience under my belt, I'm persuaded a magic number might
be worthwhile; otherwise someone will hardcode their app to
assume any XML stream is a metadata stream.)  The missing bit (and
the difficult one) is the format.  The best thing to do is to
nail down a set of XML or RDF that addresses the obvious needs
and allows end developers to include further namespaces/schemas
as needed (cf. LOM).
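To make the magic-number point concrete, here is a minimal Python sketch of framing an XML metadata packet. The 8-byte value `XMLmeta\0` and both function names are hypothetical; no Ogg specification defines them. It only illustrates the idea that a demuxer checks for an explicit label rather than sniffing for XML.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical 8-byte identification magic; not defined by any Ogg spec.
METADATA_MAGIC = b"XMLmeta\x00"


def make_metadata_packet(xml_text: str) -> bytes:
    """Prefix UTF-8 encoded XML with the magic so demuxers can identify it."""
    ET.fromstring(xml_text)  # fail early (ParseError) if the XML is malformed
    return METADATA_MAGIC + xml_text.encode("utf-8")


def parse_metadata_packet(packet: bytes) -> Optional[ET.Element]:
    """Return the parsed XML root if the packet is labelled as metadata, else None."""
    if not packet.startswith(METADATA_MAGIC):
        return None  # another codec's header, or unlabelled XML
    return ET.fromstring(packet[len(METADATA_MAGIC):])
```

A bare XML packet without the prefix is deliberately rejected, which is exactly the behaviour that stops an app from treating every XML stream as metadata.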

But I do think music and the cast/ensemble problem might be a
nice starting point, as this is something classical music fans
have wanted for a while but no other format has provided.
Coupled with the fact that Ogg audio covers FLAC too, you may win
some audiophiles back.  Trackbacks and store links are also
probably in the scope of 'relatively straightforward'.
That said, RDF makes my head hurt.  I spent a while looking at how
to do this with Dublin Core (DC) and friend-of-a-friend (FOAF), but
nothing really clicked for me; perhaps a simple and cheerful new
namespace to tie it together is what's called for.
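As a rough illustration of a small new namespace sitting alongside Dublin Core, the sketch below builds a cast list with Python's standard ElementTree. The Dublin Core elements namespace URI is real; the `ens` namespace at example.org, its `cast`/`performer` elements, the `role` attribute and the performer names used in the check are all hypothetical placeholders, not a proposed vocabulary.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
ENS = "http://example.org/ns/ensemble#"   # hypothetical cast/ensemble namespace
ET.register_namespace("dc", DC)
ET.register_namespace("ens", ENS)


def describe_performance(title: str, performers: list) -> ET.Element:
    """performers: list of (name, role) tuples, e.g. ('Soloist A', 'soprano')."""
    root = ET.Element("metadata")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    cast = ET.SubElement(root, f"{{{ENS}}}cast")
    for name, role in performers:
        member = ET.SubElement(cast, f"{{{ENS}}}performer", {"role": role})
        member.text = name
    return root


def roles(root: ET.Element) -> dict:
    """Group performer names by the role they take in this performance."""
    out = {}
    for p in root.iter(f"{{{ENS}}}performer"):
        out.setdefault(p.get("role"), []).append(p.text)
    return out
```

The point of the separate namespace is that DC keeps describing the work as a whole, while the role-per-performer structure, which DC has no natural place for, lives in its own vocabulary that applications can ignore if they don't understand it.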

Oh, I realise that we have XML expertise here in the form of
the Annodex/CMML group, but there are also the XSPF people, who
seem to know a lot about the darker corners of XML and URIs.

> 3) we need a means to describe relationships between different logical
> bitstreams; we had a discussion about this years ago, but never got to
> a proper specification of this
> 4) we need a means to address logical bitstreams by name; this should
> be an ID attribute to be added to skeleton

3 & 4 are separate points, but obviously 3) needs 4).  If
there's an obvious way to do the URN bit then 4) is the
most straightforward of the lot.
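For 4), here is a sketch of what name-based addressing might look like once Skeleton carries an ID per logical bitstream. The `StreamIndex` class and the `resource#id` reference form are assumptions for illustration (the fragment syntax borrows from ordinary URIs, not from any agreed URN scheme); the only real Ogg detail is that each logical bitstream has a serial number.

```python
from typing import Dict
from urllib.parse import urldefrag


class StreamIndex:
    """Hypothetical table mapping logical-bitstream IDs to Ogg serial numbers."""

    def __init__(self) -> None:
        self._by_id: Dict[str, int] = {}

    def add(self, stream_id: str, serialno: int) -> None:
        # IDs must be unique within one physical stream to be addressable.
        if stream_id in self._by_id:
            raise ValueError(f"duplicate stream id: {stream_id!r}")
        self._by_id[stream_id] = serialno

    def resolve(self, reference: str) -> int:
        """Resolve 'file.ogg#id' (or just '#id') to a serial number."""
        _, fragment = urldefrag(reference)
        if not fragment:
            raise KeyError(f"no stream id in reference: {reference!r}")
        return self._by_id[fragment]
```

Keying on a name rather than the serial number seems worthwhile because serial numbers are picked by the muxer and can change when a file is remuxed, whereas an ID chosen by the author can stay stable.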

> These four things are all very different and separate things - number
> 2 may even need further structuring IMO. Yes, they interrelate and
> there should be means to address one from the other. But IMO they all
> need a different approach.

Yes.  To recap: 1) is covered elsewhere, potentially several times
over.  2) is the big one; what's needed depends on the end use,
but I think if we have a good foundation, people can add the
bits they need.  Most of my current ideas about use cases have
metadata produced /once/ by the content provider or media
management, who can potentially supply the interpreter too if
the basic tools are there.  3) How separate is this from 2)?  If
you view the metadata as a manifest for the physical stream, then
it describes the collection of the logical streams and their
relations.  On the one hand, we might have a concert recording
where the overall description for the stream would name the
artists; on the other, a film where the musicians might be
relegated to the description for the soundtrack (depending on how
fastidious the metadata supplier is).  Yes, there are special
bits needed for things like captions and multi-track audio,
but these are just relationships to the whole.  (This suggests we
should ensure 4) can address the whole bitstream too.)

The model I've got in my head is a tree which describes
properties of the physical stream, where necessary (actually,
as part of that process) defining how the logical streams
relate to it.  The deficiency with that is that the logical
streams' relationships to each other are only given via their
relations to the overall stream.  You can describe their individual
properties further down their branches.
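The tree model might look something like the following. Every element and attribute name (`physical`, `logical`, `relation`, `creator`) and all the values are hypothetical, and Python's ElementTree is just a convenient way to show the lookups: relations hang off the whole stream, and per-stream properties sit further down each branch.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical manifest for one physical (non-chained) stream.
MANIFEST = """<physical title="Example film">
  <creator>Director A</creator>
  <logical id="video" relation="picture"/>
  <logical id="audio" relation="soundtrack">
    <creator>Ensemble B</creator>
  </logical>
  <logical id="captions-en" relation="captions" lang="en"/>
</physical>"""


def logical_streams(root: ET.Element) -> Dict[str, str]:
    """Map each logical stream id to its relation to the whole stream."""
    return {el.get("id"): el.get("relation") for el in root.iter("logical")}


def creators(root: ET.Element, stream_id: Optional[str] = None) -> List[str]:
    """Creators named on the whole stream, or further down one stream's branch."""
    if stream_id is None:
        node = root
    else:
        node = root.find(f"logical[@id='{stream_id}']")
        if node is None:
            raise KeyError(stream_id)
    # findall() only looks at direct children, so branch-level and
    # whole-stream creators stay separate.
    return [c.text for c in node.findall("creator")]
```

The deficiency shows up here too: `captions-en` is only related to the whole (`relation="captions"`), and nothing in the tree says it captions `video` rather than the soundtrack unless some sibling reference is added.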

N.B. in all I've written above, physical stream really should
be read as 'single-link or non-chained physical stream'.  I
believe it would make sense to expect each link to carry its
own metadata and bases to refer to links.  It may be necessary
for external references and bizarre corner cases to be able to
specify an id in a link within a chain.  Maybe add an overall
link id to 4), but they probably shouldn't share an addressing
mechanism with logical stream ids for obvious reasons.
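One way to keep link ids and logical stream ids apart is sketched below in Python with hypothetical `Link` and `Chain` types (nothing here is an existing Ogg API). Stream ids are only looked up inside a link, so the same stream id (say `audio`) can recur in every link without ambiguity, and a link can also be addressed as a whole.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Union


@dataclass
class Link:
    """One link of a chained physical stream, carrying its own stream ids."""
    link_id: str
    streams: Dict[str, int] = field(default_factory=dict)  # stream id -> serialno


class Chain:
    """Hypothetical resolver keeping link ids and stream ids in separate scopes."""

    def __init__(self, links: Iterable[Link]) -> None:
        self._links: Dict[str, Link] = {}
        for link in links:
            if link.link_id in self._links:
                raise ValueError(f"duplicate link id: {link.link_id!r}")
            self._links[link.link_id] = link

    def resolve(self, link_id: str, stream_id: Optional[str] = None) -> Union[Link, int]:
        """Return the whole link, or a stream's serial number within that link."""
        link = self._links[link_id]
        if stream_id is None:
            return link
        return link.streams[stream_id]
```

For example, `Chain([Link("part-1", {"audio": 101}), Link("part-2", {"audio": 202})]).resolve("part-2", "audio")` picks the `audio` stream of the second link only.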

Wondering if any of the above makes sense.

