[ogg-dev] The use for an XML based metadata format

Tue Sep 11 08:28:50 PDT 2007

On 11/09/2007, Daniel Aleksandersen <aleksandersen+xiphlists at runbox.com> wrote:
> On Tuesday 11. September 2007 01:34:35 Ian Malone wrote:
> > Daniel Aleksandersen wrote:
> > > By the way, I have been discussing Dublin Core ('DC') with the
> > > developers of the Atom 1.0 specification. It seems the reason they
> > > created atom:rights instead of using dc:rights was just about what I
> > > thought it was: they thought DC was too loosely defined. Their own
> > > atom:rights element was designed to define more clearly what the
> > > element contained (escaped HTML, plain text, or whatever else).
> > >
> > > When it comes to other dc:elements the arguments were about the same:
> > > they could define more clearly what they contain and drop redundant
> > > attributes and child elements.
> >
> > (Sorry, should have replied to this at the same time as the last.)
> >
> > I'd be interested to know which ones. DC is a bit nebulous, but that
> > gives you tremendous freedom too. Atom, on the other hand, has a very
> > specific target for the things they describe (but they did take a
> > very pragmatic approach to their problem from what I understand,
> > which means they're probably good people to be talking to).
>
> Atom is a syndication format, like RSS, that carries short descriptions
> of content and links to the full content. I only referred to their work
> because I thought it would be relevant.
>
> The Dublin Core Metadata Initiative's element set is great for describing
> written resources such as books and web pages, and indeed it would have
> worked in the case of Atom as well. However, it is no good when it comes
> to describing audio and video, mostly because it has no method of
> describing what 'role' people and organisations had in the production.
> That is precisely why I added the poorly defined role attribute to the
> person and organisation elements.

DC has provision for qualifiers; there is a proposed 'agent-role'
<http://dublincore.org/usage/meetings/2002/05/Agent-roles.html>
which, last time I looked, used the MARC relator list:
<http://www.loc.gov/marc/sourcecode/relator/relatorlist.html>

Two things to notice:
1.  That is a massively long list.
2.  It doesn't appear to do what we want.

But it is there.  However, no such scheme can reasonably provide
support for one-off roles such as 'Othello'.  This suggests that
beyond simple role refinement there are a number of mini-metadata
specs required here.

You've said elsewhere in reference to describing all media types:
> There are drafts for including still images in Ogg streams? Surely they
> would have to be described as well. What camera was used and who made it?
> Who owns it? What do we see on this image?

One question is: do we try to spec everything right from the
start without a good idea of all the use cases or do we start
to nail things down and leave some flexibility?  If you want
to describe everything that could go into a media file I think
you benefit from something like DC to do a lot of the basics,
but there /are/ bits missing (the library card vs programme
issue).  I think roles /is/ the right place to start, and the
difficulty with the RDF model is roles refine relationships
and relationships really need to be standardised to be any
use.

How about this instead (made-up element names):
<contributor>
  <person>John Smith</person>
  <role type="actor" name="The Doctor" />
</contributor>

It's obviously possible to get into all kinds of contortions
about what properties something like "person" should have
and just how contributor and role &c. should fit around each
other, but I think this allows cumulative, possibly unique,
refinement at the same time as standardisation.
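To make the "cumulative refinement" idea concrete, here is a minimal Python sketch (standard library `xml.etree.ElementTree` only) of how a reader might consume the made-up element above. The controlled vocabulary in `KNOWN_ROLE_TYPES` and the `unknown:` prefix for unrecognised types are hypothetical, chosen for illustration rather than taken from any spec: the standardised `type` carries the shared meaning, while the optional free-form `name` holds the one-off refinement, such as a character.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary for the standardised part of a role.
# A real spec might borrow its terms from the MARC relator list instead.
KNOWN_ROLE_TYPES = {"actor", "director", "composer", "performer"}

SAMPLE = """<contributor>
  <person>John Smith</person>
  <role type="actor" name="The Doctor" />
</contributor>"""


def parse_contributor(xml_text):
    """Return (person, [(role_type, role_name), ...]) from a <contributor>."""
    root = ET.fromstring(xml_text)
    person = root.findtext("person", default="").strip()
    roles = []
    for role in root.findall("role"):
        role_type = role.get("type")
        if role_type not in KNOWN_ROLE_TYPES:
            # Unknown types are kept but flagged rather than rejected,
            # so the vocabulary can grow without breaking old readers.
            role_type = "unknown:" + str(role_type)
        # The free-form name (e.g. a character) refines the standard type.
        roles.append((role_type, role.get("name")))
    return person, roles


if __name__ == "__main__":
    person, roles = parse_contributor(SAMPLE)
    for role_type, role_name in roles:
        print(f"{person}: {role_type} ({role_name or 'unspecified'})")
```

Keeping unknown types instead of rejecting them is one possible design choice; a stricter reader could refuse anything outside the agreed list.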

I notice your description (probably intentionally) splits up
into three separate issues:
1.  Technical origin data.  This is different from the bitrate/
   dimensions issue.  In this case there /must/ already be
   a photographic metadata format in existence.
2.  Rights.  In a way this is the simplest of them all, since
   'no technical solutions for legal problems'.  Owner, license,
   date.  It is the publisher's responsibility (and choice) to get
   them right, but if they're not comprehensive it has negligible
   effect on any legal situation.
3.  What do we see.  Descriptive subject metadata is hard.
   This would be a synopsis in other contexts.  But most
   decent metadata formats allow for a free-form description
   of content; from memory all of Atom, DC and CMML do.
   Additionally things like FOAF for people are possible.
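One way to keep those three issues separate in an implementation is to give each its own record. The Python dataclasses below are a hypothetical sketch: the field names (`device_make`, `owner`, `license`, `text`, and so on) are assumptions for illustration, not a proposed schema. The rights check only reports gaps, in keeping with the point that incomplete rights data has little legal effect.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TechnicalOrigin:
    # Hypothetical fields describing how the item was captured.
    device_make: Optional[str] = None
    device_model: Optional[str] = None


@dataclass
class Rights:
    # Owner, license and date, filled in by the publisher.
    owner: Optional[str] = None
    license: Optional[str] = None
    date: Optional[str] = None


@dataclass
class Description:
    # Free-form synopsis of what the item shows or contains.
    text: str = ""


@dataclass
class ItemMetadata:
    origin: TechnicalOrigin = field(default_factory=TechnicalOrigin)
    rights: Rights = field(default_factory=Rights)
    description: Description = field(default_factory=Description)

    def missing_rights_fields(self) -> List[str]:
        """List rights fields left empty; informative only, not a legal check."""
        return [name for name in ("owner", "license", "date")
                if getattr(self.rights, name) is None]


if __name__ == "__main__":
    item = ItemMetadata(rights=Rights(owner="John Smith"))
    print(item.missing_rights_fields())
```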

-- 
imalone