[ogg-dev] Skeletal relations

Ralph Giles giles at xiph.org
Fri Feb 15 15:56:42 PST 2008


We have new drafts of CMML 4.0 as a text codec and ROE as an xml  
stream abstract, subsuming the authoring support in CMML 3.1 and  
earlier.

Another thing we talked about at LCA is how to specify  
relationships between the various streams in Ogg so that a server,  
muxer or player can make intelligent decisions about the contained  
tracks. The general idea is to use the (http-style) Message Headers  
in the Skeleton track to describe each logical bitstream, but no one  
has ever written anything down. This is a proposal to get the ball  
rolling.

Requirements:

* Distinguish alternates based on language
* Distinguish among subtitles for translation,
   for hearing impaired
* Distinguish commentary tracks
* Distinguish overlays from alternates from primaries

= Self description =

The following message headers describe the corresponding track;  
they are essentially metadata.

Lang: <locale>

Machine parsable locale string describing the language the track is  
in. Used for example to choose the default audio track based on user  
preferences.
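
As a rough sketch of how a player might use Lang to pick a default track: the code below assumes the message headers have already been parsed into one dict per track, and that locales look like "en_US" or "en-GB". Neither the dict layout nor the locale syntax is fixed by this proposal, and pick_by_language is a hypothetical helper name.

```python
def pick_by_language(tracks, preferred):
    """Return the track whose Lang best matches the user's preferences.

    tracks:    list of dicts of parsed message headers (assumed layout).
    preferred: language codes in order of preference, e.g. ["fr", "en"].
    """
    def primary(locale):
        # Assumption: compare only the primary language part,
        # so "en_US" and "en-GB" both reduce to "en".
        return locale.replace("_", "-").split("-")[0].lower()

    for want in preferred:
        for track in tracks:
            lang = track.get("Lang")
            if lang and primary(lang) == primary(want):
                return track
    # Nothing matched: fall back to the first track, if there is one.
    return tracks[0] if tracks else None
```

Falling back to the first track is just one possible policy; a player could equally well prefer a track with no Lang header at all.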

Role: <role-type>

Free form qualifier used to mark the category of the track content.  
We will define a basic set of role-types with standard meaning for  
machine interpretation. Example role-types:

   commentary (e.g. Director's commentary on a film)
   transcription (e.g. detailed record)
   interpretation (contains additional information)
   slides (visual aids accompanying another track)

Of these, commentary is the only one I'd really like to have. Some  
other ideas: logo, ticker, credits, translation. The last is  
effectively the default though. Logo or ticker might be useful so that  
such tracks can have a different default for whether they are overlaid.

Description: <string>

Human readable description of the track, intended for display in a  
user interface. This can be localized by appending '.<locale>' to  
Description.
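
To illustrate the lookup, here is a minimal sketch that prefers the localized header and falls back to the plain one. It assumes headers are held in a dict keyed by header name, so the localized form appears as e.g. "Description.fr"; localized_description is a made-up helper name.

```python
def localized_description(headers, locale):
    """Return 'Description.<locale>' if present, else plain 'Description'.

    headers: dict of parsed message headers for one track (assumed layout).
    Returns None when the track carries no description at all.
    """
    return headers.get("Description." + locale, headers.get("Description"))
```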

We could also copy general metadata here, e.g. title, creator, date,  
location, license. That's perhaps more interesting in the fishead  
packet which describes the stream as a whole rather than the  
individual tracks.

Program: <string>

Arbitrary tag for distinguishing a group of tracks from an unrelated  
group they happen to be multiplexed with. For example, three separate  
programs might be sent over the same link multiplexed together, but  
only audio and video tracks with the same value for the Program  
message header should be played together. The default 'empty' program  
is a valid program; every fishbone without this message header marks  
the corresponding track as being in the default program.
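
A demuxer could group tracks along these lines. This is only a sketch: it assumes each track is a dict with a "serialno" key plus its parsed message headers, and it represents the default program as the empty string, which is my choice for illustration and not something the proposal specifies.

```python
from collections import defaultdict

def group_by_program(tracks):
    """Map each Program value to the serial numbers of its tracks.

    Tracks without a Program header land in the default program,
    represented here (by assumption) as the empty string.
    """
    groups = defaultdict(list)
    for track in tracks:
        groups[track.get("Program", "")].append(track["serialno"])
    return dict(groups)
```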


= Relations =

The self-description allows us to prioritize tracks implicitly, based  
on user preferences for showing audio, video, text, or some  
combination, preferred languages and roles. But there remain areas of  
ambiguity, so we define a way to mark relationships with other  
streams. The value of each of these is an Ogg stream serial number.

Overlays: <serialno>

This track doesn't (necessarily) stand on its own but is meant to be  
laid on top of another track. This distinguishes, for example, an MNG  
video (no Overlays) from MNG subtitles (Overlays: corresponding  
Theora video). Another example might be a vocal audio track that can  
be mixed with a music-only karaoke track.

Substitutes: <serialno>

Indicates that a track is an alternate or substitute for another.

Translates: <serialno>

Indicates that a track is an alternate language or media version of  
another track.

Parallels: <serialno>

Indicates that a track should be played together with another,  
instead of being treated as alternates.

Of these, Overlays is the only one I'm really clear on the use case  
for. I think the others could be handled just as well by specifying  
heuristics: tracks of the same media type and role with different  
lang Translate each other. Tracks with the same media type, program,  
role that don't overlay another are Parallels.
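
Those two heuristics can be written down as predicates over a pair of tracks. The sketch below is a literal reading of the sentences above, under some assumptions: each track is a dict of its message headers, a hypothetical "media" key carries the media type (which would really come from the codec, not from a header defined here), and a missing Program means the default program.

```python
def translates(a, b):
    """Same media type and role, different Lang."""
    return (a["media"] == b["media"]
            and a.get("Role") == b.get("Role")
            and a.get("Lang") != b.get("Lang"))

def parallels(a, b):
    """Same media type, program and role, and neither overlays another track."""
    return (a["media"] == b["media"]
            and a.get("Program", "") == b.get("Program", "")
            and a.get("Role") == b.get("Role")
            and "Overlays" not in a
            and "Overlays" not in b)
```

Read this literally and both predicates can hold for the same pair, for example two audio tracks that differ only in Lang. Which one should win is not settled here.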

Question: Is it better to specify multiple relations with a list of  
serial numbers, or with multiple message headers?
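
For what it's worth, a parser can accept both forms at little cost. The sketch below assumes the headers arrive as http-style "Name: value" lines, that serial numbers are written in decimal, and that a list would be comma-separated; none of that is decided yet, and parse_relations is a hypothetical name.

```python
RELATION_HEADERS = {"Overlays", "Substitutes", "Translates", "Parallels"}

def parse_relations(header_text):
    """Collect relation headers into {name: [serialno, ...]}.

    Accepts repeated headers as well as a comma-separated list in a
    single header. Other headers (Lang, Role, ...) are ignored.
    """
    relations = {}
    for line in header_text.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        name = name.strip()
        if name not in RELATION_HEADERS:
            continue
        # Assumption: serial numbers are decimal integers.
        serials = [int(s) for s in value.split(",") if s.strip()]
        relations.setdefault(name, []).extend(serials)
    return relations
```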

Thoughts?
  -r

