[vorbis] xml stream formats

Ralph Giles giles at snow.ashlu.bc.ca
Sun Jun 18 01:25:23 PDT 2000



Speaking of Metadata, how's work going on the definition?

Looking back at the list archives, there seems to be a semi-plan to use
Robert Kay's DTD from http://www.cdindex.org/dtd/TrackInfo.dtd, but I'm
concerned that it's too specialized for video and we'll end up having a
special case for audio-only files.

There are a couple of general issues here. Michael Smith suggested on IRC
that an important distinction to make is between "timecoded" data like
audio, video, and scrolling lyrics, and "timeless" data like the
production notes, or the fact that logical bitstream 12 is the
pop-up-video overlay in Bengali.

My proposal was that each type or instance of timecoded data be
encapsulated in its own logical stream. For a song, that might mean one
vorbis-encoded audio track, an xml track with the lyrics, and another xml
track with the phrasing, keychanges, and other musical markup. For a
video, it might be a video track, three vorbis-encoded audio tracks in
different languages, three subtitle overlays, and five xml streams
duplicating the subtitles with two additional translations.

I'd hope we could make a single dtd for the timecoded xml streams, relying
on conventional attributes to generalize the markup, since karaoke and the
musical annotation of a raga have very different specialized needs.
Something like:

<event timestamp="532739" type="chord">E7m</event>

or

<event timestamp="1462" speaker="Robin">No Julia, I don't want to go to
the prom with you.</event>
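
A DTD for such a stream could stay quite small; a sketch (element and
attribute names are purely illustrative, not a concrete proposal):

```dtd
<!ELEMENT events (event*)>
<!ATTLIST events
    language  CDATA #IMPLIED
    revision  CDATA #IMPLIED>
<!ELEMENT event (#PCDATA)>
<!ATTLIST event
    timestamp CDATA #REQUIRED
    type      CDATA #IMPLIED
    speaker   CDATA #IMPLIED>
```

The point is that new kinds of annotation get a new `type` value, not a
new element.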

I guess that doesn't enforce segregation of content. Hmm. Well, we'd want
the player to key on something in the header, not on a quick scan of the
contents. My point was that I wanted to avoid having a tag for every
requested kind of markup (the kitchen sink, in other words) and focus on a
generalized presentation of text synchronized to the media being played.
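
Concretely, a well-formed timecoded stream carrying its own per-stream
metadata might look like this (the wrapper element and its attributes are
only a sketch, not a worked-out format):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- language/revision are the sort of per-stream metadata
     discussed below; all names here are illustrative only -->
<events language="en" revision="1">
  <event timestamp="1462" speaker="Robin">No Julia, I don't want
  to go to the prom with you.</event>
  <event timestamp="532739" type="chord">E7m</event>
</events>
```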

This actually moves a lot of the data out of the "kitchen sink" xml
document. We may still want some metadata to tell the player about each of
those streams, though I think it might be possible to get away with just
what's in the comment fields in the individual streams. That just leaves
the static data, like the lyrics if I'm too lazy to time-index them. 

As has been mentioned, it would be nice if we could share as much of the
"metadata" format as possible between ogg (as a file format), icecast (as
a network streaming application), and the cdindex. How difficult would it
be for the cdindex project to use separate records for the timecoded
xml data rather than embedding them in the TrackInfo record?

We also talked about streaming issues a bit. Jack explained that icecast
is going to (does?) insert the three pages of the vorbis header when a new
client connects in the middle of a song. This is necessary to set up the
decoder, but we get the comment page more-or-less for free. Something
similar would have to be done with the timecoded xml streams, since
well-formed xml has a header, and there will probably be a small amount of
metadata associated with each stream: language, who translated it,
revision, and so on. Finally, it might make sense to insert the static
metadata "out of band" like this, again so it's available even if the
player connects in the middle of a file.
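
The header re-insertion could be sketched like this, assuming the server
caches each logical stream's header pages as they go by (the class and
method names here are hypothetical, not icecast's actual internals):

```python
# Sketch: replay cached header pages to a client that joins mid-stream,
# so its decoder can initialize. Names are hypothetical, not icecast API.

class StreamState:
    def __init__(self):
        self.header_pages = []  # e.g. the three vorbis header pages,
                                # plus any xml-stream header/metadata pages
        self.listeners = []

    def cache_header(self, page):
        # Called once per header page at the start of the stream.
        self.header_pages.append(page)

    def on_data_page(self, page):
        # Relay a live data page to every connected client.
        for client in self.listeners:
            client.send(page)

    def add_listener(self, client):
        # Replay the cached headers first; the client then picks up
        # the live stream from the next data page onward.
        for page in self.header_pages:
            client.send(page)
        self.listeners.append(client)
```

A client joining mid-song thus receives the header pages "out of band"
before any audio, exactly as described above.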

The multiple-streams idea does help the streaming server by making it
easier to split out the parts of interest. A client could ask only for the
video and the director's commentary xml, for instance.

Thanks for reading so far. I think it's important to work this out soon so
we don't end up limiting ogg to a mostly-audio format.

Cheers,
 -ralph

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.


