[vorbis] test stream update

Ralph Giles giles at snow.ashlu.bc.ca
Wed Jul 26 02:25:45 PDT 2000



I don't have too much to report, but I talked to Michael Smith about things
on IRC this evening and we made some progress toward a compromise between
our visions.

On the static and stream identification metadata there was a tentative
decision to go with RDF, but in such a way that a limited player could
choose not to support it and still be able to play the a/v data. Robert
mentioned trying to work his Trackinfo DTD into a set of attributes, but I
haven't heard if he's gotten to it. Again, I would push to unify music,
film, and video here.

One thing I don't know how to treat is liner notes in multiple languages.
It's not uncommon for classical music to be distributed with extensive
liner notes in multiple languages, otherwise identical. In the case of
lyrics/subtitles we should create separate streams for each translation,
but do we want to do that for the static metadata? Does RDF already have a
way to handle this?

Most of our discussion was about the timecoded metadata: the proverbial
scrolling lyrics, but this must also serve for subtitles (in multiple
languages), transcripts, commentary, headlines, guitar tablature,
and so on. Our most important concern is to maintain maximum flexibility.

As I've said before, I think it's important to have at least some kind of
inline text markup. Raw text, even with Unicode, is too limited. So we
waved the magic xml buzzword at the problem.

What I wanted to do was allow arbitrary xml streams in ogg, but specify
specific dtds for compliance with a particular mapping, if that's the
right term. Players would ignore unknown dtds, or optionally try to do
something intelligent with them (like feed them to a browser).

The problem with this is that there's no way to resume parsing of an xml
document after a dropped packet. In fact, you're supposed to stop and error
out if you encounter an inconsistency. The audio and video codecs
(including mng) all have a way to restart decoding periodically, so this
is a bothersome lack of parity.
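
To make the problem concrete, here is a rough Python sketch using the
standard library's expat bindings. The element names and the way the
document is cut into packets are invented for the example; this is not a
proposed format.

```python
import xml.parsers.expat

# Hypothetical continuous lyrics stream, cut into "packets" for transport.
packets = [
    b"<lyrics><line t='0.0'>Hello <em>",
    b"brave</em> new world</line>",
    b"<line t='2.5'>Goodbye</line>",
    b"</lyrics>",
]

def parse_stream(chunks):
    """Feed chunks to a single parser; return (text seen, error or None)."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = seen.append
    try:
        for chunk in chunks:
            parser.Parse(chunk, False)
        parser.Parse(b"", True)  # signal end of document
    except xml.parsers.expat.ExpatError as err:
        # A conforming parser stops here; there is no resync point.
        return "".join(seen), err
    return "".join(seen), None

intact_text, intact_err = parse_stream(packets)

# Lose the second packet: <em> is never closed, so the parser hits a
# mismatched tag and gives up on the whole stream.
lossy_text, lossy_err = parse_stream(packets[:1] + packets[2:])
```

With all packets present the text comes through whole. With one packet
missing the parser raises a fatal error and has no way to pick up again
at the next line.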

What's new: our compromise was to divide the timecoded xml into "doclets",
small packets with all the desired markup but as a self-contained xml
document complete with its own header. Neither of us likes it very much,
but so far it's the best compromise that fits the requirements we've
established. Criticism please. :)
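
As a minimal sketch of what the doclet idea buys us, assuming each packet
carries one complete xml document: the <line> element and the sample text
are made up, and a lost packet is modelled simply as None.

```python
import xml.etree.ElementTree as ET

HEADER = b"<?xml version='1.0' encoding='UTF-8'?>"

# Hypothetical doclets: every packet is a full document with its own header.
doclets = [
    HEADER + b"<line>Hello <em>brave</em> new world</line>",
    None,                            # packet lost in transit
    HEADER + b"<line>truncated",     # packet damaged in transit
    HEADER + b"<line>Goodbye</line>",
]

def render(packets):
    """Return the plain text of every doclet that parses on its own."""
    lines = []
    for packet in packets:
        if packet is None:
            continue
        try:
            root = ET.fromstring(packet)
        except ET.ParseError:
            continue  # a bad doclet costs us only itself
        lines.append("".join(root.itertext()))
    return lines

rendered = render(doclets)
```

Each doclet gets a fresh parse, so damage stays local and the decoder is
back in sync at the next packet boundary. The price is repeating the
header in every packet.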

My suggestion was to model the doclets on the vorbis comment header,
encoding each as a bytevector, with the content being the xml doclet. For
maximum flexibility, we would add start and stop display timestamps
externally encoded in addition to the content. There would still be
internal timestamps, but the external ones would mark the boundaries for
the whole packet and help with seeking. It also means the content doesn't
really have to be xml at all, leaving lots of headroom for future
extension.
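
A possible layout, sketched in Python. The only part borrowed from the
vorbis comment header is the 32-bit little-endian length in front of the
content; the 64-bit millisecond timestamps and the field order are my own
assumptions for illustration, not a spec.

```python
import struct

# Hypothetical framing: start_ms, stop_ms (u64), content length (u32),
# all little-endian, followed by the content bytes.
_FRAME = struct.Struct("<QQI")

def pack_doclet(start_ms: int, stop_ms: int, content: bytes) -> bytes:
    """Prefix the content with its display window and its length."""
    return _FRAME.pack(start_ms, stop_ms, len(content)) + content

def unpack_doclet(packet: bytes):
    """Inverse of pack_doclet; raises ValueError on a short packet."""
    if len(packet) < _FRAME.size:
        raise ValueError("packet shorter than framing header")
    start_ms, stop_ms, length = _FRAME.unpack_from(packet, 0)
    content = packet[_FRAME.size:_FRAME.size + length]
    if len(content) != length:
        raise ValueError("truncated doclet content")
    return start_ms, stop_ms, content

packet = pack_doclet(1500, 4000, b"<line>Hello world</line>")
start_ms, stop_ms, content = unpack_doclet(packet)
```

A player can read the display window and skip or seek without ever looking
inside the content, which is why the content need not be xml.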

Internal timestamps are important because for things like karaoke you need
to mark each word separately, which would mean something like >200%
overhead with just one word per vector.
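
For a feel of where a figure like that comes from, here is some
back-of-the-envelope arithmetic in Python. The sizes are assumptions
picked for illustration (a 4-byte length plus two 4-byte timestamps per
vector), not numbers from any spec.

```python
def overhead_percent(payload_bytes, length_bytes=4,
                     timestamp_bytes=4, timestamps=2):
    """Framing bytes as a percentage of payload bytes (assumed sizes)."""
    framing = length_bytes + timestamps * timestamp_bytes
    return 100.0 * framing / payload_bytes

# One short word per vector: 12 framing bytes around 5 payload bytes.
per_word = overhead_percent(len("hello".encode("utf-8")))

# A whole line per vector spreads the same framing over more payload.
per_line = overhead_percent(len("hello brave new world".encode("utf-8")))
```

Under those assumptions a five-letter word costs well over twice its own
size in framing, while a whole line per vector stays under 100%.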

You can also do things like change the character set encoding mid-stream,
should that ever make sense. Michael's example was mixed English and
Klingon. (The Unicode Consortium has so far studiously ignored the Klingon
encoding proposal.)

I also think it would be possible to implement some of these features
implicitly by always breaking the xml document into ogg packets at the
same level of the document hierarchy. One can also imagine various hacks
to include a pseudo-header, perhaps as a comment, to tell the parser where
it is.
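
One way that hack might look, as a sketch only: split at the children of
the root element and put a comment in front of each piece. The comment
syntax (parent= and index=) and the sample document are invented here.

```python
import xml.etree.ElementTree as ET

def split_at_top_level(document: str):
    """Cut a document into one packet per child of the root element."""
    root = ET.fromstring(document)
    packets = []
    for index, child in enumerate(root):
        # Hypothetical pseudo-header telling the parser where it is.
        header = f"<!-- parent={root.tag} index={index} -->"
        child.tail = None  # do not carry inter-element text into the packet
        packets.append(header + ET.tostring(child, encoding="unicode"))
    return packets

song = "<lyrics><verse>one</verse><verse>two <em>three</em></verse></lyrics>"
pieces = split_at_top_level(song)
```

Because a comment is allowed before the root element, each piece still
parses as a small document of its own, so this ends up looking a lot like
doclets with the structure hint tucked into a comment.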

The disadvantages: It feels complicated to me, introducing another layer
between Ogg and the data, but Michael feels this way about my
continuous-xml proposal. :-) It's very difficult to do document structure
this way, at least without something equivalent to the periodic
pseudo-header hacks. I can live with that; we'll probably just get
non-container headings like html. It's also more work to display the
timecoded stream in a static format (I just want to read the lyrics, not
listen to the song!).

I guess at this point what we need are some concrete proposals, both for
the encoding spec and DTDs, so we can work out the details and experiment
practically.

Cheers,
 -ralph


--
giles at ashlu.bc.ca
Subtle mind control? Why do all these HTML buttons say 'Submit'?
