[vorbis] full circle on the text stream format

Ralph Giles giles at snow.ashlu.bc.ca
Mon Aug 7 14:23:09 PDT 2000

I've come full circle on the Ogg text stream format, back to arbitrary
associated xml streams. What's changed? Mostly rolling the requirements
around in my head, and weighting them differently.

I think we've dealt with the complexity issue; this functionality can be
optional for "dumb" audio players, and possibly also for dumb video
players, though there it's a smaller fraction of the code. (And I think
monty may disagree.) On top of this, I believe those who've assured me
that a DTD-specific parser (say for synchronized lyrics) can be very
small, at least small enough to test in a trial implementation.

The whole point all along was flexibility and extensibility, which I think
xml handles nicely. So if we want to add, say, stylesheets or (animated)
vector graphics later, we can do so. There's a lot of momentum in xml 
which will help longevity, imo.

This also means we can treat the RDF stream-description metadata on equal
footing with synchronized lyrics or anything else that can be expressed as
text. That would have been less elegant with the doclet approach. And in
either case, I now think it would be important to be able to re-assemble
the original full-length document, so going the small, self-contained
route is really only shifting the complexity elsewhere. To my mind that
levels the technical differences between the proposals, so there's no
reason not to decide on aesthetics. :-)

The last thing I was stuck on was how to deal with dropped packets. XML
was designed for integrity verification, not graceful degradation. I now
think my original proposal of breaking the document into packets so that
the loss of any but the "head" or "end" still results in valid xml on the
receiving end will in fact be acceptable.

Michael had me convinced for a while, but I now think the dropped-packet
problem won't be as important as it seemed:

* for many uses, the text stream will fit on one Ogg page anyway. Thus the
  practical problem is more "all or nothing" transmission than lack of
  graceful degradation. Synchronized lyrics are definitely in this category.

* this does require that the encoder be smart about the packetization.
  However, I fully intend that we develop a small number of "blessed" DTDs
  as part of a given mapping, and those can be designed with packetization
  in mind and the break criteria hardcoded. I also suspect some simple
  heuristics would work pretty well in terms of mapping a general document
  tree onto the head-body-body-...-body-end structure.
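To make the head-body-...-body-end idea concrete, here is a rough sketch of
one such simple heuristic, assuming the root element's start tag is the
"head", each top-level child is one "body" packet, and the root's end tag is
the "end". The function name and the <lyrics>/<line> elements are made up for
illustration; nothing here is a settled part of any mapping, and text sitting
directly between top-level children is simply dropped.

```python
import xml.etree.ElementTree as ET

def packetize(xml_text):
    """Split a document into [head, body, ..., body, end] packets.

    Hypothetical heuristic: head = root start tag, each body = one complete
    top-level child, end = root end tag.
    """
    root = ET.fromstring(xml_text)

    # Serialize an empty copy of the root to recover its start and end tags.
    shell = ET.Element(root.tag, dict(root.attrib))
    text = ET.tostring(shell, encoding="unicode", short_empty_elements=False)
    split_at = text.rindex("</")
    head, end = text[:split_at], text[split_at:]

    bodies = []
    for child in root:
        child.tail = None  # drop inter-element text so each packet is one element
        bodies.append(ET.tostring(child, encoding="unicode"))
    return [head] + bodies + [end]

doc = ('<lyrics lang="en"><line t="0.0">a</line>'
       '<line t="1.5">b</line><line t="3.0">c</line></lyrics>')
packets = packetize(doc)

# Simulate losing the second body packet; the rest still parses.
received = packets[:2] + packets[3:]
survivors = ET.fromstring("".join(received))
print([line.text for line in survivors])
```

Losing the head or the end packet still breaks the document, which is the
limitation described above; only the body packets are expendable.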

So, what's next? I'm ready to move on to a trial implementation, but was
having trouble remembering all the issues people had brought up, so if you
think I'm missing something, please say so.

Otherwise, we need some specific DTDs to try out, and a set of elements
for the RDF metadata. I didn't like Robert's recent suggestion very much,
so I'll go work on a counter-proposal more representative of my vision.

Then I'd like to see a general way of handling timestamps for display
synchronization (should these be empty tags, containers, or both?), and
develop some example DTDs for that. I'm interested in
subtitles/transcripts and by extension scrolling lyrics, but it would be
good to see proposals from other quarters. My basic suggestion would be
only that we share a tag subset for the timestamping, and that each type
of data be encoded as a separate substream.
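As a rough illustration of the empty-tag versus container question, the
sketch below reads cue times from both styles. The element and attribute
names (<at t="...">, <span start="...">, <lyrics>) are invented for the
example and are not a proposal for the shared tag subset.

```python
import xml.etree.ElementTree as ET

def cues(xml_text):
    """Return (start_seconds, text) pairs from either timestamp style."""
    root = ET.fromstring(xml_text)
    out = []
    for el in root.iter():
        if el.tag == "at":
            # Empty marker tag: the text it times follows the tag.
            out.append((float(el.get("t")), (el.tail or "").strip()))
        elif el.tag == "span":
            # Container tag: the text it times is inside the element.
            out.append((float(el.get("start")), "".join(el.itertext()).strip()))
    return out

marker_style = '<lyrics><at t="0.0"/>Hello<at t="1.5"/>world</lyrics>'
container_style = ('<lyrics><span start="0.0">Hello</span>'
                   '<span start="1.5">world</span></lyrics>')

print(cues(marker_style))
print(cues(container_style))
```

Both inputs yield the same cue list here, so a player could support both
styles with little extra code; the container form has the advantage that an
end time could be added as another attribute.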


giles at ashlu.bc.ca

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
