[vorbis] xml stream formats

Michael Smith msmith at labyrinth.net.au
Sun Jun 18 02:35:23 PDT 2000



>> Ouch. Reading this made me remember something else that I hadn't thought of
>> in this context previously - it is NOT possible to stream well-formed XML,
>> in general. By limiting yourself in certain ways, you can get away with
>> just sending the start of the 'file' (as you've suggested here), then
>> streaming - but then you have some subset of XML, rather than XML. 
>
>Depends on the DTD, doesn't it? But yes, you'd have to start streaming at
>a point outside any nesting. I'd been implicitly assuming we'd only break
>pages there anyway.

Well, we have the top-level element, of which everything else is a child -
but you COULD handle that out of band (again) as a sort of 'header' -
that's unavoidable, but other nesting we could avoid with an appropriate
DTD. This would be painful and hacky, though (in fact, we don't really want
to send that header data at all, but we have to, to satisfy XML parsers -
and having our own not-quite-XML parser is a really bad idea).
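
A minimal sketch of the out-of-band 'header' idea, assuming Python's standard
xml.sax incremental parser. The <metadata> root element, the <event> element
and its "time" attribute are all hypothetical names, not anything we have
agreed on. A listener joining mid-stream is first fed the header (the opening
root tag), then whatever complete top-level events arrive next:

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Collects (time, text) pairs from non-nested <event> elements."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._time = None   # timestamp of the event being read, if any
        self._text = []

    def startElement(self, name, attrs):
        if name == "event":
            self._time = float(attrs["time"])
            self._text = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._time is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "event":
            self.events.append((self._time, "".join(self._text)))
            self._time = None


def join_stream(header, chunks):
    """Feed the out-of-band header, then the chunks seen after joining."""
    handler = EventCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed(header)          # hypothetical header: just the root start tag
    for chunk in chunks:
        parser.feed(chunk)       # chunks need not align with element borders
    # The parser is deliberately never closed: the root element stays open
    # for as long as the stream runs.
    return handler.events
```

Note that this only works because the join point lies outside every element
except the root, which is exactly the restriction discussed above.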

My point (or one of them), however, was that if we were to avoid this
nesting, then we lose the advantages of XML. If we do this, then we're
basically doing "XML for the sake of XML" - and there's already too much of
that around.

>
>> Maybe we have to go back and think about this - is XML really what we need?
>> In fact, if we have separate streams for most stuff, XML really isn't the
>> most suitable solution, since it's intrinsically tree-structured. If we
>> have separate streams, isn't each one going to be basically just a sequence
>> of (whatever)? A lyrics stream might have a series of lines, each keyed to
>> a time, for example.
>
>I guess I like xml because it's a familiar model, very easy to edit by
>hand, and very flexible. Didn't someone once say that the only reason
>everybody thinks xml is tree-based is because that's the DOM datastructure
>used in most of the parser libraries? I was thinking about a linear
>timeseries, not a (hierarchically) structured document.

XML is tree-based. The DOM gives you a tree-based interface to it, true.
SAX, on the other hand, gives you an event-driven interface (tends to be
faster), which would be far better suited to a not-very-treelike XML DTD.
However, this doesn't change the fact that the structure is intrinsically a
tree. DOM is like that precisely BECAUSE it's the natural expression of the
information.
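
To make the difference concrete, here is a small sketch using Python's
standard xml.dom.minidom and xml.sax modules. The <lyrics>/<line> document
and its "time" attribute are made up for illustration. Both interfaces read
the same data; the SAX handler also tracks nesting depth, to show that the
tree is still there even when you only look at events:

```python
import xml.sax
from xml.dom import minidom

# Hypothetical, nearly flat lyrics document.
DOC = ('<lyrics><line time="0.0">la la</line>'
       '<line time="2.5">di da</line></lyrics>')


def lines_via_dom(doc):
    """Build the whole tree first, then walk the root's <line> children."""
    root = minidom.parseString(doc).documentElement
    return [(float(node.getAttribute("time")), node.firstChild.data)
            for node in root.getElementsByTagName("line")]


class LineHandler(xml.sax.ContentHandler):
    """Receives start/end events; no tree is ever built in memory."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.depth = 0       # current nesting depth
        self.max_depth = 0   # deepest nesting seen
        self._time = None
        self._text = []

    def startElement(self, name, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if name == "line":
            self._time = float(attrs["time"])
            self._text = []

    def characters(self, content):
        if self._time is not None:
            self._text.append(content)

    def endElement(self, name):
        self.depth -= 1
        if name == "line":
            self.lines.append((self._time, "".join(self._text)))
            self._time = None


def lines_via_sax(doc):
    handler = LineHandler()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler
```

Even for this flat document the handler sees a depth of two (root, then
line): the events are just a traversal of the tree.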

>
>I think if we disallow nesting of the <event> tags, we'll be fine. The
>question is, do we lose anything in that? The only occasions I can see it
>being useful are: karaoke, where you want to display a line at a time but
>highlight each word, in subtitles for an instructional language video for
>the same reason, and in the music markup, where you might want to nest
>beat marks inside chord changes inside "chorus" and "bridge" groupings.
>The traditional sgml way of dealing with this seems to be
>"<event1>foo<event2>bar buz</event2>baf</event1>" but I've always found
>that ugly.

No, I don't think we do, particularly. We do if we have a kitchen-sink
metadata stream, but if we have a lyrics stream, and a ...(etc.), then we
don't really lose anything important.

Precisely BECAUSE it seemed clear that you were thinking of a linear
timeseries of 'events' (or whatever) I was suggesting that XML isn't really
appropriate.

>
>I'm open to alternative proposals. Byte vectors (like in the vorbis
>comment field) each with a length and a presentation timestamp would be
>even less flexible.

Slightly less flexible, but not terribly. I'll go away and think about this.
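
For comparison, a rough sketch of what such timestamped byte vectors might
look like, using Python's struct module. The layout is purely an assumption
for illustration (a 64-bit presentation timestamp and a 32-bit length, both
little-endian, followed by the payload); nothing here is a proposed format:

```python
import struct

# Hypothetical record header: 64-bit timestamp, 32-bit payload length,
# little-endian, no padding.
RECORD_HEADER = struct.Struct("<QI")


def pack_record(timestamp, payload):
    """Serialise one (timestamp, bytes) record."""
    return RECORD_HEADER.pack(timestamp, len(payload)) + payload


def unpack_records(data):
    """Yield (timestamp, payload) pairs from a concatenation of records."""
    offset = 0
    while offset < len(data):
        timestamp, length = RECORD_HEADER.unpack_from(data, offset)
        offset += RECORD_HEADER.size
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated record")
        yield timestamp, payload
        offset += length
```

A reader can skip records it doesn't understand using the length alone, but
anything structured inside the payload has to be agreed on separately, which
is where the loss of flexibility comes in.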

Michael
