[theora-dev] Ogg mux design
giles at xiph.org
Mon Feb 16 16:25:43 PST 2004
Thanks for writing up your thoughts on the mux design
(ogg/doc/ogg-multiplex.html in cvs). Now that there's something to argue
with, I'd like to comment.
To recap, the documented proposal is that we make two categories of
streams within the OggFile multiplexing library. Pages are sorted
chronologically by the timestamp equivalent of their granulepos fields.
Normal data like audio and video are thought of as 'continuous' and use
the current spec of setting the page granulepos to correspond to the
timestamp of the last data element decodable in that page. Other, sparse
data like captions are thought of as 'discontinuous' and use a different
convention: the granulepos corresponds to the timestamp of the *first*
data element decodable in that page. These rules arrange the stream so
that the discontinuous pages "fall out" of the stream in time to be
useful without necessitating excessive buffering of higher-bitrate data,
as would be the case if they were also sorted by end time.
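
To make the recap concrete, here is a minimal sketch of that sort key in Python. The Page class, its field names, and the example timestamps are all made up for illustration; none of this is OggFile's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Page:
    stream: str          # hypothetical label for the logical bitstream
    start: float         # time of the first data element in the page (seconds)
    end: float           # time of the last data element in the page (seconds)
    discontinuous: bool  # the per-stream flag the proposal introduces

def granule_time(page):
    """Timestamp equivalent of the granulepos under the documented proposal."""
    return page.start if page.discontinuous else page.end

def mux_order(pages):
    # sorted() is stable, so pages with equal keys keep their input order.
    return sorted(pages, key=granule_time)

pages = [
    Page("video", 0.0, 1.0, False),
    Page("video", 1.0, 2.0, False),
    Page("video", 2.0, 3.0, False),
    Page("caption", 1.5, 3.0, True),
]
layout = [p.stream for p in mux_order(pages)]
```

With these numbers the caption page is keyed at 1.5 and lands between the first and second video pages. Keyed by its end time (3.0) it would only arrive after the video it is supposed to accompany.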
What Jack and I came up with originally was slightly different. All
pages are marked by end time and there is no distinction between stream
types. Pages are still sorted chronologically. The difference is that
*all* pages are sorted by their start time, which is to say by the
granulepos of the previous page in that logical bitstream.
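
As a rough sketch of this alternative, again in Python with invented names: every page carries only its end time, and the muxer derives the sort key from the previous page of the same logical bitstream. Treating the first page of each stream as starting at 0.0 is my assumption for the example, as is breaking ties by input order.

```python
from collections import defaultdict

def mux_by_start_time(pages):
    """pages: iterable of (stream, end_time), in order within each stream.

    Returns (start_time, stream, end_time) tuples sorted by start time,
    where start time is the end time of the previous page in that stream.
    """
    last_end = defaultdict(float)  # assumption: each stream starts at 0.0
    keyed = []
    for stream, end in pages:
        keyed.append((last_end[stream], stream, end))
        last_end[stream] = end
    keyed.sort(key=lambda page: page[0])  # stable: ties keep input order
    return keyed

muxed = mux_by_start_time([
    ("video", 1.0), ("video", 2.0), ("video", 3.0),
    ("caption", 0.5), ("caption", 3.0),
])
layout = [stream for _, stream, _ in muxed]
```

Here the second caption page ends at 3.0 but is keyed at 0.5, the end of the previous caption page, so it comes out well ahead of the video it overlaps. No per-stream flag is needed to get that behaviour.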
I believe this handles the requirements with about equal effectiveness,
and I like it better conceptually, else I wouldn't bring it up. By
treating all the streams the same we have less to think about (though
seeking gets even more complicated). The mux logic in OggFile is
probably of equivalent code size, but there's one less flag to pass across
the interface and it looks simpler from the point of view of the codec
glue, which is the part more people have to deal with. Also, we don't
have to change any part of Ogg that's already been published.
Discontinuous streams are sorted exactly the same; what's treated
differently are the continuous streams. In the case of an a/v stream,
the larger video packets will come before the smaller but more numerous
audio packets that need to be played in the same timespan. You have to
buffer both during that frame anyway; which wants a longer latency is
really something only the playback app knows, so this doesn't make much
difference in terms of buffering.
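
A small illustration of the a/v case, with made-up page durations (one video page per second, four audio pages per second) and the assumption that a tie in start time goes to whichever stream is listed first:

```python
def with_start_times(stream, end_times):
    """Pair each page with its start time: the previous page's end time.

    The first page is assumed to start at 0.0.
    """
    starts = [0.0] + list(end_times[:-1])
    return [(start, stream, end) for start, end in zip(starts, end_times)]

video = with_start_times("video", [1.0, 2.0])
audio = with_start_times("audio", [0.25 * i for i in range(1, 9)])

# Stable sort by start time; video is listed first, so it wins ties.
muxed = sorted(video + audio, key=lambda page: page[0])
layout = [stream for _, stream, _ in muxed]
```

The result is one video page followed by the four audio pages that play during it, then the next video page, and so on.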
Am I missing anything?
Finally, I'm not convinced the continuous/discontinuous classification
is all that helpful in the end. The real issue is that we're
multiplexing time-linear streams with wildly different bitrates. That's
the real difference between a caption and a video bitstream, and the
issue we're trying to solve is not having to buffer the high-bitrate
stream while waiting for a page from the low-bitrate one. It's really a
matter of degree. The issue's obvious with captions because the bitrate
really can be tiny on the multimedia scale, but the captions vs 'web video'
bitrate ratio isn't all that different from the audio vs video tracks in