[theora-dev] Ogg mux design

Mon Feb 16 16:25:43 PST 2004

Monty,

Thanks for writing up your thoughts on the mux design 
(ogg/doc/ogg-multiplex.html in cvs). Now that there's something to argue 
with, I'd like to comment.

To recap, the documented proposal is that we make two categories of 
streams within the OggFile multiplexing library. Pages are sorted 
chronologically by the timestamp equivalent of their granulepos fields. 
Normal data like audio and video are thought of as 'continuous' and use 
the current spec of setting the page granulepos to correspond to the 
timestamp of the last data element decodable in that page. Other, sparse 
data like captions are thought of as 'discontinuous' and use a different 
convention: the granulepos corresponds to the timestamp of the *first* 
data element decodable in that page. These rules arrange the stream so 
that the discontinuous pages "fall out" of the stream in time to be 
useful without necessitating excessive buffering of higher-bitrate data, 
as would be the case if they were also sorted by end time.
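
As a rough sketch of that rule (Python, purely illustrative; the Page 
class, its fields, and the sample times are made up for this example and 
are not OggFile API), the sort key just depends on the stream category:

```python
from dataclasses import dataclass


@dataclass
class Page:
    stream: str          # logical bitstream label (hypothetical)
    start: float         # time of the first decodable element, in seconds
    end: float           # time of the last decodable element, in seconds
    discontinuous: bool = False


def page_timestamp(page):
    """Time the page's granulepos stands for under the documented proposal."""
    # Discontinuous pages are stamped with their start, continuous with their end.
    return page.start if page.discontinuous else page.end


def mux_order(pages):
    """Sort pages chronologically by the timestamp of their granulepos."""
    return sorted(pages, key=page_timestamp)


pages = [
    Page("video", 0.0, 1.0),
    Page("video", 1.0, 2.0),
    Page("video", 2.0, 3.0),
    Page("caption", 1.5, 4.0, discontinuous=True),
]

# Order under the proposal, and for comparison with everything keyed by end time.
proposal = [(p.stream, p.end) for p in mux_order(pages)]
by_end_only = [(p.stream, p.end) for p in sorted(pages, key=lambda p: p.end)]
```

With these sample times the caption page lands right after the first 
video page under the proposal, but after all three video pages if it is 
sorted by end time like everything else.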

What Jack and I came up with originally was slightly different. All 
pages are marked by end time and there is no distinction between stream 
types. Pages are still sorted chronologically. The difference is that 
*all* pages are sorted by their start time, which is to say by the 
granulepos of the previous page in that logical bitstream.
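
Roughly, in Python (again just a sketch; the function name, the tuple 
layout, the assumption that each stream starts at time 0, and the 
tie-break by input order are all mine, not anything in OggFile):

```python
def order_by_start_time(pages):
    """Order pages by start time, where start = previous page's end time.

    pages: list of (stream, end_time) tuples, already in order within
    each logical stream. Every page carries only its end time.
    """
    prev_end = {}   # stream -> end time of the last page seen in that stream
    keyed = []
    for index, (stream, end) in enumerate(pages):
        # Assumption: the first page of every stream starts at time 0.
        start = prev_end.get(stream, 0.0)
        # The input index breaks ties, so equal start times keep input order.
        keyed.append((start, index, stream, end))
        prev_end[stream] = end
    keyed.sort()
    return [(stream, end) for _start, _index, stream, end in keyed]


# One-second video pages and half-second audio pages (made-up durations).
pages = [
    ("video", 1.0), ("video", 2.0),
    ("audio", 0.5), ("audio", 1.0), ("audio", 1.5), ("audio", 2.0),
]
muxed = order_by_start_time(pages)
```

Here each video page comes out ahead of the audio pages covering the 
same second, and no per-stream flag is needed to get that ordering.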

I believe this handles the requirements with about equal effectiveness, 
and I like it better conceptually, else I wouldn't bring it up. By 
treating all the streams the same we have less to think about (though 
seeking gets even more complicated). The mux logic in OggFile is 
probably equivalent in code size, but there's one less flag to pass across 
the interface and it looks simpler from the point of view of the codec 
glue, which is the part more people have to deal with. Also, we don't 
have to change any part of Ogg that's already been published.

Discontinuous streams are sorted exactly the same; what's treated 
differently are the continuous streams. In the case of an a/v stream, 
the larger video packets will come before the smaller but more numerous 
audio packets that need to be played in the same timespan. You have to 
buffer both during that frame anyway; which wants a longer latency is 
really something only the playback app knows, so this doesn't make much 
difference in terms of buffering.

Am I missing anything?

Finally, I'm not convinced the continuous/discontinuous classification 
is all that helpful in the end. The real issue is that we're 
multiplexing time-linear streams with wildly different bitrates. That's 
the real difference between a caption and a video bitstream, and the 
issue we're trying to solve is not having to buffer the high-bitrate 
stream while waiting for a page from the low-bitrate one. It's really a 
matter of degree. The issue's obvious with captions because the bitrate 
really can be tiny on the multimedia scale, but the captions vs 'web 
video' bitrate ratio isn't all that different from the audio vs video 
ratio in uncompressed HD.
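
To put rough numbers on that (a back-of-the-envelope sketch in Python; 
the formats chosen for uncompressed HD are my assumption, and the 
caption and web video bitrates are hypothetical round figures, not 
measurements):

```python
def bitrate_ratio(high_bps, low_bps):
    """Ratio between the bitrates of two multiplexed streams."""
    return high_bps / low_bps


# Assumed uncompressed HD: 1920x1080 pixels, 24 bits per pixel, 30 frames/s.
hd_video_bps = 1920 * 1080 * 24 * 30
# Assumed uncompressed audio: 48 kHz, 16-bit samples, 2 channels.
pcm_audio_bps = 48000 * 16 * 2
hd_ratio = bitrate_ratio(hd_video_bps, pcm_audio_bps)

# Hypothetical figures for a low-bitrate web video and a caption track.
web_video_bps = 300_000
caption_bps = 300
caption_ratio = bitrate_ratio(web_video_bps, caption_bps)

print(f"HD video vs audio:     {hd_ratio:.0f}:1")
print(f"web video vs captions: {caption_ratio:.0f}:1")
```

Under those assumptions both ratios come out on the order of a 
thousand to one.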

 -r