[xiph-cvs] r6639 - trunk/ogg/doc
xiphmont at xiph.org
Fri May 7 22:47:48 PDT 2004
Author: xiphmont
Date: 2004-05-08 01:47:45 -0400 (Sat, 08 May 2004)
New Revision: 6639
Modified:
trunk/ogg/doc/ogg-multiplex.html
Log:
Ongoing work; it occurred to me to not lose it.
Modified: trunk/ogg/doc/ogg-multiplex.html
===================================================================
--- trunk/ogg/doc/ogg-multiplex.html 2004-05-08 05:47:25 UTC (rev 6638)
+++ trunk/ogg/doc/ogg-multiplex.html 2004-05-08 05:47:45 UTC (rev 6639)
@@ -6,48 +6,57 @@
Page Multiplexing and Ordering in a Physical Ogg Stream
</font></h1>
-Last update to this document: February 13, 2004</em><br>
+Last update to this document: May 7, 2004</em><br>
<p>
The low-level mechanisms of an Ogg stream (as described in the Ogg
Bitstream Overview) provide means for mixing multiple logical streams
and media types into a single linear-chronological stream. This
-document discusses the high-level arrangement and use of page
+document specifies the high-level arrangement and use of page
structure to multiplex multiple streams of mixed media type within a
physical Ogg stream.
<h2>Design Elements</h2>
-<h3>Chronological arrangement</h3>
+The design and arrangement of the Ogg container format is governed by
+several high-level design decisions that form the reasoning behind
+specific low-level design decisions.
-The Ogg bitstream is designed to provide data in a chronological
-(time-linear) fashion. This design is such that an application can
-encode and/or decode a full-featured bitstream in one pass with no
-seeking an minimal buffering. Seeking to provide optimized encoding
-(such as two-pass encoding) or interactive decoding (such as scrubbing
-or instant replay) is not disallowed or discouraged, however no
-bitstream feature must require nonlinear operation on the
-bitstream.<p>
+<h3>Linear media</h3>
-<i>As an example, this is why Ogg specifies bisection-based exact seeking
-rather than building an index; an index requires two-pass encoding and
-as such is not acceptible according to original design requirements.
-Even making an index optional then requires an application to support
-multiple methods (bisection search for a one-pass stream, indexing for
-a two-pass stream), which adds no additional functionality as
-bisection search delivers the same functionality for both stream
-types.</i><p>
+The Ogg bitstream is intended to encapsulate chronological,
+time-linear mixed media into a single delivery stream or file. The
+design is such that an application can always encode and/or decode a
+full-featured bitstream in one pass with no seeking and minimal
+buffering. Seeking to provide optimized encoding (such as two-pass
+encoding) or interactive decoding (such as scrubbing or instant
+replay) is not disallowed or discouraged; however, no bitstream feature
+may require nonlinear operation on the bitstream.<p>
-<h4>Multiplexing</h4>
+<h3>Seeking</h3>
+Ogg is designed to use a bisection search to implement exact
+positional seeking rather than building an index; an index requires
+two-pass encoding and as such is not acceptable according to original
+design requirements. <p>
+
+<i>Even making an index optional then requires an
+application to support multiple methods (bisection search for a
+one-pass stream, indexing for a two-pass stream), which adds no
+additional functionality as bisection search delivers the same
+functionality for both stream types.</i><p>
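The bisection idea described above can be sketched in Python. This is only an illustration under stated assumptions: `read_time` is a hypothetical callback that returns the absolute time of page number `i`, and pages are addressed by index. A real Ogg seek would bisect over byte offsets and resynchronize on page boundaries, which is not shown here.

```python
def bisect_seek(read_time, n_pages, target):
    """Return the index of the last page whose time is <= target.

    read_time: hypothetical callback mapping a page index to its absolute
               time (as derived from the page's granule position).
    n_pages:   number of pages; their times must be non-decreasing.
    No index is built; only O(log n) pages are probed.
    """
    lo, hi = 0, n_pages - 1
    best = 0  # fall back to the first page if target precedes everything
    while lo <= hi:
        mid = (lo + hi) // 2
        if read_time(mid) <= target:
            best = mid      # candidate; look for a later one
            lo = mid + 1
        else:
            hi = mid - 1    # overshoot; look earlier
    return best
```

The same routine works whether the stream was written in one pass or two, which is the point the paragraph above makes about an optional index adding no functionality.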
+
+<h3>Multiplexing</h3>
+
Ogg bitstreams multiplex multiple logical streams into a single
physical stream at the page level. Each page contains an abstract
time stamp (the Granule Position) that represents an absolute time
landmark within the stream. After the pages representing stream
headers (all logical stream headers occur at the beginning of a
physical bitstream section before any logical stream data), logical
-stream data pages are arranged in order of chronological absolute time
-as specified by the granule position. <p>
+stream data pages are arranged in strict, monotonically increasing
+order of chronological absolute time as specified by the granule
+position. <p>
The only exception to arranging pages in strictly ascending time order
by granule position is those pages that do not set the granule
@@ -56,7 +65,7 @@
case are described later under 'Continuous and Discontinuous
Streams'.<p>
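As a rough sketch of the ordering rule above, the following Python merges the data pages of several logical streams by absolute time. The data model is an assumption made for illustration: each stream is a list of `(time, payload)` tuples already in time order, and header pages and pages with an unset granule position are left out.

```python
import heapq

def multiplex(streams):
    """Interleave data pages of several logical streams by absolute time.

    streams: dict mapping a (hypothetical) serial number to a list of
             (time, payload) tuples in non-decreasing time order.
    Returns a list of (time, serial, payload) in non-decreasing time order.
    """
    tagged = (
        [(t, serial, payload) for t, payload in pages]
        for serial, pages in streams.items()
    )
    # heapq.merge lazily merges already-sorted inputs; key compares time only.
    return list(heapq.merge(*tagged, key=lambda page: page[0]))
```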
-<h4>Buffering</h4>
+<h3>Buffering</h3>
Ogg's multiplexing design minimizes extraneous buffering required to
maintain audio/video sync by arranging audio, video and other data in
@@ -70,9 +79,34 @@
discontinuous data arrives in time) and no more, resulting in optimum
buffer usage for free. Because all pages of all data types are
stamped with absolute timing information within the stream,
-inter-stream synchronization timing is always explicitly
-maintained.<p>
+inter-stream synchronization timing is always explicitly maintained
+without the need for explicitly declared buffer-ahead hinting.<p>
+<h3>Whole-stream navigation</h3>
+
+Ogg is designed so that the simplest navigation operations are those
+that treat the physical Ogg stream as a whole summary of its
+streams, rather than navigating each interleaved stream as a separate
+entity. <p>
+
+Example: the simplest method of seeking to a desired
+position in a multiplexed (or unmultiplexed) Ogg stream is to
+bisection search by time position (as encoded in the granule
+position). <p>
+
+Example: A bitstream section may consist of three multiplexed streams
+of differing lengths. The result of multiplexing these streams should
+be thought of as a single mixed stream with a length that happens to
+equal the longest of the three component streams. Although it is also
+possible to think of the multiplexed results as three concurrent
+streams of different lengths and it is possible to recover the three
+original streams, it will also become obvious that once multiplexed,
+it isn't possible to find the internal lengths of the component
+streams without a linear search of the whole bitstream section.
+However, it is possible to find the length of the whole bitstream
+section easily (in near-constant time per section) just as it is for a
+single-media unmultiplexed stream.<p>
+
<h2>Granule Position</h2>
<h3>Description</h3>
@@ -88,8 +122,15 @@
The granule position is governed by the following rules:
<ul>
-<li>Granule Position must always increase forward from page to page,
-be unset, or be zero for a header page.<br>
+<li>Granule Position must always increase forward or remain equal from
+page to page, be unset, or be zero for a header page. The absolute
+time to which any correct sequence of granule position maps must
+similarly always increase forward or remain equal. <i>(A codec may
+make use of data, such as a control sequence, that only affects codec
+working state without producing data, and thus without advancing granule
+position or time. Although the packet sequence number increases in
+this case, the granule position, and thus the time position, do
+not.)</i><br>
<li>Granule position may only be unset if there is no packet defining a
time boundary on the page (that is, if no packet in a continuous
@@ -98,41 +139,21 @@
and Discontinuous streams).<br>
<li>A codec must be able to translate a given granule position value
-to a unique, exact absolute time value through direct calculation. A
-codec is not required to be able to translate an absolute time value
-into a unique granule position value.<br>
+to a unique, deterministic absolute time value through direct
+calculation. A codec is not required to be able to translate an
+absolute time value into a unique granule position value.<br>
<li>Codecs shall choose a granule position definition that gives the
codec a means to seek as directly as possible to an immediately
decodable point; for example, the bit-divided granule position encoding
of Theora allows the codec to seek efficiently to keyframes without using
-an index.
+an index. That is, additional information other than absolute time
+may be encoded into a granule position value so long as the granule
+position obeys the above points.
</ul>
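The first rule in the list above (granule position never decreases, or is unset) can be checked mechanically. The sketch below is an assumption-laden illustration: it looks at the pages of a single logical stream only, and it uses `-1` as the sentinel for an unset granule position, a value chosen here for the example rather than taken from this document.

```python
UNSET = -1  # assumption: sentinel for a page with no granule position set

def granules_in_order(granules):
    """Check that set granule positions never decrease from page to page.

    granules: granule positions of consecutive pages of ONE logical stream.
    Unset pages are skipped; equal consecutive values are allowed.
    """
    last = None
    for g in granules:
        if g == UNSET:
            continue
        if last is not None and g < last:
            return False
        last = g
    return True
```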
-<h3>granule position, packets and pages</h3>
+<h4>Example: timestamp</h4>
-Although each packet of data in a logical stream theoretically has a
-unique granule position, only one granule position is encoded per
-page. It is possible to encode a logical stream such that each page
-contains only a single packet (so that granule positions are preserved
-for each packet), however a one-to-one packet/page mapping is not
-intended for the general case.<p>
-
-A granule position represents the instantaneous time location
-between two pages</em>. In a continuous stream, the granulepos
-represents the point in time immediately after the last data decoded
-from a page. In a discontinuous stream, it represents the point in
-time immediately before the first data decoded from the page.<p>
-
-Because Ogg functions at the page, not packet, level, this
-once-per-page time information provides Ogg with the finest-grained
-time information is can use. Ogg passes this granule positioning data
-to the codec (along with the packets extracted from a page); it is
-intended to be the responsibility of codecs to track timing
-information at granularities finer than a single page.<p>
-
-<h3>Example: timestamp</h3>
-
In general, a codec/stream type should choose the simplest granule
position encoding that addresses its requirements. The examples here
are by no means exhaustive of the possibilities within Ogg.<p>
@@ -143,7 +164,7 @@
days before beginning a new logical stream (to avoid the granule
position wrapping).<p>
-<h3>Example: framestamp</h3>
+<h4>Example: framestamp</h4>
A simple millisecond timestamp granule encoding might suit many stream
types, but a millisecond resolution is inappropriate to, eg, most
@@ -157,7 +178,7 @@
efficient. Position in time would simply be <tt>[granule_position] *
[samples_per_frame] / [samples_per_second]</tt>.
-<h3>Example: samplestamp (Vorbis)</h3>
+<h4>Example: samplestamp (Vorbis)</h4>
Frame counting is insufficient in codecs such as Vorbis where an audio
frame [packet] encodes a variable number of samples. In Vorbis's
@@ -166,7 +187,7 @@
a granule position is <tt>[granule_position] /
[samples_per_second]</tt>.
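The framestamp and samplestamp conversions quoted above can be written out directly. This is a small sketch; the function names are made up for the example, and `Fraction` is used only so that the results stay exact instead of picking up floating-point rounding.

```python
from fractions import Fraction

def framestamp_time(granule_position, samples_per_frame, samples_per_second):
    """Time in seconds: granule_position * samples_per_frame / samples_per_second."""
    return Fraction(granule_position * samples_per_frame, samples_per_second)

def samplestamp_time(granule_position, samples_per_second):
    """Time in seconds: granule_position / samples_per_second."""
    return Fraction(granule_position, samples_per_second)
```

Both mappings are direct calculations from granule position to a single time value, as the granule position rules require.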
-<h3>Example: bit-divided framestamp (Theora)</h3>
+<h4>Example: bit-divided framestamp (Theora)</h4>
Some video codecs may be able to use the simple framestamp scheme for
granule position. However, most modern video codecs introduce at
@@ -199,10 +220,125 @@
+
+
Can seek quickly to any keyframe without index
Naive seeking algorithm still available; just lower performance
Bisection seeking used anyway
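The bit-divided scheme is easier to follow with numbers. The sketch below follows the description given in the IRC excerpts later in this message (keyframe frame number in the high bits, frames since that keyframe in the low bits, with a shift of 8 in xiphmont's example). The function names and the idea of passing `shift` as a parameter are assumptions for illustration; the actual Theora mapping may differ in detail.

```python
def pack_granpos(keyframe_number, frames_since_keyframe, shift):
    """Combine keyframe number and offset into one granule position."""
    if frames_since_keyframe >= (1 << shift):
        raise ValueError("keyframe spacing exceeds what the shift allows")
    return (keyframe_number << shift) | frames_since_keyframe

def unpack_granpos(granpos, shift):
    """Split a granule position into (keyframe_number, frames_since_keyframe)."""
    mask = (1 << shift) - 1
    return granpos >> shift, granpos & mask

def frame_number(granpos, shift):
    """Absolute frame number encoded by a bit-divided granule position."""
    keyframe, offset = unpack_granpos(granpos, shift)
    return keyframe + offset
```

With a shift of 8, frame 8 in the pattern IPPPPPIPP... has its keyframe at frame 6, so its granule position is 0x602, and the codec knows to seek to 0x600 to decode it.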
+<h3>granule position, packets and pages</h3>
+
+Although each packet of data in a logical stream theoretically has a
+specific granule position, only one granule position is encoded
+per page. It is possible to encode a logical stream such that each
+page contains only a single packet (so that granule positions are
+preserved for each packet), however a one-to-one packet/page mapping
+is not intended to be the general case.<p>
+
+Because Ogg functions at the page, not packet, level, this
+once-per-page time information provides Ogg with the finest-grained
+time information it can use. Ogg passes this granule positioning data
+to the codec (along with the packets extracted from a page); it is
+intended to be the responsibility of codecs to track timing
+information at granularities finer than a single page.<p>
+
+<h3>start-time and end-time positioning</h3>
+
+A granule position represents the instantaneous time location
+between two pages</em>. In an "end-time" encoded page, the granulepos
+represents the point in time immediately after the last data decoded
+from a page. In a "start-time" encoded page, it represents the point
+in time immediately before the first data decoded from the page.<p>
+
+Start-time or end-time positioning is flagged in bit 3 of byte 5 in the
+Ogg page header. A set bit indicates start-time positioning. Version 0
+Ogg streams are restricted to using end-time positioning; version 1 may
+use either or both start-time and end-time positioning. A single logical stream
+within the multiplexed physical Ogg version 1 stream may also mix
+start-time and end-time positioning.<p>
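A reader might test the proposed flag as follows. This is a sketch of the draft text above, not of any released Ogg library. It assumes that bytes are counted from zero (so byte 4 is the stream structure version and byte 5 is the flags byte that follows the "OggS" capture pattern) and that "bit 3" is also counted from zero, giving the mask 0x08. Both readings are assumptions.

```python
START_TIME_FLAG = 0x08  # assumption: "bit 3" counted from zero, i.e. 1 << 3

def uses_start_time(page_header):
    """Return True if a page header declares start-time positioning.

    page_header: bytes beginning at the "OggS" capture pattern.
    """
    if len(page_header) < 6 or page_header[:4] != b"OggS":
        raise ValueError("not an Ogg page header")
    version = page_header[4]
    flags = page_header[5]
    if version == 0:
        # Per the text above, version 0 streams are end-time only.
        return False
    return bool(flags & START_TIME_FLAG)
```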
+
+ Start- and end-time do not affect multiplexing sort-order; pages are
+still sorted by the absolute time a given granulepos maps to
+regardless of whether that granulepos represents start- or
+end-time.<p>
+
+<h4>use of end-time positioning</h4>
+
+End-time positioning is most useful in unmultiplexed streams. It allows
+two useful features relatively more easily:
+<ol>
+<li>"short" beginning-of-stream and end-of-stream packets can be represented entirely using granulepos; the codec does not need to store auxiliary sizing information in the codec's data packets.<br>
+<li>Retrieving the exact end-time of a stream is the trivial operation of inspecting the granule position of the last page.<br>
+</ol>
+
+However, end-time coding results in slightly less efficient buffering
+usage in a multiplexed stream.
+
+<h4>use of start-time positioning</h4>
+
+Multiplexed streams of start-time encoded pages yield optimal
+buffering behavior; such a stream requires the minimum theoretical buffer space
+of any possible arrangement of pages. This is the primary benefit of
+start-time positioning.<p>
+
+The drawbacks of start-time positioning mirror the benefits attributed to
+end-time positioning. Namely:<p>
+
+<ol>
+<li>
+
+Codecs that generate short packets can no longer infer the presence of
+a short packet from granulepos context; the 'shortness' of the packet
+must be encoded in the packet itself. This drawback is minor, however
+it does mean that codecs like Vorbis (which relies on granulepos context
+to detect short packets) absolutely must use end-time positioning to
+handle short packets.<br>
+<li>
+Determining ending time position of a stream requires slightly more
+work than in an end-time encoded stream; the packets of the final
+stream page must be counted forward to find ending time.
+<br>
+</ol>
+
+Despite these minor drawbacks, the additional buffer efficiency of
+start-time positioning strongly recommends its use in both multiplexed
+and unmultiplexed streams. Use of end-time positioning should largely be
+treated as a legacy means of supporting codecs that use
+granulepos-context to determine short packets (such as Vorbis I).<p>
+
+<h4>mixed start-time and end-time positioning</h4>
+
+Mixed positioning may refer to either multiplexing two or more streams
+that use different time positionings, or using more than one time
+positioning within a logical stream. <p>
+
+Mixed positioning mostly affects only buffer efficiency; although
+end-time positioning is less efficient than start-time, mixed-time
+positioning will often be less efficient than both. The inefficiency is
+relative however; buffer efficiency can still be excellent in all
+three cases.<p>
+
+One possible use of mixed-time positioning is to combine the benefits of
+end-time and start-time positioning; for example, use start-time positioning
+for all but the last page of a stream, which is then coded in end-time
+format. This way, a short packet can be flagged using granulepos
+context and the end-time position of the stream is immediately obvious
+from inspecting the last granule position.<p>
+
+[POINT OF DISCUSSION: the above suggestion looks like it may be worth
+considering as the suggested way of positioning the stream, thus doing
+away entirely with the need to 'count time forward through packets' on
+the last page of a start-time encoded stream to find final stream
+length. However, a truncated stream will be missing the end-time last
+page.
+
+1) We could say 'mixed time is the way to go' and just let a
+damaged/truncated stream suffer.
+
+2) We could say 'counting time forward through packets is just the way
+it has to be done' and do away with the possibility of mixed coding
+entirely]
+
<h2>Multiplex/Demultiplex Division of Labor</h2>
The Ogg multiplex/demultiplex layer provides mechanisms for encoding
@@ -258,904 +394,11 @@
initial stream header. The majority of codecs will always be
continuous (such as Vorbis) or discontinuous (such as Writ).
-<h3>continuous granule position</h3>
+<h2>Unsorted Discussion Points</h2>
-
-<h3>discontinuous granule position</h3>
-
-
flushes around keyframes? RFC suggestion: repaginating or building a
- stream this way is nice but not required
+stream this way is nice but not required
-<h2>Appendix A: discussion excerpts</h2>
-
-Developers at Xiph.Org have discussed the details of Ogg multiplexing
-on many occasions on Internet Relay Chat. The earliest conversations
-regarding discontinuous streams and granule ordering between Monty
-<xiphmont> and Jack Moffitt from 1999 weren't logged, but much
-of the same material is rehashed in the three excerpts below.<p>
-
-The primary purpose of these excerpts is to illuminate a number of
-subtle points through logged conversations. The cornerstones of the
-Ogg muxing specification were long set at this point, however the
-excerpts capture discussion of proposed innovations within the
-original specification and the reasoning behind each proposal as well
-as discussing long-decided details.<p>
-
-These excerpts have been edited from the original verbatim IRC log to
-remove off-topic chatter and correct occasional typos.<p>
-
-<h3>excerpt one</h3>
-
-This excerpt discusses:
-<ol>
-<li>video keyframe flagging via granule position bit-division technique.
-<li>Division of labor during seeking between codec and Ogg demuxer
-</ol>
-
-<pre>
-
-<mau> guys, how can we test seeking, etc? are changes needed in the
- ogg framework?
-<mau> like seeking to keyframes?
-<rillian> mau: nope, just player support
-<mau> ok, so what would be the strategy? seek to an arbitrary time,
- and wait for a keyframe?
-<mau> yeah, currently there is the hack in granulepos, right?
-<danx0r> I've heard about it -- some sort of bitfield division
-<danx0r> lower bits are frames after a key
-<xiphmont> you can seek to a given location. the hack in granpos
- gives you the number for every keyframe.
-<danx0r> keyframes increase by some set increment -- can someone confirm?
-<xiphmont> yes
-<rillian> xiphmont: I thought it wasn't necessarily fixed
-<mau> or is it up to the player?
-<xiphmont> it's fixed for a given stream section.
-<danx0r> so if you seek naively now, you'll get garbage until the next kf?
-<mau> I think it is up to the player to freeze the last known good image
-<mau> until a keyframe passes, much like windows media, etc
-<xiphmont> you know if you're not in sequence.
-<danx0r> the right thing is to go to the previous keyframe and parse up to
- your seek frame faster than realtime, but...
-<danx0r> for now, something like what WMP does should be fine
-<Mike> mau: or, if it's a smart player (and the data source allows it),
- to deliberately seek forwards to the next keyframe.
-<rillian> are you talking about the radix rather than the actual keyframe
- rate?
-<mau> mike: going forward is ok, but in wmp you can still read audio
- for example, until the next video keyframe, where video resumes
-<mau> it is also a good strategy, guess it depends on the player
-<xiphmont> rillian: the stream is set up to have a maximum keyframe spacing.
- Granpos is updated by a fixed amount at each keyframe. The
- granpos is not [necessarily] linearly increasing
-<Mike> true.
-<rillian> it's monotonic, but not (necessarily) linear
-<mau> xiphmont: so ideally the player would look at the granulepos and
- count how many frames since the last key, and seek back that many
- pages?
-<xiphmont> mau: Ogg seeking is all done as predicted bisection search.
-<xiphmont> look in vorbisfile to see code that does it.
-<derf> If one encodes in a frame how many frames it has been since a
- keyframe, couldn't you do the same thing?
-<derf> Without imposing a maximum keyframe spacing?
-<xiphmont> that data does not exist in an ogg header.
-<xiphmont> Ogg headers use absolute counters.
-<derf> I meant in the packet data, but I see what you're saying.
-<xiphmont> you get that out of the granpos hack anyway.
-<derf> You have to start decoding the packet to tell where to get the
- keyframe.
-<xiphmont> Seeking in an ogg stream does not look at packets.
-<rillian> (except you have to parse the header to do granulepos conversion)
-<xiphmont> yes.
-<xiphmont> although it may be sensible to change that.
-<derf> You already need at least a page worth of data to check the CRC
- on the ogg header to seek.
-<derf> It would seem reasonable to require a full packet instead, and
- pass this to the codec when asking where to seek next.
-<xiphmont> derf: a page does not necessarily give you a packet.
-<derf> xiphmont: I know.
-<derf> xiphmont: But, allowing the codec to look at the packet better
- supports embedding codecs which might not be able to determine
- the position of a keyframe from their granpos alone.
-<xiphmont> derf: why wouldn't they? Blind refusal to use the mechanisms at
- hand?
-<derf> The reason this concerns me is that the case where you want to
- have really long spaces between key frames (streaming) is also
- exactly the place where you want to allow very long streams.
-<xiphmont> you have a 64 bit granpos.
-<derf> And if I never want a keyframe except at the first frame, I now
- have only 32.
-<xiphmont> ...and you're welcome to use as many logical sections as you want.
-<xiphmont> so, now you have 96 bits.
-<derf> Okay. I guess I can live with a keyframe every 4 billion frames.
-<xiphmont> if you want unique serialnos; you're allowed to wrap them in
- streaming, so it becomes infinite.
-<xiphmont> if you're streaming with one keyframe every 4G, you'll have no
- viewers anyway :-)
-<derf> That's what out-of-band synch points are for.
-<xiphmont> sure, that works.
-<xiphmont> Now, it's possible to do a 'seek requests are handed to the codec,
- not to ogg' infrastructure, then the codec makes bisection calls
- into the ogg layer.
-<xiphmont> it's more complex, and I'm not sure what I really get out of it.
-<derf> Well, the codec doesn't really need to do that.
-<xiphmont> in fact, I'm beginning to wonder if moving the granpos parsing
- away from relying on header at all might be a good idea.
-<derf> The codec really just wants "give me the packet at this granpos"
-<derf> The bisection can still be done in the ogg layer to find that
- packet.
-<xiphmont> derf: same basic division of labor.
-<xiphmont> the request still originates at the codec.
-</pre>
-
-
-<h3>excerpt two</h3>
-
-This excerpt discusses:
-<ol>
-<li>keyframe pagination in video
-<li>keyframe seeking using granule position bit-division
-<li>alternate keyframe location proposals
-</ol>
-
-<pre>
-
-<rillian> afaik that's just a detail of smpte timecode
-<xiphmont> ...and preserving pulldown and non-interval-centered frames.
-<rillian> ugh
-<xiphmont> (ie, what offset in the sample period is the frame)
-<xiphmont> yeah, ugliness.
-<xiphmont> but not really representationally difficult.
-<rillian> speaking of, do you see any advantage to doing page flushes
- before or after keyframes?
-<rillian> either to simplify seeking or initialization retention in
- something like icecast
-<xiphmont> it doesn't affect seeking any, really. It makes streaming
- slightly easier for lazy programmers.
-<rillian> xiphmont: do you mean icecast should pull out the keyframe packet
- and repage it?
-<xiphmont> rillian: if there's no flush, then it should as an optimization.
- It's not necessary, but it's nice.
-<xiphmont> either the streamer or the source should be smart enough to start
- streaming at a nice sync point for a and v.
-<rillian> xiphmont: so how would you do frame-accurate seeking with the
- current design?
-<rillian> the concern as I understand was that there wasn't a page/packet
- that was specifically labelled 'this is a keyframe' at the ogg layer
-<xiphmont> rillian: same way vorbis does. Each frame does have a granpos,
- they're just not linear.
-<rillian> s/wasn't/might not be/
-<xiphmont> ah, yes there is.
-<mau> sorry for being slow, but when you say "Frame" is this a packet,
- a page?
-<derf> I thought the encoding was
- frame_number_of_keyframe<<n|frames_since_keyframe
-<xiphmont> right now, each theora frame is one packet.
-<xiphmont> derf: yes.
-<derf> As far as I can see, we can work backwards and reconstruct a
- packet-level granpos for each packet so long as that is still true.
-<derf> Once you include data partitioning a la MPEG, you lose that ability.
-<mau> k, but if you put many packets in a page, then you do not have one
- for each, right? It is just a matter of counting up, and not
- allowing keyframes in the middle of a page?
-<derf> mau: No.
-<derf> You can still put keyframes anywhere.
-<xiphmont> actually, my Ogg algos counts forward from previous page generally.
-<mau> simple question: if there are multiple frames in a page, does the
- ogg layer maintains a granulepos for each?
-<xiphmont> mau: It could, it doesn't.
-<xiphmont> (requires being even more in bed with the codec. And that is
- currently the greatest point of contention in my own mind)
-<mau> ok. and how to detect when a keyframe arrives in the middle of a
- page?
-<xiphmont> mau: the codec knows. Ogg doesn't.
-<mau> that's what I needed to know. So the codec initiates the seeking
- request
-<xiphmont> Ogg knows only how to get to a requested granpos.
-<derf> Oh, no, you can't always get a granpos back for every packet.
-<xiphmont> mau: it doesn't have to; that's one possible way to do it, yes.
-<derf> You can still put keyframes in the middle of pages, but if you put
- two of them in one page...
-<xiphmont> derf: you can, but only going forward.
-<xiphmont> Ogg is built on the idea of chronological decode; data propagates
- forward in time.
-<derf> If I encode PIPPIP in one page, I have no way of knowing the first
- I is there just by looking at granposes.
-<xiphmont> no, but you have other data in the page; namely, the codec should
- be able to tell by looking at first byte.
-<xiphmont> It is a consequence of Ogg having no codec-specific awareness.
-<derf> Yes, but even the codec cannot tell with just the granposes.
-<xiphmont> correct, but the codec need not function only with granpos.
-<xiphmont> the codec knows its own keyframes.
-<derf> If the codec need not function only with granposes, then why are
- we trying to build a seeking mechanism that works with just them?
-<xiphmont> division of labor; Ogg is able to hand you any *page*, not any
- *packet*.
-<xiphmont> even Vorbis does this.
-<mau> ok, wouldn't it be better to require each new keyframe to start a
- new page then?
-<xiphmont> Ogg hands you the nearest preceding page for the codec to then
- discard the minimum amount of page data to get to the packet it
- wants.
-<mau> to make seeking easier/faster/lazier?
-<xiphmont> but it doesn't.
-<xiphmont> Seek to page. Start grabbing packets.
-<derf> xiphmont: Yes, I understand this, but...
-<xiphmont> Discard packets until you see a keyframe
-<mau> k
-<xiphmont> Ogg would have to do the same thing.
-<mau> I see
-<xiphmont> You *can* if you want to, certainly.
-<derf> Say that page I gave above starts on frame n.
-<xiphmont> There's nothing stopping or even discouraging you ;-)
-<xiphmont> derf: OK
-<derf> I want to seek to frame n+3.
-<xiphmont> OK
-<derf> I get that page's granpos, and discover there's a keyframe at frame
- n+4.
-<xiphmont> Ogg, in seeking, hands you the page that is guaranteed to have the
- start of n+3.
-<derf> I know nothing about the type of packets n to n+3.
-<xiphmont> (or, more importantly, hands you the page guaranteed to have the
- keyframe you need to decode n+3)
-<derf> Without physically examining the packets.
-<xiphmont> true. Neither does Ogg.
-<derf> So I have to go all the way back to the previous keyframe to
- decode them.
-<xiphmont> No.
-<xiphmont> You already have it for free.
-<xiphmont> Assume the keyframe shift in granpos is 8.
-<derf> Okay.
-<xiphmont> (you get a new keyframe at most every 256 packets)
-<derf> Yeah, I know what this translates to.
-<xiphmont> but the current actual pattern is: IPPPPPIPPPPPIPPPP....
-<xiphmont> your granposes are:
-<xiphmont> 0 1 2 3 4 5 600 601 602 603 604 605 c00 c01 c02....
-<xiphmont> you want to decode frame 602; seek to 600.
-<xiphmont> and you know you have to seek directly to 600 because you know how
- the granpos works.
-<xiphmont> 600 is your keyframe.
-<xiphmont> if 600 does not start the page, ogg hands you the page with 600 on
- it.
-<rillian> so you get a page with, for example, the end of 4, 5, 600, and the
- start of 601
-<rillian> you start pulling out packets
-<rillian> discard until you get to 600, which you decode
-<derf> xiphmont: But, I don't know the frame is called 602.
-<rillian> pull in the next page, pull out 601 and discard it
-<derf> I want to seek to frame 8.
-<rillian> then pull out 602 and resume normal decode
-<derf> All I know is that its granpos is <= 800.
-<xiphmont> now, you're right; always having a keyframe start a page
- eliminates some amount of inspect/discard; but you can
- inspect/discard in a few processor cycles.
-<rillian> xiphmont: aye. seems a requirement to avoid the discard isn't needed
-<xiphmont> derf: OK, then it's a 2-stage bisection. you ask ogg for 'page
- before 800'; you see that the granpos is 600+whatever.
- then seek to 600.
-<xiphmont> (or, Ogg could do that internally with knowledge of the granpos
- structure)
-<mau> k, this last one explained it for me
-<derf> xiphmont: Right, but here's the issue:
-<derf> In my PIPPIP example, Ogg doesn't know the granpos of the first 4
- packets.
-<xiphmont> sure.
-<derf> And the codec can reconstruct them just from the granpos of the
- page.
-<derf> s/can/can't
-<xiphmont> sure it can.
-<derf> How?
-<xiphmont> the count is *reducible* to a monotonically increasing function :-)
-<xiphmont> (assuming you have two granposes)
-<xiphmont> you're always counting up or down one frame.
-<rillian> i.e. you actually need the previous page in derf's example
-<derf> rillian: But the previous page doesn't tell you anything about
- packets 1-4.
-<xiphmont> yes, the first 'P' is undefined granpos without previous page.
-<xiphmont> ...but if your stream is not starting with a keyframe, that P
- frame is not decodable anyway.
-<derf> Let's say the previous granpos is 0|F0
-<rillian> derf: ok, I see. I was misunderstanding the granulepos hack.
-<xiphmont> derf: yes it does. If gives you the granpos of the first packet.
-<xiphmont> (ie, it gives you the granpos of the last frame of the previous
- packet, and you can always count forward)
-<derf> Then the granpos for those frames can be F1|00 F1|01 F1|02 F1|03
- or 0|F1 F2|00 F2|01 F2|02 or ...
-<xiphmont> you [the codec] knows if they're keyframes or not.
-<derf> Only if I look at the packets themselves.
-<xiphmont> yes.
-<derf> My claim was that there was no way to do it without looking at the
- packets.
-<xiphmont> blow 10 cycles on inspecting, and avoid the need for a 64 bit
- timestamp on every packet :-)
-<derf> I'm not arguing for a timestamp.
-<xiphmont> Oh. Yes, your claim is correct. Apologies.
-<rillian> but it still doesn't matter much, because discarding as you go
- through a single page is cheap
-<xiphmont> You need to inspect the packets. It is the responsibility of the
- codec definition to make that easy.
-<derf> My argument is this: If I have to inspect the packets ANYWAY for
- this to work right, why am I going through this complicated granpos
- scheme instead of just using a normal, sane mapping of
- frame=granpos, and storing an offset to the keyframe in the packet?
-<xiphmont> (Vorbis places that information in the first byte)
-<xiphmont> derf: the information is redundant.
-<xiphmont> Yes, you certainly *can* do it that way.
-<xiphmont> I'm even still considering it. it does have advantages.
-<mau> monty: if the granulepos hack is made "official" and mandatory
- for other video codecs however, you could have ogg doing the
- inspection, right?
-<xiphmont> OTOH, I'm also considering hardwiring a number of granpos
- mechanisms into Ogg such that it can seek without any codec
- knowledge.
-<xiphmont> the two approaches are mutually exclusive (at least, rationally so)
-<xiphmont> mau: yes, what you said.
-<derf> I do not see how you're going to be able to accomplish seeking
- without codec knowledge.
-<derf> I thought I had just demonstrated why your current scheme cannot
- do this.
-<xiphmont> derf: not entirely; however, you could achieve enough to avoid
- the need for two-way feedback between the mux and codec layers.
- The current proposal (which includes this two way feedback) is
- very unusual and causing outside developers fits.
-<xiphmont> for example, it means the Ogg demux has to interface with an
- Ogg-like codec glue.
-<derf> I had always assumed this was part of the design.
-<derf> By saying, to begin with, "the codec decides what granpos means".
-<xiphmont> the current normal division of demux and decode has a different
- division; it would make it hard to use Ogg as a generic demux
- system in something like xine, where the 'vorbis' codec could
- just as easily handle the output from AVI or Ogg demux.
-<xiphmont> derf: it always has been. That doesn't mean I'm ignoring the
- advantages of alternatives.
-<xiphmont> it is not yet at the point where changing my mind would break
- existing installations, so it's still worth debating. That said,
- I've seen nothing yet to change my mind.
-<derf> The vorbis "codec" really has two pieces.
-<derf> One manages decoding the packets.
-<xiphmont> one manages the Ogg mapping.
-<derf> Right.
-<derf> The first can be separated out and used for other container formats.
-<derf> The other containers are then responsible for providing an
- equivalent of the second.
-<xiphmont> ...and we probably can't escape needing *some* glue for any given
- codec.
-<xiphmont> even if we strive to make the division similar.
-<xiphmont> 'similar' is not 'identical'.
-<xiphmont> that is the primary reason I've not changed my mind. Being in
- bed with the codec makes possible demux/decode lib APIs with some
- very nice features.
-<xiphmont> (ala Vorbisfile)
-<xiphmont> So, it sounds like we're entirely on the same page.
-<xiphmont> [pun not intended]
-<derf> Yes, except that if you're in bed with the Theora codec, you
- shouldn't need this complicated of a granpos mapping.
-<derf> And I still don't see what it gets you.
-<mau> let me see if I understand you derf: if you are going to have to
- inspect the packets anyway
-<mau> why don't you use a linear count?
-<mau> is this it?
-<derf> mau: Correct.
-<mau> guess the hack can possibly give you a closer location
-<rillian> the case with mng is interesting. it's natively variable framerate
- (or more properly can be) so some realtime base (it has a field for
- mapping 'ticks' to seconds) is the obvious granulepos. Except it
- has the same keyframe problem theora does, and it's worse because
- while identifying a restart point is easy (there's a special chunk
- type) the codec has to do quite a bit more work to determine which
- pieces are skippable
-<derf> Actually, it gives you a farther one.
-<xiphmont> derf: it wastes space.
-<xiphmont> you certainly can do it that way. You'll sink additional bitrate
- to do it.
-<derf> xiphmont: Yes, it does move a few bits that are currently in the
- granpos into the packets.
-<derf> mau: If I want to seek to frame 8, and I ask for the granpos
- closest to 800, I get 605... three packets beyond where I want to
- be.
-<xiphmont> yeah, you'll lose ~ half a kilobit to it.
-<xiphmont> depending on framerate/keyframe freq.
-<derf> I don't have my H.264 spec on hand, but IIRC, they do the same
- thing.
-<xiphmont> However:
-<xiphmont> If you're a minimalist demux layer without precise seek....
-<xiphmont> you can go straight to a keyframe with the granpos hack.
-<xiphmont> (without asking the codec)
-<xiphmont> that's probably the last minor perq.
-<derf> "without precise seek" can be up to 2**keyframe_shift frames off.
-<xiphmont> ...which is exactly what mplayer and xine do.
-<xiphmont> you get the next following keyframe past what you ask for.
-<xiphmont> ...and they could continue to use their demux framework.
-<xiphmont> ...and it will give the results they're already getting.
-<xiphmont> (something tells me there will be outside devs wedded to their
- current libs)
-<rillian> which is why you did this in the first place?
-<xiphmont> well, yeah.
-<xiphmont> *I* want everything to always be perfect and correct :-)
-<xiphmont> you can do it either way. Which is not to say derf doesn't have a
- point.
-<derf> xiphmont: Perfection can take an awful lot of effort, as exhibited
- by this long drawn out conversation, which I'm sure is not the first
- one.
-<xiphmont> you could still do the Xine way with explicit keyframe offset in
- the packet, you just get a blank video until you hit a keyframe,
- or just discard alot.
-<xiphmont> (note that xine/mplayer also do that in alot of codecs. Actually
- xine has an annoying tendency to start decoding P and B frames
- starting with a uniform green field)
-<derf> Heh.
-<xiphmont> and not bothering to wait for keyframe.
-<xiphmont> So, in summary, derf's offset gives a much simpler mechanism, but
- eats a bit of bitrate (.5-1 kilobit) and makes it harder for
- pansy-ass demux layers to get to keyframes. The granpos hack
- method has the drawback of conceptual complexity although I
- maintain the code isn't actually any more difficult.
-<xiphmont> you need to know the additional information of 'keyframe shift'.
-<derf> It also adds a limit to the amount of frames between a keyframe.
-<derf> One which, unlike MPEG, the underlying codec doesn't actually need.
-<xiphmont> yes, but for seekable video, if you're only having a keyframe
- every 30,000 frames, you're being a little too 1337.
-<xiphmont> it is also the case that if we settle on one mapping, and it
- turns out to be a bad idea, we change the glue. Supporting both
- would require little.
-<xiphmont> it looks like a 'new' codec, but uses all the same infrastructure.
-<derf> That just means you have all the software inadequacies of both,
- since players will then be required to support both.
-<derf> So any arguments of "simpler" become meaningless.
-<xiphmont> you were just now arguing 'more flexible' (no keyframe spacing
- restriction)
-<derf> I didn't say the other arguments were meaningless.
-<xiphmont> no.
-<xiphmont> you didn't.
-<xiphmont> I'm just saying the penalty for being wrong is pretty mild.
-<derf> I'm suggesting that the reality of the situation is that whatever
- you decide now is going to be it, because no one will want to
- complicate matters that much for the relatively mild gains of
- "slightly more flexible".
-<derf> Or, for that matter, "slightly easier braindead demuxers".
-<xiphmont> In any case, I don't actually want to cut the lightweight
- mplayer style approach out of the picture.
-<xiphmont> the granpos hack does give him slightly more rope, should he
- choose to use it. I realize it's a weak argument, but it's there.
-<derf> Oh, and if you really wanted to, you could eliminate the stream
- space overhead for the keyframe offset.
-<derf> You have to load all the previous pages ANYWAY, to decode back to
- that point.
-<derf> So you could load them, scan them backwards for keyframes, and
- then turn around and decode them forward.
-<derf> The only overhead is the additional buffer space. Or time for
- multiple I/Os if you run out of that.
-<xiphmont> derf: seeking backward is more expensive than forward.
-</pre>
-
-<h3>excerpt three</h3>
-
-This excerpt discusses:
-<ol>
-<li>introduction of discontinuous streams
-<li>ordering of pages in a multiplexed Ogg stream
-<li>ordering differences between continuous and discontinuous streams
-<li>text/captioning streams and captioning examples
-<li>seeking within a multiplexed Ogg stream
-</ol>
-
-<pre>
-
-<Arc> hey monty
-<Arc> have some questions about oggfile w/ streaming servers
-<Arc> and how codecs get interlaced in a physical bitstream
-<Arc> first, whats the process for codecs to get concurrently
- multiplexed. i know how pages etc etc, but how do the pages get
- paced?
-<xiphmont> chronological order by granpos.
-<Arc> the granulepos of vorbis means nothing in relationship to theora
-<Arc> and in the case of writ, it means nothing at all. they're ordered
- by granulepos but they're needed by their start time, which is
- something only libwrit would know
-<Arc> how is theora and vorbis being synced, i mean, their pages as
- close to each other as needed by the player?
-<xiphmont> chronological order. Ogg will ask the codec to translate granpos
- to absolute time if it needs to know.
-<Arc> um ok so that isn't going to work at all for writ
-<Arc> granulepos = end time, not start time.
-<Arc> but for seeking it needs end time
-<xiphmont> granpos *is* end-time :-)
-<xiphmont> granpos is 'timing of last valid data to come out of this page'.
-<Arc> but if writ packets are put into the stream in the chronological
- position of their end time they wont be available for their start
- time, which is a variable length before their end time
-<Arc> writ packets cover time ranges. "this packet is valid between this
- granule and this granule", so there's a start and end time
-<xiphmont> right.
-<xiphmont> so do vorbis packets.
-<Arc> currently the spec is setup to allow overlap of these times by
- different phrases and page granulepos = endtime, packets ordered
- by end time (so some phrases may be put into the bitstream before
- they're started)
-<xiphmont> the seeking alg depends on end time.
-<Arc> yes im not concerned with seeking, we have seeking in the bag
- except for long term phrases + streaming, lets ignore that for now
- tho
-<Arc> im concerned about their ordering in the logical bitstream
-<xiphmont> You may have opened too large a can of worms with overlapping.
-<Arc> if a writ phrase lasts 10 seconds it needs to be in the physical
- bitstream close to or before its start time, relative to the
- vorbis/theora, you can expect the vorbis + theora layer to be
- buffered for ten seconds
-<derf> xiphmont: Overlapping does not complicate the problem at all.
-<xiphmont> derf: actually it kills the current seeking algo.
-<Arc> no it doesn't actually
-<derf> You can replace any group of overlapped captions by a single
- caption that lasts the entire duration of it.
-<derf> And reproduce any problems.
-<Arc> the granulepos's are in order. the granulepos's are ordered by end
- time, their start times are not in order, but they must be defined
- before they're needed (or close to it) in relation to the other
- logical bitstreams for them to be useful
-<xiphmont> One caption that begins before and ends after another.
-<derf> xiphmont: Which exhibits the exact same problems as just one
- caption.
-<xiphmont> design a seeking algo that works for that.
-<derf> Conceptually, you can take any group of overlapping captions and
- stick them all in one packet.
-<Arc> we do. you seek to the position that you need and begin processing
- from there. you'll have everything.
-<xiphmont> actually, yes, you're right.
-<Arc> my first question (these are very related) is how OggFile,
- oggmerge, whatever - how does that sync. do they ask the codec to
- pace per realtime, or does it ask the codec for a granulerate
-<xiphmont> if the packet ended after the seek point, it wouldn't have
- appeared yet.
-<Arc> because the latter will break our current spec bigtime
-<xiphmont> there are two possibilities; still working out which to use.
-<xiphmont> One is two codec types: continuous and discontinuous.
-<xiphmont> a continuous codec specifies 'buffer as much as you need to
- prevent any time gaps in my data presentation'. A discontinuous
- stream type has to 'fall out' of the stream; seeking and sync are
- according to continuous streams, and the stream assembly has to
- make sure the discontinuous pages magically arrive in time
-<xiphmont> [as the buffering/sync algo will not look arbitrarily far ahead for
- them]
-<derf> This sounds much like what I suggested to Arc.
-<xiphmont> the second possibility is to require a hint in the metaheader for
- how long each stream type has to look ahead.
-<xiphmont> Audio and video would be obvious continuous types.
-<xiphmont> discontinuous types would not be used for sync; the granpos is
- allowed to appear out of order.
-<Arc> well my question is, will libwrit/etc be asked "where does this
- packet belong in the physical bitstream" or will OggFile/etc place
- it by granulepos
-<xiphmont> Oggfile will place it.
-<Arc> yes but how
-<Arc> will it ask the codec?
-<xiphmont> You don't muck with pages and raw ogg stream in Oggfile. packets
- in, packets out.
-<xiphmont> In encode, all packets are submitted with timing info.
-<xiphmont> Oggfile builds and places pages as needed to obey timing magically.
-<xiphmont> [it would be a serious asspain to require each app to do it]
-<Arc> yes I know that. but I see two ways for OggFile to place it.
- by asking the codec for a granulerate (ie, 88200 granules per
- second with 44.1/stereo vorbis or 29.95 granules per second with
- NTSC theora) and calculate its position based on granulepos or
- will the codec tell OggFile "this belongs at 19.23 seconds"
-<derf> Assuming a fixed granulerate is bad.
-<Arc> because the prior would require a spec rewrite, the latter is
- perfect
-<derf> Current Theora's granulerate is not constant.
-<Arc> derf, yea but assuming API for something that isn't public yet is
- also bad :-)
-<xiphmont> Arc: we can have a packet show up with begin and end timing.
-<Arc> xiphmont, awesome. thanks :-)
-<xiphmont> Ogg won't necessarily know that on decode side (it will have to
- ask the codec), but on encode side, just have codec provide it.
-<xiphmont> It makes no sense for continuous streams, but for discontinuous it
- seems handy.
-<Arc> second question, do you feel it would be a good idea for OggFile
- (which I very much assume icecast2/libshout will use) to put the
- job of keeping track of and reporting "state information", ie,
- headers
-<xiphmont> yes
-<Arc> vorbis would just spit out the headers for state information
-<xiphmont> Actually, your grammar doesn't parse.
-<Arc> writ, however, could spit out any pages whose granulepos has not
- expired yet (to current) thus preventing the need in the spec to
- have phrases "expire" by time and need to be "refreshed" every few
- seconds for streaming clients
-<xiphmont> well, without readahead hinting, you still have an issue.
-<xiphmont> You either see a long-time caption too late.... or you miss it on
- seek.
-<Arc> thus the writ codec on icecast's side could buffer the last few
- pages (those that are still valid), on a new client connecting,
- spit out the header + however many packets are in the buffer
-<xiphmont> [eg... how does Oggfile need to know it has to buffer a full
- minute of video?]
-<Arc> how big is that window?
-<xiphmont> in continuous/discont... there is no window.
-<derf> The problem is that icecast needs to buffer some data from a
- discontinuous stream.
-<xiphmont> A discontinuous stream will need a hint.
-<derf> i.e., it needs to know the granpos<->time mapping.
-<Arc> or it could be outside icecast
-<Arc> right now icecast is buffering the vorbis headers
-<xiphmont> yes. But it will also need to know window ahead of time without
- reading the whole file.
-<derf> So it can tell if it has to buffer packets from a stream if they
- appear in the stream long before the granpos time.
-<Arc> but if icecast is using OggFile this could be part of the API, the
- stream state info, a buffer of pages which are needed to bring a
- new client "up to speed"
-<xiphmont> yes
-<xiphmont> It should be.
-<derf> I don't see why it needs any kind of window.
-<Arc> i don't understand the "hint" as you call it, why does it need to
- read ahead at all?
-<derf> With cont/discont streams.
-<xiphmont> you have a ten minute caption with a 20 minute gap ahead of where
- it appears.
-<xiphmont> Do you really want to buffer 20 minutes of video to find it?
-<Arc> with seeking or streaming? two different things
-<xiphmont> What if the caption stream ends early? You stop and wait for the
- whole stream to buffer to figure that out.
-<xiphmont> I'm speaking streaming.
-<Arc> why would you stop playing audio/video? you either receive a
- caption or you don't
-<derf> Arc's idea was that packets always appear before they're needed.
-<derf> In the stream.
-<xiphmont> OK. Now seeking. If it appeared early, you miss em when you seek.
-<derf> So if you haven't seen it when you find audio/video that comes
- later, then they're not there.
-<Arc> xiphmont, how so? wont it seek each logical bitstream based on
- its granulepos?
-<xiphmont> No.
-<xiphmont> Seeking is global.
-<Arc> what is the window then?
-<xiphmont> You seek to *one* point in the stream, based on all granposes.
-<derf> xiphmont: I can't see how that one point is well-defined.
-<xiphmont> right. what is the window then?
-<Arc> but discontinuous streams..
-<xiphmont> derf: granposes are all in chronological order.
-<xiphmont> Arc: discontinuous streams do not contribute to sync, they
- piggyback off of it.
-<xiphmont> A continuous stream is just a stream with a readahead window of
- 'infinite'.
-<Arc> yes but they're going to vary by a certain %, sometimes the audio
- will be ahead of the video, sometimes vice versa. they're both VBR
- so there needs to be a window of some sort
-<xiphmont> A continuous stream has a readahead window of infinite. "Buffer
- as much as necessary to keep all queues nonempty"
-<Arc> the continuous/discontinuous status of a stream is provided by the
- OggFile codec, right?
-<xiphmont> yes
-<xiphmont> that's current design.
-<Arc> ok. then, whats the window for discontinuous
-<xiphmont> exactly.
-<xiphmont> It would need to be set somewhere.
-<Arc> see it's easy, in writ, for us just to say "this is the maximum
- realtime length of a caption compared to its placement in the
- stream" and then prematurely end then refresh the phrases that
- need it
-<derf> And this limits the length of your captions.
-<xiphmont> sure.
-<Arc> exactly.
-<Arc> not the apparent, or source, length of the captions. its all
- internal to libwrit
-<derf> Right.
-<xiphmont> ...but be careful; your maximum duration/gap will set the
- buffering requirements of the entire stream.
-<Arc> "if a caption's end-time minus physical placement time is greater
- than x, then terminate all current phrases early, then immediately
- redefine them in the same order and location"
-<xiphmont> sorry, no, just duration.
-<Arc> well thats why I'm asking you about this, because this is global to
- Ogg
-<xiphmont> OK, I think we're on the same page right now.
-<Arc> well it has to be physical placement time because some captions
- will need to be defined before other captions, remember they need
- to be ordered by their end time. that will determine if they get
- cut, and if they get cut before their start time, they wont need
- to be defined yet at all.
-<Arc> i was up to 6am this morning running through different projections
- for how this could work with seeking/streaming. derf's overlapping
- durations idea does play out well
-<derf> Except that if you want to cut, you may need to drop out packets
- from the middle (or just keep the extraneous data).
-<Arc> see I originally had it "all captions are FIFO, the first to be
- defined are also the first to end, otherwise they need to be cut
- and recreated, always". that can become a very bloated mess with
- text constantly getting redefined
-<xiphmont> that's the same with other codec types.
-<derf> When cutting off the end of something.
-<Arc> ?
-<xiphmont> derf: that's the same with other codec types.
-<xiphmont> editing is always messy. Ogg is not intended to be easy to edit.
-<derf> Yes.
-<derf> Editing is messy in general.
-<Arc> by cut I mean "while encoding the bitstream, if such conditions
- exist, split a single set of phrases into two butted end to end,
- ie, ending and immediately re-defining it"
-<derf> Just having global headers with different codebooks makes
- combining different streams hard.
-<Arc> i don't mean ala vcut
-<derf> (without imposing overhead of adding new headers in each segment)
-<Arc> the logical bitstream wont get a EOS/BOS
-<derf> Oh, I was talking about someone actually cutting an
- already-multiplexed stream into two pieces.
-<Arc> its just the phrases, the captions, that will get cut. their
- durations split at the window mark, processed as needed,
- redefined/copied to start at the same time their original was
- prematurely terminated, process repeated as needed so a single
- very long phrase (aka caption/subtitle) can be split-copied into
- hundreds of phrases, each redefining the same data for another
- X second window
-<Arc> derf, yea lets not get too complicated here :-)
-<derf> Well, it is still a use case to consider.
-<derf> People might want to actually do such a thing.
-<Arc> I'm not concerned with cutting, this is just text. lossless.
-<derf> Even though there are currently no tools for it.
-<Arc> people could use the same mechanism icecast does for cutting a
- bitstream. each OggFile codec keeps track of "state information",
- which typically is just the header but for discontinuous streams
- could be the last few buffered pages..
-<Arc> if OggFile has such an API it would make cutting child's play.
-<Arc> monty, so, is this going to be variable? or is it going to get set
- at some point? because i might as well build functionality for
- that into the design here while I'm working on it
-<xiphmont> Ogg needs to be able to ask the codec what the readahead window is.
-<xiphmont> the codec can have that set inherently or get it from the logical
- stream header.
-<Arc> yea but what should this be
-<Arc> are we talking a minute? 10 seconds? 1 second?
-<xiphmont> actually thinking a sec...
-<Arc> ok :-)
-<derf> A second could be as much as 700k of video.
-<derf> Which is probably reasonable.
-<xiphmont> OK, thinking over, no change in state.
-<derf> But captions typically last 3 to 6 seconds.
-<xiphmont> 'what derf said'.
-<derf> Which means you've quadrupled to quintupled the size of your
- caption stream.
-<xiphmont> Or you could just decide 'losing last one is no big deal'.
-<Arc> yea, exactly.
-<xiphmont> ...and go to placing in the bitstream according to start time.
-<derf> xiphmont: That's what current DVD players do, IIRC.
-<xiphmont> derf: good to know.
-<Arc> the smaller the window the less buffering on the player's side,
- but the greater the codec size grows
-<xiphmont> yes.
-<derf> A player that really did care could do a separate seek for each
- discontinuous stream instead of one global one.
-<Arc> it makes things so much easier to have it ordered by end time
-<xiphmont> So.... perhaps the window should be set... and left up to the
- application if it cares to use it or not. We go to ordering
- discontinuous stream types by begin time, and make sure we're
- tolerant of losing 'the one before' if the application chooses to
- do it that way.
-<Arc> i mean, coding is easier by start time, duh, no buffering, no
- changing the order, just drop it in and let it fly or not
-<derf> And then buffer just the discontinuous data (which one would
- expect to be far less than the continuous) until it caught up to
- the global seek point.
-<xiphmont> derf: yes.
-<xiphmont> no, you don't want to do separate seek... for example, in the
- streaming case... you can't whether you care or not.
-<Arc> ok but if they're ordered by start time we still need a "window"
- for very long captions, otherwise seeking would never have them
- appear
-<xiphmont> ...so don't turn it into supporting multiple cases. Make it
- multiple possibilities in a single case.
-<xiphmont> Arc: yes.
-<xiphmont> And the application can decide to mind the window or not...
-<Arc> the encoding application
-<xiphmont> A PC software player will always want to mind. An embedded
- player may simply not be able to.
-<xiphmont> No, decoding.
-<xiphmont> encoding always requests a hint... but the decoder can ignore
- the readahead hint without ill-effect if it wishes.
-<Arc> no i mean, the encoder would have to "refresh" a phrase periodically
-<xiphmont> unless you want to miss a few, yes.
-<Arc> if ordered by start time, the player simply seeks and runs.
-<Arc> well its not missing a few that bothers me, its missing a very
- long one
-<xiphmont> you can't have everything you want here :-) Very long would need
- to refresh in either case.
-<Arc> ok so there would need to be a refresh window variable that the
- encoder could set, but could default to a certain number
-<Arc> yes I know, refresh is unavoidable.
-<xiphmont> ok
-<Arc> yea for all cases ordering discontinuous streams by start time is
- easier.
-<Arc> less elegant, tho
-<xiphmont> 'however the codec wants to do it'. It could be a hardcoded
- number in the codec for all I care (I know that's not really
- sensible)
-<derf> Placed in the stream by start time can have a much longer refresh
- time than placed by end time.
-<xiphmont> derf: yes.
-<xiphmont> lookin' like a win all around.
-<xiphmont> ...and this can be added to spec without breaking a single thing.
-<Arc> if the encoding application chose it could set this window
- extremely high, understanding that long term captions would never
- appear if it's seeked
-<Arc> or streamed.
-<xiphmont> Arc: yes.
-<xiphmont> If ordered by start time, I think the granpos should also be
- start-time.
-<Arc> and this would eliminate the need to monitor "state information"
- with streaming, it'd act no different from a seek
-<xiphmont> but that's a minor detail I'd rather debate another time.
-<Arc> well yea that'd have to be the case or you'd have out of order
- granulepos and that'd create chaos
-<Arc> ok so, the behavior part of the spec should change so that
- packets are ordered by start time, in sequence, and it doesn't
- matter if they overlap
-<xiphmont> Arc: yes, seems like it.
-<derf> One could always look at that stuff to see how it wound up being
- implemented.
-<Arc> derf, you had a great idea, in any case, on how to handle
- overlapping when granulepos was by end time
-<Arc> i hate to erase it all, I'm going to copy this to another location
- on the wiki...
-<derf> I don't know how you make seeking work with granpos as the start
- time.
-<xiphmont> OggFile would need to distinguish between cont and discont.
-<xiphmont> It needs to ask codecs for granpos mappings anyway.
-<Arc> easy. you seek to a point, you only display new phrases. long term
- phrases are periodically refreshed, so the player just displays
- them as they come in.
-<xiphmont> if it's end-time and packets are in chron order, discont streams
- are useless for sync and seek. If it's start-time, they can
- contribute.
-<xiphmont> I think derf was concerned about complicating the seeking algo.
-<derf> Mostly.
-<xiphmont> I don't think this would complicate it much.
-<xiphmont> It just changes the 'boundary was at head or at tail' of page.
- The bisection is identical.
-<xiphmont> ...and the meaning of seek points is the same.
-<xiphmont> you still seek to the largest granpos in the stream preceding
- the requested time position.
-<xiphmont> [preceding or equal to]
-<derf> "the page with..."
-<xiphmont> well, you either seek *to* that page [if it's discont] or just
- past that page [if it's cont]
-<i>Actually, if it's continuous, you seek just past that page if the
-last packet is not continued, or to that page if the packet is
-continued -- Monty</i>
-<xiphmont> you have both those boundaries. You just use cont/discont to
- decide which.
-<xiphmont> I think of seeking as an operation of going to a specific page
- boundary, not a specific page.
-<xiphmont> [and that makes this extension much cleaner]
-<derf> Okay, I think I see now... I was holding the definition of what a
- granpos meant fixed as a design constraint.
-<xiphmont> derf: well, it had been. This is actually a new innovation within
- the machinery.
-<derf> But I agree this is a reasonably simple special case.
-<Arc> so discontinuous streams, granulepos is the start time of the packet
-<xiphmont> what complication do you see?
-<xiphmont> Arc: start time of the first packet beginning in the page
-<xiphmont> [not a continued packet]
-<xiphmont> Oh, continued packets.
-<xiphmont> No continued packets in discont streams.
-<xiphmont> You think that's reasonable?
-<Arc> not really, because a discontinuous packet could be quite large
- and you'd want it split across page borders
-<derf> xiphmont: It gives a hard limit on packet size, doesn't it?
-<xiphmont> yeah, you're right.
-<derf> xiphmont: It's an "if"... I'm not worried about it either.
-<xiphmont> OK, a restriction:
-<xiphmont> Continued packets must be continued in an immediately following
- page.
-<xiphmont> derf: it is an if.
-<Arc> that sounds healthy
-<xiphmont> OK
-<xiphmont> See the nice part about *all* of this is...
-<xiphmont> If a third-party impl screws up, it doesn't break the code, it
- just munges playback slightly.
-<xiphmont> We can extend the spec to include them...
-<xiphmont> the stream format need not rev.
-<xiphmont> we already know existing code isn't up to discontinuous anyway.
-<xiphmont> OggFile is intended to do this for the app. I do not expect most
- apps to implement this. It is purely a mux-layer operation.
-</pre>
+<h2>Appendix A: multiplexing examples</h2>
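
The "granpos hack" debated in the removed second excerpt splits a granule position into a keyframe part and a count of frames since that keyframe, separated by a 'keyframe shift'. The following is only a minimal sketch of that idea, not code from libogg or any codec. The function names, the shift value of 6, and the choice to store the keyframe's frame number in the high bits are all assumptions made for illustration.

```python
# Illustrative sketch of a split granule position ("granpos hack").
# All names and the shift value are hypothetical, not part of any Ogg API.

KEYFRAME_SHIFT = 6  # assumed: low 6 bits hold the frames-since-keyframe count


def pack_granpos(keyframe_frame, offset, shift=KEYFRAME_SHIFT):
    """Combine a keyframe's frame number and an offset into one granpos."""
    # The offset must fit in the low bits; this is the keyframe-spacing
    # limit raised in the discussion.
    if not 0 <= offset < (1 << shift):
        raise ValueError("offset does not fit in the low bits")
    return (keyframe_frame << shift) | offset


def unpack_granpos(granpos, shift=KEYFRAME_SHIFT):
    """Return (keyframe_frame, offset) for a packed granpos."""
    return granpos >> shift, granpos & ((1 << shift) - 1)


def frame_number(granpos, shift=KEYFRAME_SHIFT):
    """Absolute frame number: keyframe's frame number plus the offset."""
    keyframe_frame, offset = unpack_granpos(granpos, shift)
    return keyframe_frame + offset


if __name__ == "__main__":
    g = pack_granpos(6, 5)       # keyframe at frame 6, five frames later
    print(unpack_granpos(g))     # keyframe part and offset
    print(frame_number(g))       # absolute frame number
```

Under these assumptions a demux layer can find the preceding keyframe by shifting the granpos, without inspecting packets, which is the convenience argued for in the excerpt. The cost is that the offset is bounded by 2**shift frames between keyframes.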
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'cvs-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.