[xiph-commits] r16991 - trunk/ogg/doc

xiphmont at svn.xiph.org xiphmont at svn.xiph.org
Fri Mar 19 23:32:37 PDT 2010


Author: xiphmont
Date: 2010-03-19 23:32:37 -0700 (Fri, 19 Mar 2010)
New Revision: 16991

Modified:
   trunk/ogg/doc/oggstream.html
Log:
Substantial expansion of Ogg container overview document; still requires filling in of several 
references by not-yet-present examples.



Modified: trunk/ogg/doc/oggstream.html
===================================================================
--- trunk/ogg/doc/oggstream.html	2010-03-19 22:22:07 UTC (rev 16990)
+++ trunk/ogg/doc/oggstream.html	2010-03-20 06:32:37 UTC (rev 16991)
@@ -70,136 +70,398 @@
   <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
 </div>
 
-<h1>Ogg logical and physical bitstream overview</h1>
+<h1>Ogg bitstream overview</h1>
 
-<h2>Ogg bitstreams</h2>
+This document serves as a starting point for understanding the design
+and implementation of the Ogg container format.  If you're new to Ogg
+or merely want a high-level technical overview, start reading here.
+Other documents linked from the <a href="index.html">index page</a>
+give distilled technical descriptions and references of the container
+mechanisms.  This document is intended to aid understanding.
 
-<p>Ogg codecs use octet vectors of raw, compressed data
-(<em>packets</em>). These compressed packets do not have any
-high-level structure or boundary information; strung together, they
-appear to be streams of random bytes with no landmarks.</p>
+<h2>Container format design points</h2>
 
-<p>Raw packets may be used directly by transport mechanisms that provide
-their own framing and packet-separation mechanisms (such as UDP
-datagrams). For stream based storage (such as files) and transport
-(such as TCP streams or pipes), Vorbis and other future Ogg codecs use
-the Ogg bitstream format to provide framing/sync, sync recapture
-after error, landmarks during seeking, and enough information to
-properly separate data back into packets at the original packet
-boundaries without relying on decoding to find packet boundaries.</p>
+<p>Ogg is intended to be a simplest-possible container, concerned only
+with framing, ordering, and interleave. It can be used as a stream delivery
+mechanism, for media file storage, or as a building block toward
+implementing a more complex, non-linear container (for example, see
+the <a href="skeleton.html">Skeleton</a> or <a
+href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).
 
-<h2>Logical and physical bitstreams</h2>
+<p>The Ogg container is not intended to be a monolithic
+'kitchen-sink'.  It exists only to frame and deliver in-order stream
+data and as such is vastly simpler than most other containers.
+Elementary and multiplexed streams are both constructed entirely from a
+single building block (an Ogg page) comprised of eight fields
+totalling twenty-seven bytes (the page header), a list of packet lengths
+(up to 255 bytes), and payload data (up to 65025 bytes).  The structure
+of every page is the same.  There are no optional fields or alternate
+encodings.
 
-<p>Raw packets are grouped and encoded into contiguous pages of
-structured bitstream data called <em>logical bitstreams</em>. A
-logical bitstream consists of pages, in order, belonging to a single
-codec instance. Each page is a self contained entity (although it is
-possible that a packet may be split and encoded across one or more
-pages); that is, the page decode mechanism is designed to recognize,
-verify and handle single pages at a time from the overall bitstream.</p>
+<p>Stream and media metadata is contained in Ogg and not built into
+the Ogg container itself.  Metadata is thus compartmentalized and
+layered rather than part of a monolithic design, an especially good
+idea as no two groups seem able to agree on what a complete or
+complete-enough metadata set should be. In this way, the container and
+container implementation are isolated from unnecessary design flux.
 
-<p>Multiple logical bitstreams can be combined (with restrictions) into a
-single <em>physical bitstream</em>. A physical bitstream consists of
-multiple logical bitstreams multiplexed at the page level and may
-include a 'meta-header' at the beginning of the multiplexed logical
-stream that serves as identification magic. Whole pages are taken in
-order from multiple logical bitstreams and combined into a single
-physical stream of pages. The decoder reconstructs the original
-logical bitstreams from the physical bitstream by taking the pages in
-order from the physical bitstream and redirecting them into the
-appropriate logical decoding entity. The simplest physical bitstream
-is a single, unmultiplexed logical bitstream with no meta-header; this
-is referred to as a 'degenerate stream'.</p>
+<h3>Streaming</h3> 
 
-<p><a href="framing.html">Ogg Logical Bitstream Framing</a> discusses
+<p>The Ogg container is primarily a streaming format,
+encapsulating chronological, time-linear mixed media into a single
+delivery stream or file. The design is such that an application can
+always encode and/or decode all features of a bitstream in one pass
+with no seeking and minimal buffering.  Seeking to provide optimized
+encoding (such as two-pass encoding) or interactive decoding (such as
+scrubbing or instant replay) is neither disallowed nor discouraged; however,
+no container feature requires nonlinear access to the bitstream.
+
+<h3>Variable Bit Rate, Variable Payload Size</h3>
+
+<p>Ogg is designed to contain any size data payload with bounded,
+predictable efficiency.  Ogg packets have no maximum size and a
+zero-byte minimum size.  There is no restriction on size changes from
+packet to packet. Variable size packets do not require the use of any
+optional or additional container features.  There is no optimal
+suggested packet size, though special consideration was paid to make
+sure 50-200 byte packets were no less efficient than larger packet
+sizes.  The original design criterion was a 2% overhead at 50 byte
+packets, dropping to a maximum working overhead of 1% with larger
+packets, and a typical working overhead of 0.5-0.7% for most practical
+uses. 
+
+<h3>Simple pagination</h3>
+
+<p>Ogg is a byte-aligned container with no context-dependent, optional
+or variable-length fields.  Ogg requires no repacking of codec data.
+The page structure is written out in-line as packet data is submitted
+to the streaming abstraction.  In addition, it is possible to
+implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
+is done in the Tremor sourcebase).
+
+<h3>Capture</h3>
+
+<p>Ogg is designed for efficient and immediate stream capture with
+high confidence.  Although packets have no size limit in Ogg, pages
+are a maximum of just under 64kB meaning that any Ogg stream can be
+captured with confidence after seeing 128kB of data or less [worst
+case; typical figure is 6kB] from any random starting point in the
+stream.
+
+<h3>Seeking</h3>
+
+<p>Ogg implements simple coarse- and fine-grained seeking by design.
+
+<p>Coarse seeking may be performed by simply 'moving the tone arm' to a
+new position and 'dropping the needle'.  Rapid capture with
+accompanying timecode from any location in an Ogg file is guaranteed
+by the stream design.  From the acquisition of the first timecode,
+all data needed to play back from that time code forward is ahead of
+the stream cursor.
+
+<p>Ogg implements full sample-granularity seeking using an
+interpolated bisection search built on the capture and timecode
+mechanisms used by coarse seeking.  As above, once a search finds
+the desired timecode, all data needed to play back from that time code
+forward is ahead of the stream cursor.
+
+<p>Both coarse and fine seeking use the page structure and sequencing
+inherent to the Ogg format.  All Ogg streams are fully seekable from
+creation; seekability is unaffected by truncation or missing data, and
+is tolerant of gross corruption.  Seek operations are neither 'fuzzy' nor
+heuristic.
+
+<p>Seeking without use of an index is a major point of the Ogg
+design. There are several reasons why Ogg forgoes an index:
+			  
+<ul>
+
+<li>It must be possible to create an Ogg stream in a single pass, and
+an index requires either two passes to create, or the index must be
+tacked onto the end of a live stream after the stream is finished.
+Both methods run afoul of other design constraints.
+
+<li>An index is only marginally useful in Ogg for the complexity
+added; it adds no new functionality and seldom improves performance
+noticeably.  Empirical testing shows that indexless interpolation
+search does not require many more seeks in practice than using an
+index would.
+
+<li>'Optional' indexes encourage lazy implementations that can seek
+only when indexes are present, or that implement indexless seeking
+only by building an internal index after reading the entire file
+beginning to end.  This has been the fate of other containers that
+specify optional indexing.
+
+</ul>
+
+<h3>Simple multiplexing</h3>
+
+<p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
+multiplexed stream in time order.  The multiplexed pages are not
+altered.  Muxing an Ogg AV stream out of separate audio,
+video and data streams is akin to shuffling several decks of cards
+together into a single deck; the cards themselves remain unchanged.
+Demultiplexing is similarly simple.
+
+<p>The goal of this design is to make the mux/demux operation as
+trivial as possible to allow live streaming systems to build and
+rebuild streams on the fly with minimal CPU usage and no additional
+storage or latency requirements.
+
+<h3>Continuous and Discontinuous Media</h3>
+
+<p>Ogg streams belong to one of two categories, "Continuous" streams and
+"Discontinuous" streams.
+
+<p>A stream that provides a gapless, time-continuous media type with a
+fine-grained timebase is considered to be 'Continuous'. A continuous
+stream should never be starved of data. Examples of continuous data
+types include broadcast audio and video.
+
+<p>A stream that delivers data in a potentially irregular pattern or
+with widely spaced timing gaps is considered to be 'Discontinuous'. A
+discontinuous stream may be best thought of as data representing
+scattered events; although they happen in order, they are typically
+unconnected data often located far apart. One example of a
+discontinuous stream type is captioning such as <a
+href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
+possible to design captions as a continuous stream type, it's most
+natural to think of captions as widely spaced pieces of text with
+little happening between.
+
+<p>The fundamental reason for distinction between continuous and
+discontinuous streams concerns buffering.
+
+<h3>Buffering</h3>
+
+<p>A continuous stream is, by definition, gapless. Ogg buffering is based
+on the simple premise of never allowing an active continuous stream
+to starve for data during decode; buffering works ahead until all
+continuous streams in a physical stream have data ready and no further.
+
+<p>Discontinuous stream data is not assumed to be predictable. The
+buffering design takes discontinuous data 'as it comes' rather than
+working ahead to look for future discontinuous data for a potentially
+unbounded period. Thus, the buffering process makes no attempt to fill
+discontinuous stream buffers; their pages simply 'fall out' of the
+stream when continuous streams are handled properly.
+
+<p>Buffering requirements in this design need not be explicitly
+declared or managed in the encoded stream. The decoder simply reads as
+much data as is necessary to keep all continuous stream types gapless
+and no more, with discontinuous data processed as it arrives in the
+continuous data. Buffering is implicitly optimal for the given
+stream. Because all pages of all data types are stamped with absolute
+timing information within the stream, inter-stream synchronization
+timing is always maintained without the need for explicitly declared
+buffer-ahead hinting.
+
+<h3>Codec metadata</h3>
+
+<p>Ogg does not replicate codec-specific metadata into the mux layer
+in an attempt to make the mux and codec layer implementations 'fully
+separable'.  Things like specific timebase, keyframing strategy, frame
+duration, etc, do not appear in the Ogg container.  The mux layer is,
+instead, expected to query a codec through a standardized interface,
+left to the implementation, for this data when it is needed.
+
+<p>Though modern design wisdom usually prefers to predict all possible
+needs of current and future codecs then embed these dependencies and
+the required metadata into the container itself, this strategy
+increases container specification complexity, fragility, and rigidity.
+The mux and codec implementations become more independent, but the
+specifications become less independent. A codec can't do what a
+container hasn't already provided for.  New codecs are harder to
+support, and you can do fewer useful things with the ones you've
+already got (eg, try to make a good splitter without using any codecs.
+You're stuck splitting at keyframes only, or building yet another new
+mechanism into the container layer to mark what frames to skip
+displaying).
+
+<p>Ogg's design goes the opposite direction, where the specification
+is to be as simple, easy to understand, and 'proofed' against novel
+codecs as possible.  When an Ogg mux layer requires codec-specific
+information, it queries the codec (or a codec stub).  This trades a
+more complex implementation for a simpler, more flexible
+specification.
+
+<h3>Stream structure metadata</h3>
+
+<p>The Ogg container itself does not define a metadata system for
+declaring the structure and interrelations between multiple media
+types in a muxed stream.  That is, the Ogg container itself does not
+specify data like 'which stream is the subtitle stream?' or 'which
+video stream is the primary angle?'.  This metadata still exists, but
+is stored in the Ogg container rather than being built into the Ogg
+container.  Xiph specifies the 'Skeleton' metadata format for Ogg
+streams, but this decoupling of container and stream structure
+metadata means it is possible to use Ogg with any metadata
+specification without altering the container itself, or without stream
+structure metadata at all.
+
+<h3>Frame accurate absolute position</h3>
+
+<p>Every Ogg page is stamped with a 64 bit 'granule position' that
+serves as an absolute timestamp for mux and seeking.  A few nifty
+little tricks are usually also embedded in the granpos state, but
+we'll leave those aside for the moment (strictly speaking, they're
+part of each codec's mapping, not Ogg).
+
+<p>As mentioned above, granule positions are mapped into
+absolute timestamps by the codec, rather than being a hard timestamp.
+This allows maximally efficient use of the available 64 bits to
+address every sample/frame position without approximation while
+supporting new and previously unknown timebase encodings without
+needing to extend or update the mux layer.  When a codec needs a novel
+timebase, it simply brings the code for that mapping along with it.
+This is not a theoretical curiosity; new, wholly novel timebases were
+deployed with the adoption of both Theora and Dirac.  "Rolling INTRA"
+(keyframeless video) also benefits from novel use of the granule
+position.
+
+<h2>Ogg stream arrangement</h2>
+
+<h3>Packets, pages, and bitstreams</h3>
+
+<p>Ogg codecs use <em>packets</em>.  Packets are octet payloads of
+raw, compressed data, containing the data needed for a single
+decompressed unit, eg, one video frame. Packets have no maximum size
+and may be zero length. They do not have any high-level structure or
+boundary information; strung together, the unframed packets form a
+<em>logical bitstream</em> of apparently random bytes with no internal
+landmarks.
+
+<p>Logical bitstream packets are grouped and framed into Ogg pages
+along with a unique stream <em>serial number</em> to produce a
+<em>physical bitstream</em>.  An <em>elementary stream</em> is a
+physical bitstream containing only the pages framing a single logical
+bitstream. Each page is a self-contained entity, although a packet may
+be split and encoded across one or more pages. The page decode
+mechanism is designed to recognize, verify and handle single pages at
+a time from the overall bitstream.
+
+<p><a href="framing.html">Ogg Bitstream Framing</a> specifies
 the page format of an Ogg bitstream, the packet coding process
-and logical bitstreams in detail. The remainder of this document
-specifies requirements for constructing finished, physical Ogg
-bitstreams.</p>
+and elementary bitstreams in detail.
 
-<h2>Mapping Restrictions</h2>
+<h3>Multiplexed bitstreams</h3>
 
-<p>Logical bitstreams may not be mapped/multiplexed into physical
-bitstreams without restriction. Here we discuss design restrictions
-on Ogg physical bitstreams in general, mostly to introduce
-design rationale. Each 'media' format defines its own (generally more
-restrictive) mapping. An 'Ogg Vorbis Audio Bitstream', for example, has a
-specific physical bitstream structure.
-Any other codec or combination of codecs will generally also mandate a
-corresponding restricted physical bitstream format.</p>
+<p>Multiple logical/elementary bitstreams can be combined into a single
+<em>multiplexed bitstream</em> by interleaving whole pages from each
+contributing elementary stream in time order. The result is a single
+physical stream that multiplexes and frames multiple logical streams.
+Each logical stream is identified by the unique stream serial number
+stamped in its pages.  A physical stream may include a 'meta-header'
+(such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
+own Ogg page at the beginning of the physical stream. A decoder
+recovers the original logical/elementary bitstreams out of the
+physical bitstream by taking the pages in order from the physical
+bitstream and redirecting them into the appropriate logical decoding
+entity.
 
-<h3>additional end-to-end structure</h3>
+<p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
+proper multiplexing of an Ogg bitstream in detail.
 
+<h3>Chaining</h3>
+
+<p>Multiple Ogg physical bitstreams may be concatenated into a single new
+stream; this is <em>chaining</em>. The bitstreams do not overlap; the
+final page of a given logical bitstream is immediately followed by the
+initial page of the next.</p>
+
+<p>Each logical bitstream in a chain must have a unique serial number
+within the scope of the full physical bitstream, not only within a
+particular <em>link</em> or <em>segment</em> of the chain.</p>
+
+<h3>Continuous and discontinuous streams</h3>
+
+<p>Within Ogg, each stream must be declared (by the codec) to be
+continuous- or discontinuous-time.  Most codecs treat all streams they
+use as either inherently continuous- or discontinuous-time, although
+this is not a requirement. A codec may, as part of its mapping, choose
+according to data in the initial header.
+
+<p>Continuous-time pages are stamped by end-time; discontinuous pages
+are stamped by begin-time.  Pages in a multiplexed stream are
+interleaved in order of the time stamp regardless of stream type.
+Both continuous and discontinuous logical streams are used to seek
+within a physical stream; however, only continuous streams are used to
+determine buffering depth; because discontinuous streams are stamped
+by start time, they will always 'fall out' in time when buffering
+tracks only the continuous streams.  See 'Examples' for an
+illustration of the buffering mechanism.
+
+<h2>Mapping Requirements</h2>
+
+<p>Each codec is allowed some freedom in deciding how its logical
+bitstream is encapsulated into an Ogg bitstream (even if it is a
+trivial mapping, eg, 'plop the packets in and go'). This is the
+codec's <em>mapping</em>. Ogg imposes a few mapping requirements
+on any codec.
+
 <p>The <a href="framing.html">framing specification</a> defines
 'beginning of stream' and 'end of stream' page markers via a header
 flag (it is possible for a stream to consist of a single page). A
-stream always consists of an integer number of pages, an easy
+correct stream always consists of an integer number of pages, an easy
 requirement given the variable size nature of pages.</p>
 
-<p>In addition to the header flag marking the first and last pages of a
-logical bitstream, the first page of an Ogg bitstream obeys
-additional restrictions. Each individual media mapping specifies its
-own implementation details regarding these restrictions.</p>
+<p>The first page of an elementary Ogg bitstream consists of a single,
+small 'initial header' packet that must include sufficient information
+to identify the exact CODEC type. From this initial header, the codec
+must also be able to determine its timebase and whether or not it is a
+continuous- or discontinuous-time stream.  The initial header must fit
+on a single page. If a codec makes use of auxiliary headers (for
+example, Vorbis uses two auxiliary headers), these headers must follow
+the initial header immediately.  The last header finishes its page;
+data begins on a fresh page.
 
-<p>The first page of a logical Ogg bitstream consists of a single,
-small 'initial header' packet that includes sufficient information to
-identify the exact CODEC type and media requirements of the logical
-bitstream. The intent of this restriction is to simplify identifying
-the bitstream type and content; for a given media type (or across all
-Ogg media types) we can know that we only need a small, fixed
-amount of data to uniquely identify the bitstream type.</p>
+<p>As an example, Ogg Vorbis places the name and revision of the
+Vorbis CODEC, the audio rate and the audio quality into this initial
+header.  Comments and detailed codec setup appear in the larger
+auxiliary headers.</p>
 
-<p>As an example, Ogg Vorbis places the name and revision of the Vorbis
-CODEC, the audio rate and the audio quality into this initial header,
-thus simplifying vastly the certain identification of an Ogg Vorbis
-audio bitstream.</p>
+<h2>Multiplexing Requirements</h2>
 
-<h3>sequential multiplexing (chaining)</h3>
+<p>Multiplexing requirements within Ogg are straightforward. When
+constructing a single-link (unchained) physical bitstream consisting
+of multiple elementary streams:
 
-<p>The simplest form of logical bitstream multiplexing is concatenation
-(<em>chaining</em>). Complete logical bitstreams are strung
-one-after-another in order. The bitstreams do not overlap; the final
-page of a given logical bitstream is immediately followed by the
-initial page of the next. Chaining is the only logical->physical
-mapping allowed by Ogg Vorbis.</p>
+<ol>
 
-<p>Each chained logical bitstream must have a unique serial number within
-the scope of the physical bitstream.</p>
+<li> The initial header for each stream appears in sequence, each
+header on a single page.  All initial headers must appear with no
+intervening data (no auxiliary header pages or packets, no data pages
+or packets).  Order of the initial headers is unspecified. The
+'beginning of stream' flag is set on each initial header.
 
-<h3>concurrent multiplexing (grouping)</h3>
+<li> All auxiliary headers for all streams must follow.  Order
+is unspecified.  The final auxiliary header of each stream must flush
+its page.
 
-<p>Logical bitstreams may also be multiplexed 'in parallel'
-(<em>grouped</em>). An example of grouping would be to allow
-streaming of separate audio and video streams, using different codecs
-and different logical bitstreams, in the same physical bitstream.
-Whole pages from multiple logical bitstreams are mixed together.</p>
+<li>Data pages for each stream follow, interleaved in time order. 
 
-<p>The initial pages of each logical bitstream must appear first; the
-media mapping specifies the order of the initial pages. For example,
-Ogg Theora describes video bitstream with audio.
-The mapping specifies that the physical bitstream must begin
-with the initial page of a logical video bitstream, followed by the
-initial page of an audio stream. Unlike initial pages, terminal pages
-for the logical bitstreams need not all occur contiguously (although a
-specific media mapping may require this; it is not mandated by the
-generic Ogg stream spec). Terminal pages may be 'nil' pages,
-that is, pages containing no content but simply a page header with
-position information and the 'last page of bitstream' flag set in the
-page header.</p>
+<li>The final page of each stream sets the 'end of stream' flag.
+Unlike initial pages, terminal pages for the logical bitstreams need
+not occur contiguously; indeed it may not be possible for them to do so.
+</ol>
 
 <p>Each grouped bitstream must have a unique serial number within the
 scope of the physical bitstream.</p>
 
-<h3>sequential and concurrent multiplexing</h3>
+<h3>chaining and multiplexing</h3>
 
-<p>Groups of concurrently multiplexed bitstreams may be chained
+<p>Multiplexed and/or unmultiplexed bitstreams may be chained
 consecutively. Such a physical bitstream obeys all the rules of both
-grouped and chained multiplexed streams; the groups, when unchained ,
-must stand on their own as a valid concurrently multiplexed
-bitstream.</p>
+chained and multiplexed streams.  Each link, when unchained, must
+stand on its own as a valid physical bitstream.  Chained streams do
+not mix; a new segment may not begin until all streams in the
+preceding segment have terminated. </p>
 
-<h3>multiplexing example</h3>
+<h2>Examples</h2>
 
+<em>[More to come shortly; this section is currently being revised and expanded]</em>
+
 <p>Below, we present an example of a grouped and chained bitstream:</p>
 
 <p><img src="stream.png" alt="stream"/></p>
@@ -227,7 +489,7 @@
   The Xiph Fish Logo is a
   trademark (&trade;) of Xiph.Org.<br/>
 
-  These pages &copy; 1994 - 2005 Xiph.Org. All rights reserved.
+  These pages &copy; 1994 - 2010 Xiph.Org. All rights reserved.
 </div>
 
 </body>

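Illustration (not part of the commit): the page structure the overview describes can be sketched in a few lines of Python. This is a minimal sketch, assuming the field layout given in the Ogg framing documentation (RFC 3533), where the fixed header is 27 bytes; the names PAGE_HEADER, parse_page and packet_lengths are made up for this example and do not come from libogg. The CRC is read but deliberately not verified.

```python
import struct

# Fixed portion of the page header: capture pattern, version, flags,
# granule position, serial number, page sequence number, CRC, segment count.
# Little-endian, no padding: 4+1+1+8+4+4+4+1 = 27 bytes.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

# Largest possible page: header + 255 lacing values + 255 segments of 255 bytes.
MAX_PAGE_SIZE = PAGE_HEADER.size + 255 + 255 * 255


def parse_page(buf, offset=0):
    """Return (fields, page_length) for the page starting at offset.

    The CRC is read but NOT verified here; a real demuxer must check it.
    """
    (magic, version, flags, granulepos,
     serial, seqno, crc, nsegs) = PAGE_HEADER.unpack_from(buf, offset)
    if magic != b"OggS":
        raise ValueError("capture pattern not found")
    table_start = offset + PAGE_HEADER.size
    lacing = bytes(buf[table_start:table_start + nsegs])
    if len(lacing) != nsegs:
        raise ValueError("truncated segment table")
    fields = {
        "version": version,
        "flags": flags,
        "granulepos": granulepos,
        "serial": serial,
        "seqno": seqno,
        "crc": crc,
        "lacing": lacing,
    }
    return fields, PAGE_HEADER.size + nsegs + sum(lacing)


def packet_lengths(lacing):
    """Split lacing values into the lengths of the packets that end on this page.

    A lacing value below 255 ends a packet. Returns (lengths, continued),
    where continued is True if the last packet runs on into the next page.
    """
    lengths, current = [], 0
    for value in lacing:
        current += value
        if value < 255:
            lengths.append(current)
            current = 0
    continued = bool(lacing) and lacing[-1] == 255
    return lengths, continued
```

MAX_PAGE_SIZE works out to 65307 bytes, which is the "just under 64kB" bound the Capture section relies on.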

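Illustration (not part of the commit): the indexless seek described under 'Seeking' is an interpolated bisection. The sketch below shows only the search idea, under a strong simplifying assumption: page timestamps are already available as a sorted in-memory list, whereas a real implementation probes byte offsets in the file and recaptures a page at each probe. The function name interpolated_seek is hypothetical.

```python
def interpolated_seek(page_times, target):
    """Return (index, probes) for the first page whose time is >= target.

    page_times must be sorted in ascending order. Returns (None, 0) if the
    target lies beyond the last page.
    """
    if not page_times or target > page_times[-1]:
        return None, 0
    if target <= page_times[0]:
        return 0, 0

    lo, hi = 0, len(page_times) - 1
    probes = 0
    # Invariant: page_times[lo] < target <= page_times[hi]
    while hi - lo > 1:
        span = page_times[hi] - page_times[lo]
        # Guess where the target sits, assuming time grows linearly with index.
        frac = (target - page_times[lo]) / span
        mid = lo + int(frac * (hi - lo))
        # Keep the probe strictly inside the bracket so it always shrinks.
        mid = min(max(mid, lo + 1), hi - 1)
        probes += 1
        if page_times[mid] < target:
            lo = mid
        else:
            hi = mid
    return hi, probes
```

On streams with a roughly constant bitrate the interpolated guess lands close to the target on the first probe or two, which matches the overview's point that an index saves few seeks in practice.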

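Illustration (not part of the commit): the 'shuffling decks of cards' description of multiplexing can be sketched as a merge of already-ordered page lists. This is a minimal sketch, assuming each page is modelled as a hypothetical (time, serial, payload) tuple; it covers only the interleaving of data pages and ignores the header-ordering rules listed under 'Multiplexing Requirements'.

```python
import heapq
from collections import defaultdict


def mux(*streams):
    """Interleave whole pages from several elementary streams in time order.

    Each stream is a list of (time, serial, payload) tuples already sorted
    by time. Pages are passed through unaltered.
    """
    return list(heapq.merge(*streams, key=lambda page: page[0]))


def demux(pages):
    """Route pages back to their logical streams by serial number."""
    streams = defaultdict(list)
    for page in pages:
        streams[page[1]].append(page)
    return dict(streams)
```

Because mux only reorders whole pages and demux only routes them by serial number, a round trip returns each elementary stream exactly as it went in.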