[xiph-cvs] r6150 - theora/trunk/doc

giles at xiph.org
Sun Mar 21 16:38:04 PST 2004



Author: giles
Date: 2004-03-21 19:38:04 -0500 (Sun, 21 Mar 2004)
New Revision: 6150

Modified:
   theora/trunk/doc/spec.bib
   theora/trunk/doc/spec.tex
Log:
Additional sections from derf, plus a large number of text formatting
regularization changes. A few corrections as well.

Modified: theora/trunk/doc/spec.bib
===================================================================
--- theora/trunk/doc/spec.bib	2004-03-21 22:49:08 UTC (rev 6149)
+++ theora/trunk/doc/spec.bib	2004-03-22 00:38:04 UTC (rev 6150)
@@ -15,23 +15,23 @@
 
 @MISC{oggstream,
   author="Christopher Montgomery",
-  title="Ogg logical and physical bitstream overview",
+  title="{Ogg} logical and physical bitstream overview",
   howpublished="\url{http://www.xiph.org/ogg/doc/oggstream.html}",
-  month="July",
+  month="Jul.",
   year=2002
 }
 
 @MISC{oggframe,
   author="Christopher Montgomery",
-  title="Ogg logical bitstream framing",
+  title="{Ogg} logical bitstream framing",
   howpublished="\url{http://www.xiph.org/ogg/doc/framing.html}",
-  month="July",
+  month="Jul.",
   year=2002
 }
 
 @MISC{rfc3533,
   author="Silvia Pfeiffer",
-  title="The Ogg Encapsulation Format Version 0",
+  title="{RFC} 3533: The {Ogg} Encapsulation Format Version 0",
   howpublished="\url{http://www.ietf.org/rfc/rfc3533.txt}",
   month="May",
   year=2003
@@ -39,7 +39,7 @@
 
 @MISC{rfc3534,
   author="Linus Walleij",
-  title="The application/ogg Media Type",
+  title="The {application/ogg} Media Type",
   howpublished="\url{http://www.ietf.org/rfc/rfc3534.txt}",
   month="May",
   year=2003
@@ -49,8 +49,6 @@
   author="H. Schulzrinne, S.  Casner, R. Frederick, V. Jacobson",
   title="RTP: A Transport Protocol for Real-Time Applications",
   howpublished="\url{http://www.ietf.org/rfc/rfc3550.txt}",
-  month="July",
+  month="Jul.",
   year=2003
 }
-
-

Modified: theora/trunk/doc/spec.tex
===================================================================
--- theora/trunk/doc/spec.tex	2004-03-21 22:49:08 UTC (rev 6149)
+++ theora/trunk/doc/spec.tex	2004-03-22 00:38:04 UTC (rev 6150)
@@ -9,6 +9,8 @@
 
 \newtheorem{theorem}{Theorem}[section]
 \newcommand{\qi}{\ensuremath{\mathit{qi}}}
+\newcommand{\ti}{\ensuremath{\mathit{ti}}}
+\newcommand{\term}[1]{{\em #1}}
 
 \pagestyle{headings}
 \bibliographystyle{alpha}
@@ -36,10 +38,10 @@
 Theora is a general purpose, lossy video codec.
 It is based on the VP3 video codec produced by On2 Technologies
  (\url{http://www.on2.com/}).
-On2 Technologies donated the VP3.2 source code to the Xiph.org 
-Foundation and it was released under a BSD license. On2 also made an 
-irrevocable, royalty-free license grant for any patent claims it might 
-have over the software and any derivatives.
+On2 Technologies donated the VP3.2 source code to the Xiph.org
+ Foundation and it was released under a BSD license.
+On2 also made an irrevocable, royalty-free license grant for any patent claims
+ it might have over the software and any derivatives.
 No formal specification exists for the VP3 format beyond this source code,
  though Mike Melanson maintains a detailed description \cite{Mel04}.
 Portions of this specification were adopted from his text with permission.
@@ -52,9 +54,10 @@
 %TODO: what about VP3.1 etc? source tables all say 'VP31'
 Theora content cannot, in general, be losslessly transcoded into the VP3
  format.
-If a feature is not available in the original VP3 format, this is
- mentioned when that feature is defined.
-A complete list of these features appears in Appendix~REF.
+If a feature is not available in the original VP3 format, this is mentioned
+ when that feature is defined.
+A complete list of these features appears in
+ Appendix~REF.
 
 \subsubsection{Video Formats}
 
@@ -68,12 +71,11 @@
 
 The Theora I format does not support interlaced material, bit-depths larger
  than 8 bits per component, nor alternate color spaces such as RGB or
- arbitrary multi-channel spaces. Black and white content can be 
-efficiently encoded because the uniform chromaticity planes compress 
-well.
+ arbitrary multi-channel spaces.
+Black and white content can be efficiently encoded, however, because the
+ uniform chroma planes compress well.
 Support for interlaced material is planned for a future version.
-Support for increased bit depths or additional color spaces is not 
-planned.
+Support for increased bit depths or additional color spaces is not planned.
 
 \subsubsection{Classification}
 
@@ -82,11 +84,11 @@
  compensation.
 This places it in the same class of codecs as MPEG-1, -2, -4, and H.263.
 The details of how individual blocks are organized and how DCT coefficients are
- organized in the bitstream differ stubstantially from these codecs, however.
+ organized in the bitstream differ substantially from these codecs, however.
 Theora supports only intra frames (I frames in MPEG) and inter frames (P frames
  in MPEG).
-There is no equivalent to the bi-predictive frames (B frames) 
-found in MPEG codecs.
+There is no equivalent to the bi-predictive frames (B frames) found in MPEG
+ codecs.
 
 \subsubsection{Assumptions}
 
@@ -95,11 +97,10 @@
 %TODO: Talk more about implementation complexity.
 
 Theora provides none of its own framing, synchronization, or protection against
- transmission errors; it is solely a method of accepting input video 
-frames and compressing
- these frames into raw, unformatted `packets'.
-The decoder then accepts these raw packets in sequence, decodes them,
- and synthesizes a fascimile of the original video frames.
+ transmission errors; it is solely a method of accepting input video frames and
+ compressing these frames into raw, unformatted `packets'.
+The decoder then accepts these raw packets in sequence, decodes them, and
+ synthesizes a facsimile of the original video frames.
 Theora is a free-form variable bit rate (VBR) codec, and packets have no
  minimum size, maximum size, or fixed/expected size.
 
@@ -111,18 +112,18 @@
  is embedded in an Ogg stream specifically, although this is by no means a
  requirement or fundamental assumption in the Theora design.
 
-The specification for embedding Theora into an Ogg transport stream is 
-given in
- Appendix~\ref{app:oggencapsulation}.
+The specification for embedding Theora into an Ogg transport stream is given in
+ Appendix~\ref{app:oggencapsulation}.
 
 \subsubsection{Codec Setup and Probability Model}
 
-Theora's heritage is the proprietary commerical codec VP3, and it retains
- a fair amount of inflexibility when compared to Vorbis\cite{vorbis}, 
- the first Xiph.org codec.
-However, to provide additional scope for encoder improvement,
-Theora adopts some of the configurable aspects of decoder setup that 
-are present in Vorbis.
+Theora's heritage is the proprietary commercial codec VP3, and it retains a
+ fair amount of inflexibility when compared to Vorbis \cite{vorbis}, the first
+ Xiph.org codec, which began as a research codec.
+However, to provide additional scope for encoder improvement, Theora adopts
+ some of the configurable aspects of decoder setup that are present in Vorbis.
+This configuration data is not available in VP3, which used hardcoded values
+ instead.
 
 Theora makes the same controversial design decision that Vorbis made to include
  the entire probability model for the DCT coefficients and all the quantization
@@ -139,19 +140,18 @@
 
 Thus, Theora headers are both required for decode to begin and relatively large
  as bitstream headers go.
-The header size is unbounded, although as a rule-of-thumb of less than 16kB 
- is recommended, and Xiph.org's reference encoder follows this 
- suggestion.
+The header size is unbounded, although as a rule of thumb less than 16~kB is
+ recommended, and Xiph.org's reference encoder follows this suggestion.
 %TODO: Is 8kB enough? My setup header is 7.4kB, that doesn't leave much room
 % for comments.
-%RG: the lesson from vorbis is that as small as possible is really 
+%RG: the lesson from vorbis is that as small as possible is really
 % important in some applications. Practically, what's acceptable
 % depends a great deal on the target bitrate. I'd leave 16 kB in the
 % spec for now. fwiw more than 1k of comments is quite unusual.
 
 Our own design work indicates that the primary liability of the required header
  is in mindshare; it is an unusual design and thus causes some amount of
- complaint among engineers as this runs against current design trends, and
+ complaint among engineers as this runs against current design trends and
  points out limitations in some existing software/interface designs.
 However, we find that it does not fundamentally limit Theora's suitable
  application space.
@@ -164,7 +164,7 @@
 A decoder must faithfully and completely implement the specification defined
  herein %, except where noted,
  to be considered a proper Theora decoder.
-Where appropriate, a non-normative description of encoder processes are
+Where appropriate, a non-normative description of encoder processes is
  included.
 These sections will be marked as such, and a proper Theora encoder is not
  bound to follow them.
@@ -184,28 +184,16 @@
  pixel format, and a version number.
 The version number is divided into a major version, a minor version, and a
  minor revision number.
-For the format defined in this specification, these are `3', `2', and 
- `0', respectively, in reference to Theora's origin as
- a successor to the VP3.1 format.
+For the format defined in this specification, these are `3', `2', and
+ `0', respectively, in reference to Theora's origin as a successor to the VP3.2
+ format.
 
-\subsubsection{Metadata}
-
-Following the global configuration fields is a metadata section.
-This consists of a {\em vendor string} identifying the producing encoder 
- implementation and a series of {\em tag, value} pairs holding a 
- human-readable description of the encoded content.
-The format of the metadata section is the same as that used in the 
- Vorbis I and Speex codecs.
-
-A complete description of how the metadata is decoded appears in 
-Section~REF, along with a suggested set of tags.
-
 \subsubsection{Quantization Matrices}
 
 Theora allows up to 384 different quantization matrices to be defined, one for
- each {\em quantization type} (intra or inter), {\em color plane} ($Y'$, $C_b$,
- or $C_r$), and {\em quantization index}, \qi, which ranges from zero to 63,
- inclusive.
+ each \term{quantization type} (intra or inter), \term{color plane}
+ ($Y'$, $C_b$, or $C_r$), and \term{quantization index}, \qi, which ranges from
+ zero to 63, inclusive.
 The quantization index generally represents a progressive range of quality
  levels, from low quality near zero to high quality near 63.
 However, the interpretation is arbitrary, and it is possible, for example, to
@@ -214,12 +202,12 @@
 
 Each quantization matrix is an $8\times 8$ matrix of 16-bit values, which is
  used to quantize the output of the $8\times 8$ DCT.
-Quantization matrices are specified using three components: a {\em base matrix}
- and two {\em scale values}.
-The first scale value is the {\em DC scale}, which is applied to the DC
+Quantization matrices are specified using three components: a
+ \term{base matrix} and two \term{scale values}.
+The first scale value is the \term{DC scale}, which is applied to the DC
  component of the base matrix.
-The second scale value is the {\em AC scale}, which is applied to all the other
- components of the base matrix.
+The second scale value is the \term{AC scale}, which is applied to all the
+ other components of the base matrix.
 
 There are 64 DC scale values and 64 AC scale values, one for each \qi value.
 There is a set of base matrices for each quantization type and each color
@@ -232,7 +220,7 @@
  non-linear processes of the human visual system as the \qi value varies.
 
 Finally, because the in-loop deblocking filter strength depends on the strength
- of the quantization matrices defined in this header, a table of 64 {\em loop
+ of the quantization matrices defined in this header, a table of 64 \term{loop
  filter limit values} is defined, one for each \qi value.
 
 The precise specification of how all of this information is decoded appears in
@@ -244,20 +232,432 @@
  used to encode DCT coefficients.
 Each of the 32 token values has a different semantic meaning and is used to
  represent single coefficient values, zero runs, combinations of the two, and
- `end-of-block' markers.
+ \term{End-Of-Block} markers.
 
 The 80 codes are divided up into five groups of 16, with each group
  corresponding to a set of DCT coefficient indices.
 The first group corresponds to the DC coefficient, while the remaining groups
  correspond to different subsets of the AC coefficients.
-Within each frame, two 4-bit codebook indices are stored.
-The first selects which codebook to use from the DC coefficient group, while
- the second selects which codebook to use from {\em all} of the AC coefficient
- groups.
+Within each frame, two pairs of 4-bit codebook indices are stored.
+The first pair selects which codebooks to use from the DC coefficient group for
+ the $Y'$ coefficients and the $C_b$ and $C_r$ coefficients.
+The second pair selects which codebooks to use from {\em all} of the AC
+ coefficient groups for the $Y'$ coefficients and the $C_b$ and $C_r$
+ coefficients.
 
 The precise specification of how the codebooks are decoded appears in
  Section~REF.
 
+\subsection{Coded Video Structure}
+
+Theora is based on $8\times 8$ blocks of pixels.
+This section describes how a video frame is laid out, divided into blocks, and
+ how those blocks are organized.
+
+\subsubsection{Frame Layout}
+
+A video frame in Theora is a two-dimensional array of pixels.
+Theora, like VP3, uses a right-handed coordinate system, with the origin in the
+ lower-left corner of the frame.
+This is contrary to many video formats which use a left-handed coordinate
+ system with the origin in the upper-left corner of the frame.
+%INT: This means that for interlaced material, the definition of ``even fields"
+%INT:  and ``odd fields" may be reversed between Theora and other video codecs.
+%INT: This document will always refer to them as ``top fields" and ``bottom
+%INT:  fields".
+
+Theora divides the pixel array up into three separate \term{color planes}, one
+ for each of the $Y'$, $C_b$, and $C_r$ components of the pixel.
+The $Y'$ plane is also called the \term{luma plane}, and the $C_b$ and $C_r$
+ planes are also called the \term{chroma planes}.
+In some pixel formats, the chroma planes are decimated by two in one or both
+ directions.
+This means that the width or height of the chroma planes may be half that of
+ the total frame width and height, and thus only a multiple of eight, not
+ sixteen.
+The luma plane is never decimated.
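The plane sizes implied by the paragraph above can be sketched as follows. This is a minimal illustration, not part of the specification: the function name and the two boolean decimation flags are hypothetical, and the check that the frame size is a multiple of sixteen is taken from the Picture Region section.

```python
def plane_sizes(frame_width, frame_height, decimate_x, decimate_y):
    """Return the (width, height) in pixels of each color plane.

    decimate_x / decimate_y are hypothetical flags saying whether the
    chroma planes are decimated by two horizontally / vertically.
    """
    if frame_width % 16 or frame_height % 16:
        raise ValueError("frame dimensions must be multiples of 16")
    # The luma plane is never decimated.
    chroma_width = frame_width // 2 if decimate_x else frame_width
    chroma_height = frame_height // 2 if decimate_y else frame_height
    return {
        "Y'": (frame_width, frame_height),
        "Cb": (chroma_width, chroma_height),
        "Cr": (chroma_width, chroma_height),
    }
```

With both flags set, a 240 by 48 frame has 120 by 24 chroma planes, which are multiples of eight but not of sixteen.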
+
+\subsubsection{Picture Region}
+
+A video frame in Theora is required to have a width and height that are
+ multiples of sixteen.
+However, inside a frame a smaller \term{picture region} may be defined.
+The picture region can be offset from the lower-left corner of the frame by up
+ to 255 pixels in each direction, and may have an arbitrary width and height,
+ provided that it is contained entirely within the coded frame.
+It is this picture region that contains the actual video data.
+The portions of the frame which lie outside the picture region may contain
+ arbitrary data, and should be cropped away after decode.
+The picture region plays no other role in the decode process, which operates on
+ the entire video frame.
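The constraints on the picture region can be collected into a single check. This is a sketch under stated assumptions: the function and parameter names are hypothetical, offsets are measured from the lower-left corner of the frame as described above, and a non-negative width and height is assumed because the text only says they are arbitrary.

```python
def picture_region_is_valid(frame_w, frame_h, pic_x, pic_y, pic_w, pic_h):
    """Check a picture region against the constraints in the text.

    (pic_x, pic_y) is the offset of the region from the lower-left
    corner of the frame; all values are in pixels.
    """
    # The coded frame itself must be a multiple of sixteen in each direction.
    if frame_w % 16 or frame_h % 16:
        return False
    # The offset may be at most 255 pixels in each direction.
    if not (0 <= pic_x <= 255 and 0 <= pic_y <= 255):
        return False
    # Assumption: sizes are non-negative.
    if pic_w < 0 or pic_h < 0:
        return False
    # The region must lie entirely within the coded frame.
    return pic_x + pic_w <= frame_w and pic_y + pic_h <= frame_h
```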
+
+\subsubsection{Blocks and Super Blocks}
+
+Each color plane is subdivided into $8\times 8$ \term{blocks}.
+Blocks are grouped into $4\times 4$ arrays called \term{super blocks}.
+Each color plane has its own set of blocks and super blocks.
+The boundaries of the luma plane are not necessarily aligned with those of the
+ chroma planes, if the chroma planes have been decimated.
+
+Blocks are accessed in two different orders in the various decoder processes.
+The first is \term{raster order}.
+This indexes each block in row-major order, starting in the lower left and
+ proceeding along the bottom row, followed by the next row up starting on the
+ left, etc.
+The second is \term{coded order}.
+In coded order, blocks are accessed by super block.
+Each super block is traversed in raster order, similar to raster order for
+ blocks.
+Within each super block, however, blocks are accessed in a Hilbert curve
+ pattern, illustrated in Figure~REF.
+If a color plane does not contain a complete super block on the top or right
+ sides, the same ordering is still used, simply with any blocks outside the
+ frame boundary omitted.
+
+%TODO: Figure
+%      X -> X    X -> X
+%           |    ^
+%           v    |
+%      X <- X    X <- X
+%      |              ^
+%      v              |
+%      X    X -> X    X
+%      |    ^    |    ^
+%      v    |    v    |
+%      X -> X    X -> X
+%But upside down.
+
+To illustrate these two orderings, consider a frame that is 240 pixels wide and
+ 48 pixels high.
+Thus each row of the luma plane has 30 blocks and 8 super blocks, and there are
+ 6 rows of blocks and two rows of super blocks.
+The rightmost super block in each row and the entire upper row are incomplete.
+
+When accessed in raster order, each block in the luma plane is assigned the
+ following indices:
+
+\vspace{\baselineskip}
+\begin{tabular}{|l|l|l|l|c|l|l|}\hline
+150 & 151 & 152 & 153 & $\ldots$ & 178 & 179 \\\hline
+120 & 121 & 122 & 123 & $\ldots$ & 148 & 149 \\\hline
+ 90 &  91 &  92 &  93 & $\ldots$ & 118 & 119 \\\hline
+ 60 &  61 &  62 &  63 & $\ldots$ &  88 &  89 \\\hline
+ 30 &  31 &  32 &  33 & $\ldots$ &  58 &  59 \\\hline
+  0 &   1 &   2 &   3 & $\ldots$ &  28 &  29 \\\hline
+\end{tabular}
+\vspace{\baselineskip}
+
+When accessed in coded order, each block in the luma plane is assigned the
+ following indices:
+
+\vspace{\baselineskip}
+\begin{tabular}{|l|l|l|l|c|l|l|}\hline
+123 & 122 & 125 & 124 & $\ldots$ & 179 & 178 \\\hline
+120 & 121 & 126 & 127 & $\ldots$ & 176 & 177 \\\hline
+  5 &   6 &   9 &  10 & $\ldots$ & 117 & 118 \\\hline
+  4 &   7 &   8 &  11 & $\ldots$ & 116 & 119 \\\hline
+  3 &   2 &  13 &  12 & $\ldots$ & 115 & 114 \\\hline
+  0 &   1 &  14 &  15 & $\ldots$ & 112 & 113 \\\hline
+\end{tabular}
+\vspace{\baselineskip}
+
+Blocks in the chroma planes immediately follow those of the luma plane without
+ a break.
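The two orderings can be reproduced with a short sketch. The names below are hypothetical, and the Hilbert-curve offsets are read off the draft figure in the comments above (flipped so that row 0 is the bottom row); it is an illustration of the description, not normative text.

```python
# Visiting order inside one super block, as (column, row) offsets with
# row 0 at the bottom, since the frame origin is the lower-left corner.
HILBERT = [
    (0, 0), (1, 0), (1, 1), (0, 1),
    (0, 2), (0, 3), (1, 3), (1, 2),
    (2, 2), (2, 3), (3, 3), (3, 2),
    (3, 1), (2, 1), (2, 0), (3, 0),
]


def raster_order(blocks_wide, blocks_high):
    """Map each (column, row) block position to its raster-order index."""
    return {(x, y): y * blocks_wide + x
            for y in range(blocks_high) for x in range(blocks_wide)}


def coded_order(blocks_wide, blocks_high):
    """Map each (column, row) block position to its coded-order index."""
    index = {}
    n = 0
    sbs_wide = (blocks_wide + 3) // 4   # partial super blocks count too
    sbs_high = (blocks_high + 3) // 4
    for sby in range(sbs_high):         # super blocks in raster order
        for sbx in range(sbs_wide):
            for dx, dy in HILBERT:      # blocks along the Hilbert curve
                x, y = 4 * sbx + dx, 4 * sby + dy
                # Blocks outside the plane are simply omitted.
                if x < blocks_wide and y < blocks_high:
                    index[(x, y)] = n
                    n += 1
    return index
```

For the 240 by 48 example, `coded_order(30, 6)` assigns index 5 to the block at column 0, row 3, and index 120 to the block at column 0, row 4, in agreement with the table above.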
+
+\subsubsection{Macro Blocks}
+
+A macro block contains a $2\times 2$ array of blocks in the luma plane
+ {\em and} the co-located blocks in the chroma planes.
+Thus macro blocks can represent anywhere from six to twelve blocks, depending
+ on how the chroma planes are decimated.
+Macro blocks contain information about coding mode and motion vectors for the
+ corresponding blocks in all color planes.
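The range of six to twelve blocks follows from simple counting, sketched here. The function name and the boolean decimation flags are hypothetical; the count assumes that a macro block covers a two-by-two array of luma blocks and that each chroma plane contributes the blocks co-located with that area.

```python
def blocks_per_macro_block(decimate_x, decimate_y):
    """Count the blocks covered by one macro block across all planes."""
    luma_blocks = 2 * 2
    # Each decimated direction halves the chroma blocks in that direction.
    chroma_cols = 1 if decimate_x else 2
    chroma_rows = 1 if decimate_y else 2
    chroma_blocks_per_plane = chroma_cols * chroma_rows
    return luma_blocks + 2 * chroma_blocks_per_plane
```

Decimation in both directions gives six blocks, in one direction eight, and none twelve.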
+
+Macro blocks are also accessed in a \term{coded order}.
+This coded order proceeds by examining each super block in the luma plane in
+ raster order, and traversing the four macro blocks inside using a smaller
+ Hilbert curve, as shown in Figure~REF.
+If the luma plane does not contain a complete super block on the top or right
+ sides, the same ordering is still used, simply with any macro blocks outside
+ the frame boundary omitted.
+Because the frame size is constrained to be a multiple of 16, there are never
+ any partial macro blocks.
+Unlike blocks, macro blocks need never be accessed in a pure raster order.
+
+%TODO: Figure
+%    X -> X
+%    ^    |
+%    |    v
+%    X    X
+
+Using the same frame size as the example above, there are 15 macro blocks in
+ each row and 3 rows of macro blocks.
+They are assigned the following indices:
+
+\vspace{\baselineskip}
+\begin{tabular}{|l|l|c|l|}\hline
+30 & 31 & $\cdots$ & 44 \\\hline
+ 1 &  2 & $\cdots$ & 29 \\\hline
+ 0 &  3 & $\cdots$ & 28 \\\hline
+\end{tabular}
+\vspace{\baselineskip}
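The macro block indices in the table can be generated with the following sketch. The names are hypothetical, and the traversal order inside a super block is read off the draft figure in the comments above: start at the lower left, then go up, right, and down.

```python
# Visiting order of the four macro blocks in one luma super block, as
# (column, row) offsets with row 0 at the bottom.
MB_ORDER = [(0, 0), (0, 1), (1, 1), (1, 0)]


def macro_block_coded_order(mbs_wide, mbs_high):
    """Map each (column, row) macro block position to its coded index."""
    index = {}
    n = 0
    sbs_wide = (mbs_wide + 1) // 2      # a super block spans 2x2 macro blocks
    sbs_high = (mbs_high + 1) // 2
    for sby in range(sbs_high):         # super blocks in raster order
        for sbx in range(sbs_wide):
            for dx, dy in MB_ORDER:
                x, y = 2 * sbx + dx, 2 * sby + dy
                # Macro blocks outside the frame are omitted.
                if x < mbs_wide and y < mbs_high:
                    index[(x, y)] = n
                    n += 1
    return index
```

For 15 by 3 macro blocks this yields 0 and 3 at the start of the bottom row, 1 and 2 above them, and 30 through 44 along the top row.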
+
+\subsubsection{Coding Modes}
+
+Each block is coded using one of a small, fixed set of \term{coding modes} that
+ define how its contents are predicted.
+The INTRA mode uses no inter-frame prediction, and is the only mode allowed in
+ intra frames.
+The other coding modes use the contents of one of two different \term{reference
+ frames}.
+A reference frame is the fully decoded version of a previous frame in the
+ stream.
+The first available reference frame is the previous frame, whether it was an
+ intra frame or an inter frame.
+The second available reference frame is the previous intra frame, called the
+ \term{golden frame}.
+The most important inter coding mode is INTER\_NOMV, which uses the co-located
+ contents of the block in the previous frame as the predictor with no
+ motion-compensated prediction.
+
+\subsection{High-Level Decode Process}
+
+\subsubsection{Decoder Setup}
+
+Before decoding can begin, a decoder must be initialized using the bitstream
+ headers corresponding to the stream to be decoded.
+Theora uses three header packets; all are required, in order, by this
+ specification.
+Once set up, decode may begin at any intra-frame packet---or even inter-frame
+ packets, provided the appropriate decoded reference frames have been
+ cached---belonging to the Theora stream.
+In Theora I, all packets after the three initial headers are intra-frame or
+ inter-frame packets.
+
+The header packets are, in order, the identification header, the comment
+ header, and the setup header.
+
+\paragraph{Identification Header}
+
+The identification header identifies the stream as Theora, provides a version
+ number, and defines the characteristics of the video stream such as frame
+ size.
+A complete description of the identification header appears in Section~REF.
+
+\paragraph{Comment Header}
+
+The comment header includes user text comments (``tags'') and a vendor string
+ for the application/library that produced the stream.
+The format of the comment header is the same as that used in the Vorbis I and
+ Speex codecs, with slight modifications due to the use of a different bit
+ packing mechanism.
+A complete description of how the comment header is coded appears in
+ Section~REF, along with a suggested set of tags.
+
+\paragraph{Setup Header}
+
+The setup header includes extensive codec setup information, including the
+ complete set of quantization matrices and Huffman codebooks needed to decode
+ the DCT coefficients.
+
+\subsubsection{Decode Procedure}
+
+The decoding and synthesis procedure for all video packets is fundamentally the
+ same, with some steps omitted for intra frames.
+\begin{enumerate}
+\item
+Decode packet type flag.
+\item
+Decode frame header.
+\item
+Decode coded block information (inter frames only).
+\item
+Decode macro block mode information (inter frames only).
+\item
+Decode motion vectors (inter frames only).
+\item
+Decode block-level \qi information.
+\item
+Decode DC coefficient for each coded block.
+\item
+Decode 1st AC coefficient for each coded block.
+\item
+Decode 2nd AC coefficient for each coded block.
+\item
+$\ldots$
+\item
+Decode 63rd AC coefficient for each coded block.
+\item Perform DC coefficient prediction.
+\item Reconstruct coded blocks.
+\item Copy uncoded blocks.
+\item Perform loop filtering.
+\end{enumerate}
+
+Note that clever rearrangement of the steps in this process is possible.
+As an example, in a memory-constrained environment, one can make multiple
+ passes through the DCT coefficients to avoid buffering them all in memory.
+On the first pass, the starting location of each coefficient is identified, and
+ then 64 separate get pointers are used to read in the 64 DCT coefficients
+ required to reconstruct each coded block in sequence.
+This operation produces entirely equivalent output and is naturally perfectly
+ legal.
+It may even be a benefit in non-memory-constrained environments due to a
+ reduced cache footprint.
+The decoder must be {\em entirely mathematically equivalent} to the
+ specification; it need not be a literal semantic implementation.
+
+Theora makes equivalence easy to check by defining all decoding operations in
+ terms of exact integer operations.
+No floating-point math is required, and in particular, the implementation of
+ the iDCT transform must be followed precisely.
+This prevents the decoder mismatch problem commonly associated with codecs that
+ provide a less rigorous transform specification.
+Such a mismatch problem would be devastating to Theora, since a single rounding
+ error in one frame could propagate throughout the entire succeeding frame due
+ to DC prediction.
+
+\paragraph{Packet Type Decode}
+
+Theora I uses four packet types.
+The first three packet types mark each of the three Theora headers described
+ above.
+The fourth packet type marks a video packet.
+All other packet types are reserved; packets marked with a reserved type should
+ be ignored.
+
+\paragraph{Frame Header Decode}
+
+The frame header contains some global information about the current frame.
+The first is the frame type field, which specifies if this is an intra frame or
+ an inter frame.
+Inter frames predict their contents from previously decoded reference frames.
+Intra frames can be independently decoded with no established reference frames.
+
+The next piece of information in the frame header is the list of \qi values
+ allowed in the frame.
+Theora allows between one and three different \qi values to be used in a single
+ frame, each of which selects a set of six quantization matrices, one for each
+ quantization type (inter or intra), and one for each color plane.
+The first \qi value is {\em always} used when dequantizing DC coefficients.
+The \qi value used when dequantizing AC coefficients, however, can vary from
+ block to block.
+VP3, in contrast, allowed just a single \qi value per frame for both the DC and
+ AC coefficients.
+
+\paragraph{Coded Block Information}
+
+This stage determines which blocks in the frame are coded and which are
+ uncoded.
+A \term{coded block list} is constructed which lists all the coded blocks in
+ coded order.
+For intra frames, every block is coded, and so no data needs to be read from
+ the packet.
+
+\paragraph{Macro Block Mode Information}
+
+For intra frames, every block is coded in INTRA mode, and this stage can be
+ skipped.
+In inter frames a \term{coded macro block list} is constructed from the coded
+ block list.
+Any macro block which has at least one of its luma blocks coded is considered
+ coded; all other macro blocks are uncoded, even if they contain coded chroma
+ blocks.
+A coding mode is decoded for each coded macro block, and assigned to all its
+ constituent coded blocks.
+All coded chroma blocks in uncoded macro blocks are assigned the INTER\_NOMV
+ coding mode.
+
+\paragraph{Motion Vectors}
+
+Intra frames are all coded entirely in INTRA mode, and so this stage can be
+ skipped.
+Some inter coding modes, however, require one or more motion vectors to be
+ specified for each macro block.
+These are decoded in this stage, and an appropriate motion vector is assigned
+ to each coded block in the macro block.
+
+\paragraph{Block-Level \qi Information}
+
+If a frame allows multiple \qi values, the \qi value assigned to each block is
+ decoded here.
+Frames that use only a single \qi value have nothing to decode.
+
+\paragraph{DCT Coefficients}
+
+Finally, the quantized DCT coefficients are decoded.
+DCT coefficients are represented by a list of tokens.
+Each token can take on one of 32 different values, each with a different
+ semantic meaning.
+A single token can represent a single DCT coefficient, a run of zero
+ coefficients within a single block, a combination of a run of zero
+ coefficients followed by a single non-zero coefficient, an
+ \term{End-Of-Block} marker, or a run of EOB markers.
+EOB markers signify that the remainder of the block is one long zero run.
+Unlike JPEG and MPEG, each block is not required to end with a special marker.
+If non-EOB tokens yield values for all 64 of the coefficients in a block, then
+ no EOB marker is needed.
+
+Each token is associated with a specific \term{token index} in a block.
+For single-coefficient tokens, this index is the index of the token in the
+ block.
+For zero-run tokens, this index is the index of the {\em first} coefficient in
+ the run.
+For combination tokens, the index is again the index of the first coefficient
+ in the zero run.
+For EOB markers, which signify that the remainder of the block is one long zero
+ run, the index is the first zero coefficient in that run.
+For EOB runs, the token index is that of the first EOB marker in the run.
+Due to zero runs and EOB markers, a block does not have to have a token for
+ every token index.
+
+Tokens are grouped in the stream by token index, not by the block they
+ originate from.
+This means that for each token index in turn, the tokens with that index from
+ {\em all} the coded blocks are coded in coded block order.
+When decoding, a current token index is maintained for each coded block.
+This index is advanced by the number of coefficients that are added to the
+ block as each token is decoded.
+After fully decoding all the tokens with token index \ti, the current token
+ index of every coded block will be \ti or greater.
+
+If an EOB run of $n$ blocks is decoded at token index \ti, then it ends the
+ next $n$ blocks in coded block order whose current token index is equal to
+ \ti, but not greater.
+If there are fewer than $n$ blocks with a current token index of \ti, then the
+ decoder goes through the coded block list again from the start, ending blocks
+ with a current token index of $\ti+1$, and so on, until $n$ blocks have been
+ ended or the current token index of every block is 64.
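One possible reading of the EOB run rule is sketched below. It is an illustration only, not the normative procedure: the function name, the list representation of the per-block current token index, and the `start` parameter (the position in the coded block list where the run begins) are all assumptions, and ending a block is modelled by setting its current token index to 64.

```python
END = 64  # a block whose current token index is 64 is complete


def apply_eob_run(token_index, ti, n, start=0):
    """End up to n blocks with an EOB run decoded at token index ti.

    token_index holds the current token index of every coded block, in
    coded order, and is modified in place.  Returns the number of blocks
    that were ended.
    """
    ended = 0
    pos = start
    while ended < n and ti < END:
        for i in range(pos, len(token_index)):
            # Only blocks whose current index equals ti are ended.
            if token_index[i] == ti:
                token_index[i] = END
                ended += 1
                if ended == n:
                    return ended
        # Fewer than n blocks were found: restart from the beginning of
        # the coded block list at the next token index.
        pos = 0
        ti += 1
    return ended
```

For example, with current indices `[3, 5, 3, 4]`, a run of three blocks at token index 3 ends the first and third blocks on the first pass and the fourth block on the pass for token index 4.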
+
+Tokens are read by parsing a Huffman code that depends on \ti and the color
+ plane of the next coded block whose current token index is equal to \ti, but
+ not greater.
+The Huffman codebooks are selected on a per-frame basis from the 80 codebooks
+ defined in the setup header.
+Many tokens have a fixed number of \term{extra bits} associated with them.
+These bits are read directly after the token is decoded.
+These are used to define things such as coefficient magnitude, sign, and the
+ length of runs.
+
+\paragraph{DC Prediction}
+
+After the coefficients for each block are decoded, the quantized DC value of
+ each block is adjusted based on the DC values of its neighbors.
+This adjustment is performed by scanning the blocks in raster order, not coded
+ order.
+
+\paragraph{Reconstruction}
+
+Finally, using the coding mode, motion vector (if applicable), quantized
+ coefficient list, and \qi value defined for each block, all the coded blocks
+ are reconstructed.
+The DCT coefficients are dequantized, an inverse DCT transform is applied, and
+ a predictor is formed from the coding mode and motion vector and added to the
+ result.
+
+\paragraph{Loop Filtering}
+
+To complete the reconstructed frame, an in-loop deblocking filter is applied to
+ the edges of all coded blocks.
+
 \section{Bitpacking Convention}
 \label{sec:bitpacking}
 
@@ -284,7 +684,7 @@
 The most ubiquitous architectures today consider a `byte' to be an octet.
 Note, however, that the Theora bitpacking convention is still well defined for
  any native byte size;
-an implementation can use the native bit-width of a given storage 
+an implementation can use the native bit-width of a given storage
 system.
 This document assumes that a byte is one octet for purposes of example only.
 
@@ -349,7 +749,7 @@
 
 The binary integers decoded by the above process may be either signed or
  unsigned.
-This varies from integer to integer, and this specification 
+This varies from integer to integer, and this specification
  indicates how each value should be interpreted as it is read.
 That is, depending on context, the three bit binary pattern `b111' can be taken
  to represent either `$7$' as an unsigned integer or `$-1$' as a signed, two's
@@ -516,78 +916,80 @@
 
 \subsection{Overview}
 
-This document specifies the embedding or encapsulation of Theora packets 
+This document specifies the embedding or encapsulation of Theora packets
  in an Ogg transport stream.
 
-Ogg is a stream oriented wrapper for coded, linear time-based data. It 
- provides syncronization, multiplexing, framing, error detection and 
- seeking landmarks for the decoder and complements the raw packet format 
+Ogg is a stream-oriented wrapper for coded, linear time-based data.
+It provides synchronization, multiplexing, framing, error detection and
+ seeking landmarks for the decoder and complements the raw packet format
  used by the Theora codec.
 
 This document assumes familiarity with the details of the Ogg standard.
-An overview of the Ogg transport stream format is given in 
- \cite{oggstream} and a detailed description is given in \cite{oggframe} 
- and \cite{rfc3533}.
-While Theora packets can be embedded in a wide variety of media 
- containers and streaming mechanisms, the Xiph.org Foundation 
- recommends Ogg as the native format for Theora video in file-oriented 
+An overview of the Ogg transport stream format is given in Xiph.org
+ documentation \cite{oggstream} and a detailed description is also given in
+ this documentation \cite{oggframe} and in RFC~3533 \cite{rfc3533}.
+While Theora packets can be embedded in a wide variety of media
+ containers and streaming mechanisms, the Xiph.org Foundation
+ recommends Ogg as the native format for Theora video in file-oriented
  storage and transmission contexts.
 
 \subsubsection{MIME type}
 
-The correct MIME type of any Ogg file is {\tt application/ogg}. 
-Outside of an encapsulation, the mime type {\tt video/x-theora} may 
-be used to refer specifically to the Theora compressed video stream. 
+The correct MIME type of any Ogg file is {\tt application/ogg}.
+Outside of an encapsulation, the MIME type {\tt video/x-theora} may
+ be used to refer specifically to the Theora compressed video stream.
 
 \subsection{Embedding in a logical bitstream}
 
-Ogg separates a {\em logical bitstream} consisting of the framing of 
- a particular sequence of packets and complete within itself from 
- the {\em physical bitstream} which may consist either of a single 
- logical bitstream or a number of logical bitstreams multiplexed 
+Ogg separates a {\em logical bitstream}, consisting of the framing of
+ a particular sequence of packets and complete within itself, from
+ the {\em physical bitstream}, which may consist either of a single
+ logical bitstream or a number of logical bitstreams multiplexed
  together.
-This section specifies the embedding of Theora packets in a logical Ogg 
- bitstream. The mapping of Ogg Theora logical bitstreams into a 
- multiplexed physical Ogg stream is described in the next section.
+This section specifies the embedding of Theora packets in a logical Ogg
+ bitstream.
+The mapping of Ogg Theora logical bitstreams into a multiplexed physical Ogg
+ stream is described in the next section.
 
 \subsubsection{Headers}
 
-The initial info header packet appears by itself in a single Ogg page. 
-This page defines the start of the logical stream and must have 
+The initial info header packet appears by itself in a single Ogg page.
+This page defines the start of the logical stream and must have
  the `beginning of stream' flag set.
 
-The second and third header packets (metadata comments and decoder 
+The second and third header packets (metadata comments and decoder
  setup data) can together span one or more Ogg pages.
-If there are additional non-normative header packets, they must be 
+If there are additional non-normative header packets, they must be
  included in this sequence of pages as well.
-The metadata header packet must begin the second Ogg page in the logical
+The comment header packet must begin the second Ogg page in the logical
  bitstream, and there must be a page break between the last header
  packet and the first frame data packet.
 
-These two page break requirements facilitate stream identification and 
-simplify header acquisition for seeking and live streaming applications.
+These two page break requirements facilitate stream identification and
+ simplify header acquisition for seeking and live streaming applications.
 
 All header pages must have their granule position field set to zero.
 %TODO: or -1?
+%TBT: What are we doing now?
 
 \subsubsection{Frame data}
 
-The first frame data packet in a logical bitstream must begin a fresh page. 
-All other data packets are placed one at a time into Ogg pages 
-until the end of the stream.
+The first frame data packet in a logical bitstream must begin a fresh page.
+All other data packets are placed one at a time into Ogg pages
+ until the end of the stream.
 Packets can span pages and multiple packets can be placed within any
  one page.
-The last page in the logical bitstream must have its `end of stream' 
+The last page in the logical bitstream must have its `end of stream'
  flag set.
 
-Frame data pages must be marked with a granule index corresponding to 
+Frame data pages must be marked with a granule index corresponding to
  the display time of the last frame/packet that finishes in that page.
 
 {\bf Note:}
-This scheme is still under discussion. It has also been 
- proposed that pages be labeled with a granule corresponding to the 
- first frame that begins on that page.
-This simplifies seeking and mux, but is different from the published 
+This scheme is still under discussion.
+It has also been proposed that pages be labeled with a granule corresponding to
+ the first frame that begins on that page.
+This simplifies seeking and multiplexing, but differs from the published
  definition of the Ogg granule field.
 This document will be updated when the issue is settled.
 
@@ -595,13 +997,14 @@
 
 \subsection{Multiplexed stream mapping}
 
-Applications supporting Ogg Theora I must support Theora bitstreams 
- multiplexed with compressed audio data in the Vorbis I and Speex 
+Applications supporting Ogg Theora I must support Theora bitstreams
+ multiplexed with compressed audio data in the Vorbis I and Speex
  formats, and should support Ogg-encapsulated MNG graphics for overlays.
- % and the Writ format for text-based titling.
+% and the Writ format for text-based titling.
+%TBT: That's great... do these things have specifications?
 
 Multiple audio and video bitstreams may be multiplexed together.
-How playback of multiple/alternate streams is handled is up to the 
+How playback of multiple/alternate streams is handled is up to the
  application.
 Some conventions based on included metadata aid interoperability
  in this respect.
@@ -610,46 +1013,46 @@
 
 \subsubsection{Chained streams}
 
-Ogg Theora decoders and playback applications must support both grouped 
- streams (multiplexed concurrent logical streams) and chained streams 
+Ogg Theora decoders and playback applications must support both grouped
+ streams (multiplexed concurrent logical streams) and chained streams
  (sequential concatenation of independent physical bitstreams).
 
-The number and codec data types of multiplexed streams and the decoder 
- parameters for those stream types that re-occur can all change at a 
+The number and codec data types of multiplexed streams and the decoder
+ parameters for those stream types that re-occur can all change at a
  chaining boundary.
-A playback application must be prepared to handle such changes and 
+A playback application must be prepared to handle such changes and
  should do so smoothly with the minimum possible visible disruption.
-The specification of grouped streams below applies independently to each 
+The specification of grouped streams below applies independently to each
  segment of a chained bitstream.
 
 \subsubsection{Grouped streams}
 
-At the beginning of a multiplexed stream, the `beginning of stream' 
+At the beginning of a multiplexed stream, the `beginning of stream'
  pages for each logical bitstream will be grouped together.
 Within these, the first page to occur must be the Theora page.
-This facilitates identification of Ogg Theora files among other 
+This facilitates identification of Ogg Theora files among other
  Ogg-encapsulated content.
-A playback application must nevertheless handle streams where this 
- arrangement is not correct. 
+A playback application must nevertheless handle streams where this
+ arrangement is not correct.
 
-If there is more than one Theora logical stream, the first page should 
+If there is more than one Theora logical stream, the first page should
  be from the primary stream.
 The primary stream is the best choice for a generic player to begin
  displaying without special user direction.
-If there is more than one audio stream, or of any other stream 
+If there is more than one audio stream, or more than one stream of any other
  type, the identification page of the primary stream of that type
  must be placed before the others.
 
-After the `beginning of stream' pages, the header pages of each of 
- the logical streams should be grouped together before any data pages 
+After the `beginning of stream' pages, the header pages of each of
+ the logical streams should be grouped together before any data pages
  occur.
 
-After all the header pages have been placed, 
+After all the header pages have been placed,
  the data pages are multiplexed together.
 They should be placed in the stream in increasing order by the playback
- time equivalents of their granule fields. This facilitates seeking 
- while limiting the buffering requirements of the playback 
- demultiplexer.
+ time equivalents of their granule fields.
+This facilitates seeking while limiting the buffering requirements of the
+ playback demultiplexer.
 
 \bibliography{spec}
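
The data page ordering described in the grouped streams text above (pages
placed in increasing order of the playback time of their granule fields) can
be sketched roughly as follows. This is only an illustration in Python, not
part of the specification or of any Ogg library: the function name, the stream
names, and the (time, label) page tuples are hypothetical, and it assumes that
each granule field has already been converted to a playback time in seconds
and that each logical stream's pages are already in increasing time order.

```python
import heapq


def interleave_pages(streams):
    """Merge per-stream page lists into one list ordered by playback time.

    `streams` maps a hypothetical stream name to a list of
    (playback_time_seconds, page_label) tuples, each list sorted by time.
    """
    # Tag every page with its stream name so the origin survives the merge.
    tagged = (
        [(time, name, label) for time, label in pages]
        for name, pages in streams.items()
    )
    # heapq.merge performs a lazy k-way merge of already-sorted inputs;
    # pages with equal times are ordered by stream name.
    return list(heapq.merge(*tagged))


if __name__ == "__main__":
    example = {
        "theora": [(0.0, "v0"), (0.5, "v1"), (1.0, "v2")],
        "vorbis": [(0.25, "a0"), (0.75, "a1")],
    }
    for time, name, label in interleave_pages(example):
        print(f"{time:5.2f}  {name:6s}  {label}")
```

A k-way merge of this kind only has to compare the next pending page of each
stream, which is consistent with the stated goal of limiting buffering in the
demultiplexer. A real multiplexer would also have to deal with header pages
and packets that span pages, which this sketch ignores.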
 
