[xiph-cvs] r6716 - trunk/theora/doc/spec

Mon May 17 21:50:04 PDT 2004

Author: silvia
Date: 2004-05-18 00:50:04 -0400 (Tue, 18 May 2004)
New Revision: 6716

Modified:
   trunk/theora/doc/spec/spec.tex
Log:
Changed the document class from article to book.

Modified: trunk/theora/doc/spec/spec.tex
===================================================================

--- trunk/theora/doc/spec/spec.tex	2004-05-18 02:53:34 UTC (rev 6715)
+++ trunk/theora/doc/spec/spec.tex	2004-05-18 04:50:04 UTC (rev 6716)
@@ -1,4 +1,4 @@
-\documentclass[11pt,letterpaper]{article}
+\documentclass[11pt,letterpaper]{book}
 
 \usepackage{latexsym}
 \usepackage{amssymb}
@@ -43,9 +43,9 @@
 \newcommand{\ilog}{\ensuremath{\mathop{\mathrm{ilog}}\nolimits}}
 
 %Section-based table, figure, and equation numbering.
-\numberwithin{equation}{section}
-\numberwithin{figure}{section}
-\numberwithin{table}{section}
+\numberwithin{equation}{chapter}
+\numberwithin{figure}{chapter}
+\numberwithin{table}{chapter}
 
 \keepXColumns
 
@@ -59,19 +59,43 @@
 
 \begin{document}
 
+\begin{titlepage}
 \maketitle
+\end{titlepage}
+\thispagestyle{empty}
+\cleardoublepage
+
+\pagenumbering{roman}
+
+\thispagestyle{plain}
 \tableofcontents
-\newpage
+\cleardoublepage
+ 
+\thispagestyle{plain}
+\listoffigures
+\cleardoublepage
+                                                                                
+\thispagestyle{plain}
+\listoftables
+\cleardoublepage
 
-\section{Introduction and Description}
 
+                                                                                
+
+\pagenumbering{arabic}
+\setcounter{page}{1}
+                                                                                
+
+
+\chapter{Introduction and Description}
+
 This section provides a high level description of the Theora codec's
  construction.
 A bit-by-bit specification appears beginning in Section~\ref{sec:bitpacking}.
 The later sections assume a high-level understanding of the Theora decode
  process, which is provided below.
 
-\subsection{Overview}
+\section{Overview}
 
 Theora is a general purpose, lossy video codec.
 It is based on the VP3 video codec produced by On2 Technologies
@@ -84,7 +108,7 @@
  however Mike Melanson maintains a detailed description \cite{Mel04}.
 Portions of this specification were adopted from that text with permission.
 
-\subsubsection{VP3 and Theora}
+\subsection{VP3 and Theora}
 
 Theora contains a superset of the features that were available in the original
  VP3 codec.
@@ -96,7 +120,7 @@
  when that feature is defined.
 A complete list of these features appears in Appendix~REF.
 
-\subsubsection{Video Formats}
+\subsection{Video Formats}
 
 Theora I currently supports progressive video data of arbitrary dimensions at a
  constant frame rate in one of several $Y'C_bC_r$ color spaces.
@@ -118,7 +142,7 @@
  several Theora streams together.
 Support for increased bit depths or additional color spaces is not planned.
 
-\subsubsection{Classification}
+\subsection{Classification}
 
 Theora I is a block-based lossy transform codec that utilizes an
  $8\times 8$ Type-II Discrete Cosine Transform and block-based motion
@@ -131,7 +155,7 @@
 There is no equivalent to the bi-predictive frames (B frames) found in MPEG
  codecs.
 
-\subsubsection{Assumptions}
+\subsection{Assumptions}
 
 The Theora codec design assumes a complex, psychovisually-aware encoder and a
  simple, low-complexity decoder.
@@ -156,7 +180,7 @@
 The specification for embedding Theora into an Ogg transport stream is given in
  Appendix~\ref{app:oggencapsulation}.
 
-\subsubsection{Codec Setup and Probability Model}
+\subsection{Codec Setup and Probability Model}
 
 Theora's heritage is the proprietary commercial codec VP3, and it retains a
  fair amount of inflexibility when compared to Vorbis \cite{vorbis}, the first
@@ -197,7 +221,7 @@
 However, we find that it does not fundamentally limit Theora's suitable
  application space.
 
-\subsubsection{Format Specification}
+\subsection{Format Specification}
 
 The Theora format is well-defined by its decode specification; any encoder that
  produces packets that are correctly decoded by an implementation following
@@ -213,15 +237,15 @@
 These sections will be marked as such, and a proper Theora encoder is not
  bound to follow them.
 
-%TODO: \subsubsection{Hardware Profile}
+%TODO: \subsection{Hardware Profile}
 
-\subsection{Coded Video Structure}
+\section{Coded Video Structure}
 
 Theora is based on $8\times 8$ blocks of pixels.
 This section describes how a video frame is laid out, divided into blocks, and
  how those blocks are organized.
 
-\subsubsection{Frame Layout}
+\subsection{Frame Layout}
 
 A video frame in Theora is a two-dimensional array of pixels.
 Theora, like VP3, uses a right-handed coordinate system, with the origin in the
@@ -259,7 +283,7 @@
  the total frame width and height.
 The luma plane is never subsampled.
 
-\subsubsection{Picture Region}
+\subsection{Picture Region}
 
 An encoded video frame in Theora is required to have a width and height that
  are multiples of sixteen, making an integral number of blocks even when the
@@ -285,7 +309,7 @@
 \label{fig:pic-frame}
 \end{figure}
 
-\subsubsection{Blocks and Super Blocks}
+\subsection{Blocks and Super Blocks}
 \label{sec:blocks-and-sbs}
 
 Each color plane is subdivided into $8\times 8$ \term{blocks}.
@@ -363,7 +387,7 @@
 The implication is that the blocks from all planes are treated as a unit during
  the various processing steps.
 
-\subsubsection{Macro Blocks}
+\subsection{Macro Blocks}
 \label{sec:mbs}
 
 A macro block contains a $2\times 2$ array of blocks in the luma plane
@@ -422,7 +446,7 @@
 \end{center}
 \vspace{\baselineskip}
 
-\subsubsection{Coding Modes and Prediction}
+\subsection{Coding Modes and Prediction}
 
 Each block is coded using one of a small, fixed set of \term{coding modes} that
  define how the block is predicted from previous frames.
@@ -445,7 +469,7 @@
  previous frame as the predictor.
 This is the default coding mode.
 
-\subsubsection{DCT Coefficients}
+\subsection{DCT Coefficients}
 
 To each block's predictor, a \term{residual} is added to form the final
  contents of the block.
@@ -502,14 +526,14 @@
 DCT coefficient $(0,0)$ is called the \term{DC coefficient}.
 All the other coefficients are called \term{AC coefficients}.
 
-\subsection{Decoder Configuration}
+\section{Decoder Configuration}
 
 Decoder setup consists of configuration of the quantization matrices and the
  Huffman codebooks for the DCT coefficients, and a table of limit values for
  the deblocking filter.
 The remainder of the decoding pipeline is not configurable.
 
-\subsubsection{Global Configuration}
+\subsection{Global Configuration}
 
 The global codec configuration consists of a few video related fields, such as
  frame rate, frame size, picture size and offset, aspect ratio, color space,
@@ -522,7 +546,7 @@
  `0', respectively, in reference to Theora's origin as a successor to the VP3.1
  format.
 
-\subsubsection{Quantization Matrices}
+\subsection{Quantization Matrices}
 
 Theora allows up to 384 different quantization matrices to be defined, one for
  each \term{quantization type}, \term{color plane} ($Y'$, $C_b$, or $C_r$), and
@@ -582,7 +606,7 @@
 The precise specification of how all of this information is decoded appears in
  Section~REF.
 
-\subsubsection{Huffman Codebooks}
+\subsection{Huffman Codebooks}
 
 Theora uses 80 configurable binary Huffman codes to represent the 32 tokens
  used to encode DCT coefficients.
@@ -604,9 +628,9 @@
 The precise specification of how the codebooks are decoded appears in
  Section~REF.
 
-\subsection{High-Level Decode Process}
+\section{High-Level Decode Process}
 
-\subsubsection{Decoder Setup}
+\subsection{Decoder Setup}
 
 Before decoding can begin, a decoder MUST be initialized using the bitstream
  headers corresponding to the stream to be decoded.
@@ -647,7 +671,7 @@
 A complete description of the setup header appears in
  Section~\ref{sec:setupheader}.
 
-\subsubsection{Decode Procedure}
+\subsection{Decode Procedure}
 
 The decoding and synthesis procedure for all video packets is fundamentally the
  same, with some steps omitted for intra frames.
@@ -845,7 +869,7 @@
 To complete the reconstructed frame, an ``in-loop" deblocking filter is applied to
  the edges of all coded blocks.
 
-\section{Notation and Conventions}
+\chapter{Notation and Conventions}
 
 All parameters either passed in or out of a decoding procedure are given in
  \bitvar{bold\ face}.
@@ -950,7 +974,7 @@
 
 \end{description}
 
-\subsection{Key words}
+\section{Key words}
 
 %We can't rewrite this, because this is text required by RFC 2119, so we use
 % some emergency stretching to get it typeset properly.
@@ -982,7 +1006,7 @@
 Such features will not increment the bitstream version number, and can only be
  recognized by checking the value of these reserved bits.
 
-\section{Video Formats}
+\chapter{Video Formats}
 
 This section gives a precise description of the video formats that Theora is
  capable of storing.
@@ -1006,7 +1030,7 @@
 The second describes the various schemes for sampling the color values in time
  and space.
 
-\subsection{Color Space Conventions}
+\section{Color Space Conventions}
 
 There are a large number of different color standards used in digital video.
 Since Theora is a lossy codec, it restricts itself to only a few of them to
@@ -1031,7 +1055,7 @@
 Currently, only two color spaces are defined, with a third possibility that
  indicates the color space is ``unknown".
 
-\subsection{Color Space Conversions and Parameters}
+\section{Color Space Conversions and Parameters}
 \label{sec:color-xforms}
 
 The parameters which describe the conversions between each color space are
@@ -1193,7 +1217,7 @@
 
 \end{description}
 
-\subsection{Available Color Spaces}
+\section{Available Color Spaces}
 \label{sec:colorspaces}
 
 These are the color spaces currently defined for use by Theora video.
@@ -1204,7 +1228,7 @@
 For these unspecified parameters, this document serves as the definition of
  what should be used when encoding or decoding Theora video.
 
-\subsubsection{Rec.~470M (Rec.~ITU-R~BT.470-6 System M/NTSC with
+\subsection{Rec.~470M (Rec.~ITU-R~BT.470-6 System M/NTSC with
  Rec.~ITU-R~BT.601-5)}
 \label{sec:470m}
 
@@ -1253,7 +1277,7 @@
 \label{tab:470m}
 \end{table}
 
-\subsubsection{Rec.~470BG (Rec.~ITU-R~BT.470-6 Systems B and G with
+\subsection{Rec.~470BG (Rec.~ITU-R~BT.470-6 Systems B and G with
  Rec.~ITU-R~BT.601-5)}
 \label{sec:470bg}
 
@@ -1317,13 +1341,13 @@
 \label{tab:470bg}
 \end{table}
 
-\subsection{Pixel Formats}
+\section{Pixel Formats}
 \label{sec:pixfmts}
 
 Theora supports several different pixel formats, each of which uses different
  subsampling for the chroma planes relative to the luma plane.
 
-\subsubsection{4:4:4 Subsampling}
+\subsection{4:4:4 Subsampling}
 \label{sec:444}
 
 All three color planes are stored at full resolution.
@@ -1340,7 +1364,7 @@
 %
 
 
-\subsubsection{4:2:2 Subsampling}
+\subsection{4:2:2 Subsampling}
 \label{sec:422}
 
 The $C_b$ and $C_r$ planes are stored with half the horizontal resolution of
@@ -1367,7 +1391,7 @@
 %
 %
 
-\subsubsection{4:2:0 Subsampling}
+\subsection{4:2:0 Subsampling}
 \label{sec:420}
 
 The $C_b$ and $C_r$ planes are stored with half the horizontal and half the
@@ -1408,7 +1432,7 @@
 %
 %
 
-\subsubsection{Subsampling and the Picture Region}
+\subsection{Subsampling and the Picture Region}
 
 Although the frame size must be an integral number of macro blocks, and thus
  both the number of pixels and the number of blocks in each direction must be
@@ -1444,10 +1468,10 @@
 
 %TODO: Figures!
 
-\section{Bitpacking Convention}
+\chapter{Bitpacking Convention}
 \label{sec:bitpacking}
 
-\subsection{Overview}
+\section{Overview}
 
 The Theora codec uses relatively unstructured raw packets containing
  binary integer fields of arbitrary width.
@@ -1459,7 +1483,7 @@
 The Theora bitpacking convention specifies the correct mapping of the logical
  packet bitstream into an actual representation in fixed-width units.
 
-\subsubsection{Octets and Bytes}
+\subsection{Octets and Bytes}
 
 In most contemporary architectures, a `byte' is synonymous with an `octet',
  that is, eight bits.
@@ -1473,7 +1497,7 @@
  given storage system.
 This document assumes that a byte is one octet for purposes of example only.
 
-\subsubsection{Words and Byte Order}
+\subsection{Words and Byte Order}
 
 A `word' is an integer size that is a grouped multiple of the byte size.
 Most architectures consider a word to be a group of two, four, or eight bytes.
@@ -1499,7 +1523,7 @@
 Logically, bytes are always encoded and decoded in order from byte zero through
  byte $n$.
 
-\subsubsection{Bit Order}
+\subsection{Bit Order}
 
 A byte has a well-defined `least significant' bit (LSb), which is the only bit
  set when the byte is storing the two's complement integer value $+1$.
@@ -1507,7 +1531,7 @@
 Bits in a byte are numbered from zero at the LSb to $n$ for the MSb, where
  $n=7$ in an octet.
 
-\subsection{Coding Bits into Bytes}
+\section{Coding Bits into Bytes}
 
 The Theora codec needs to encode arbitrary bit-width integers from zero to 32
  bits wide into packets.
@@ -1530,7 +1554,7 @@
 Any unfilled bits in the last byte of the packet MUST be cleared to zero by the
  encoder.
 
-\subsubsection{Signedness}
+\subsection{Signedness}
 
 The binary integers decoded by the above process may be either signed or
  unsigned.
@@ -1540,7 +1564,7 @@
  taken to represent either `$7$' as an unsigned integer or `$-1$' as a signed,
  two's complement integer.
 
-\subsubsection{Encoding Example}
+\subsection{Encoding Example}
 
 The following example shows the state of an (8-bit) byte stream after several
  binary integers are encoded, including the location of the put pointer for the
@@ -1615,7 +1639,7 @@
 \end{tabular}
 \vspace{\baselineskip}
 
-\subsubsection{Decoding Example}
+\subsection{Decoding Example}
 
 The following example shows the state of the (8-bit) byte stream encoded in the
  previous example after several binary integers are decoded, including the
@@ -1666,7 +1690,7 @@
  would have been the integer `$-1$'.
 \end{itemize}
 
-\subsubsection{End-of-Packet Alignment}
+\subsection{End-of-Packet Alignment}
 
 The typical use of bitpacking is to produce many independent byte-aligned
  packets which are embedded into a larger byte-aligned container structure,
@@ -1696,7 +1720,7 @@
  decoding, it may attempt to use the bits that were read to recover as much of
  encoded data as possible, signal a warning or error, or both.
 
-\subsubsection{Reading Zero Bit Integers}
+\subsection{Reading Zero Bit Integers}
 
 Reading a zero bit integer returns the value `$0$' and does not increment
  the stream pointer.
@@ -1707,7 +1731,7 @@
 Reading a zero bit integer after a previous read sets the `end-of-packet'
  condition shall fail, also returning `end-of-packet'.
 
-\section{Bitstream Headers}
+\chapter{Bitstream Headers}
 \label{sec:headers}
 
 A Theora bitstream begins with three header packets.
@@ -1727,7 +1751,7 @@
  streams.
 These are indicated as they appear in the sections below.
 
-\subsection{Common Header Decode}
+\section{Common Header Decode}
 \label{sub:common-header}
 
 \paragraph{Input parameters:} None.
@@ -1782,7 +1806,7 @@
 Packets with other header types (\hex{83}--\hex{FF}) are reserved and MUST be
  ignored.
 
-\subsection{Identification Header Decode}
+\section{Identification Header Decode}
 \label{sec:idheader}
 
 \paragraph{Input parameters:} None.
@@ -2033,7 +2057,7 @@
 VP3 headers do not specify a color space.
 VP3 only supports the 4:2:0 pixel format.
 
-\subsection{Comment Header}
+\section{Comment Header}
 \label{sec:commentheader}
 
 The Theora comment header is the second of three header packets that begin a
@@ -2061,7 +2085,7 @@
  also eight-bit clean with a length encoded in 32 bits.
 %TODO: The 1.0 release of libtheora sets the vendor string to ...
 
-\subsubsection{Comment Length Decode}
+\subsection{Comment Length Decode}
 \label{sub:comment-len}
 
 \paragraph{Input parameters:} None.
@@ -2111,7 +2135,7 @@
  conventions.
 \end{enumerate}
 
-\subsubsection{Comment Header Decode}
+\subsection{Comment Header Decode}
 
 \paragraph{Input parameters:} None.
 
@@ -2185,7 +2209,7 @@
 
 %TODO: \paragraph{VP3 Compatibility}
 
-\subsubsection{User Comment Format}
+\subsection{User Comment Format}
 
 The user comment vectors are structured similarly to a UNIX environment
  variable.
@@ -2250,7 +2274,7 @@
 %TODO: Complete list
 \end{description}
 
-\subsection{Setup Header}
+\section{Setup Header}
 \label{sec:setupheader}
 
 The Theora setup header contains the limit values used to drive the loop
@@ -2259,7 +2283,7 @@
 Because the contents of this header are specific to Theora, no concessions have
  been made to keep the fields octet-aligned for easy parsing.
 
-\subsubsection{Loop Filter Limit Table Decode}
+\subsection{Loop Filter Limit Table Decode}
 \label{sub:loop-filter-limits}
 
 \paragraph{Input parameters:} None.
@@ -2309,7 +2333,7 @@
 The loop filter limit values are hardcoded in VP3.
 The values used are given in Appendix~REF.
 
-\subsubsection{Quantization Parameters Decode}
+\subsection{Quantization Parameters Decode}
 \label{sub:quant-params}
 
 \paragraph{Input parameters:} None.
@@ -2526,7 +2550,7 @@
 The quantization parameters are hardcoded in VP3.
 The values used are given in Appendix~REF.
 
-\subsubsection{Computing a Quantization Matrix}
+\subsection{Computing a Quantization Matrix}
 \label{sub:quant-mat}
 
 \paragraph{Input parameters:}\hfill\\*
@@ -2687,7 +2711,7 @@
 \end{enumerate}
 \end{enumerate}
 
-\subsubsection{DCT Token Huffman Tables}
+\subsection{DCT Token Huffman Tables}
 \label{sub:huffman-tables}
 
 \paragraph{Input parameters:} None.
@@ -2792,7 +2816,7 @@
 The DCT token Huffman tables are hardcoded in VP3.
 The values used are given in Appendix~REF.
 
-\subsubsection{Setup Header Decode}
+\subsection{Setup Header Decode}
 
 \paragraph{Input parameters:} None.
 
@@ -2857,7 +2881,7 @@
  Section~\ref{sub:huffman-tables} into \bitvar{HTS}.
 \end{enumerate}
 
-\section{Frame Decode}
+\chapter{Frame Decode}
 
 This section describes the complete procedure necessary to decode a single
  frame.
@@ -2865,7 +2889,7 @@
  modes, motion vectors, block-level \qi\ values, and finally the DCT residual
  tokens, which are used to reconstruct the frame.
 
-\subsection{Frame Header Decode}
+\section{Frame Header Decode}
 \label{sub:frame-header}
 
 \paragraph{Input parameters:} None.
@@ -2968,14 +2992,14 @@
  because VP3 does not support block-level \qi\ values and uses the same
  \qi\ value for all the coefficients in a frame.
 
-\subsection{Run-Length Encoded Bit Strings}
+\section{Run-Length Encoded Bit Strings}
 
 Two variations of run-length encoding are used to store sequences of bits for
  the block coded flags and the block-level \qi\ values.
 The procedures to decode these bit sequences are specified in the following two
  sections.
 
-\subsubsection{Long-Run Bit String Decode}
+\subsection{Long-Run Bit String Decode}
 \label{sub:long-run}
 
 \paragraph{Input parameters:}\hfill\\*
@@ -3105,7 +3129,7 @@
  only format VP3 supports---this does not pose any problems because runs this
  long are not needed.
 
-\subsubsection{Short-Run Bit String Decode}
+\subsection{Short-Run Bit String Decode}
 \label{sub:short-run}
 
 \paragraph{Input parameters:}\hfill\\*
@@ -3209,7 +3233,7 @@
 Continue decoding runs from step~\ref{step:short-run-loop}.
 \end{enumerate}
 
-\subsection{Coded Block Flags Decode}
+\section{Coded Block Flags Decode}
 \label{sub:coded-blocks}
 
 \paragraph{Input parameters:}\hfill\\*
@@ -3366,7 +3390,7 @@
 \end{enumerate}
 \end{enumerate}
 
-\subsection{Macro Block Coding Modes}
+\section{Macro Block Coding Modes}
 
 \paragraph{Input parameters:}\hfill\\*
 \begin{tabularx}{\textwidth}{@{}llrcX@{}}\toprule
@@ -3522,10 +3546,10 @@
 \appendix
 
 \clearpage
-\section{Ogg Bitstream Encapsulation}
+\chapter{Ogg Bitstream Encapsulation}
 \label{app:oggencapsulation}
 
-\subsection{Overview}
+\section{Overview}
 
 This document specifies the embedding or encapsulation of Theora packets
  in an Ogg transport stream.
@@ -3545,13 +3569,13 @@
  recommends Ogg as the native format for Theora video in file-oriented
  storage and transmission contexts.
 
-\subsubsection{MIME type}
+\subsection{MIME type}
 
 The correct MIME type of any Ogg file is {\tt application/ogg}.
 Outside of an encapsulation, the mime type {\tt video/x-theora} may
  be used to refer specifically to the Theora compressed video stream.
 
-\subsection{Embedding in a logical bitstream}
+\section{Embedding in a logical bitstream}
 
 Ogg separates a {\em logical bitstream} consisting of the framing of
  a particular sequence of packets and complete within itself from
@@ -3563,7 +3587,7 @@
 The mapping of Ogg Theora logical bitstreams into a multiplexed physical Ogg
  stream is described in the next section.
 
-\subsubsection{Headers}
+\subsection{Headers}
 
 The initial info header packet appears by itself in a single Ogg page.
 This page defines the start of the logical stream and MUST have
@@ -3584,7 +3608,7 @@
 %TODO: or -1?
 %TBT: What are we doing now?
 
-\subsubsection{Frame data}
+\subsection{Frame data}
 
 The first frame data packet in a logical bitstream MUST begin a fresh page.
 All other data packets are placed one at a time into Ogg pages
@@ -3605,9 +3629,9 @@
  definition of the Ogg granule field.
 This document will be updated when the issue is settled.
 
-%TODO: \subsubsection{Granule position}
+%TODO: \subsection{Granule position}
 
-\subsection{Multiplexed stream mapping}
+\section{Multiplexed stream mapping}
 
 Applications supporting Ogg Theora I must support Theora bitstreams
  multiplexed with compressed audio data in the Vorbis I and Speex
@@ -3623,7 +3647,7 @@
 %TODO: describe multiple vs. alternate streams, language mapping
 % and reference metadata descriptions.
 
-\subsubsection{Chained streams}
+\subsection{Chained streams}
 
 Ogg Theora decoders and playback applications MUST support both grouped
  streams (multiplexed concurrent logical streams) and chained streams
@@ -3637,7 +3661,7 @@
 The specification of grouped streams below applies independently to each
  segment of a chained bitstream.
 
-\subsubsection{Grouped streams}
+\subsection{Grouped streams}
 
 At the beginning of a multiplexed stream, the `beginning of stream'
  pages for each logical bitstream will be grouped together.
@@ -3673,7 +3697,7 @@
 %TODO: The language should be changed to match.
 
 \clearpage
-\section{Colophon}
+\chapter{Colophon}
 
 Ogg is a \href{http://www.xiph.org}{Xiph.org Foundation} effort to protect
  essential tenets of Internet multimedia from corporate hostage-taking; Open

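For readers skimming the diff, the resulting document layout can be summarized
as a minimal sketch. This is not the actual spec.tex: the title, author, and
chapter contents are placeholders, and the sketch uses the book class's
\frontmatter and \mainmatter switches as a stand-in for the explicit
\pagenumbering{roman} and \pagenumbering{arabic} calls that the commit adds.
amsmath is loaded here on the assumption that it is what supplies
\numberwithin.

```latex
% Minimal sketch of the book-class layout (placeholders, not the real spec.tex).
\documentclass[11pt,letterpaper]{book}
\usepackage{amsmath} % assumed source of \numberwithin

% Chapter-based equation, figure, and table numbering, as in the commit.
\numberwithin{equation}{chapter}
\numberwithin{figure}{chapter}
\numberwithin{table}{chapter}

\title{Placeholder Title}   % hypothetical
\author{Placeholder Author} % hypothetical

\begin{document}

\frontmatter      % roman page numbers (swapped in for \pagenumbering{roman})
\maketitle
\tableofcontents
\listoffigures
\listoftables

\mainmatter       % arabic page numbers, restarting at 1
\chapter{Introduction and Description} % was \section in the article class
\section{Overview}                     % was \subsection
\subsection{VP3 and Theora}            % was \subsubsection

\end{document}
```

Every heading change in the diff follows the same one-level promotion shown in
the last three lines: \section becomes \chapter, \subsection becomes \section,
and \subsubsection becomes \subsection.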