[xiph-cvs] r6145 - theora/trunk/doc
giles at xiph.org
giles at xiph.org
Sat Mar 20 21:26:05 PST 2004
Author: giles
Date: 2004-03-21 00:26:04 -0500 (Sun, 21 Mar 2004)
New Revision: 6145
Modified:
theora/trunk/doc/spec.tex
Log:
Correct line endings.
<p>Modified: theora/trunk/doc/spec.tex
===================================================================
--- theora/trunk/doc/spec.tex 2004-03-21 05:24:19 UTC (rev 6144)
+++ theora/trunk/doc/spec.tex 2004-03-21 05:26:04 UTC (rev 6145)
@@ -1,493 +1,493 @@
-\documentclass[11pt,letterpaper]{article}
-
-\usepackage{latexsym}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage{graphicx}
-\usepackage[pdfpagemode=None,pdfstartview=FitH,pdfview=FitH,colorlinks=true]%
- {hyperref}
-
-\newtheorem{theorem}{Theorem}[section]
-\newcommand{\qi}{\ensuremath{\mathit{qi}}}
-
-\pagestyle{headings}
-\bibliographystyle{alpha}
-
-\title{Theora I Specification}
-\author{Xiph.org Foundation}
-\date{\today}
-
-\begin{document}
-
-\maketitle
-\tableofcontents
-\newpage
-
-\section{Introduction and Description}
-
-This section provides a high level description of the Theora codec's
- construction.
-A bit-by-bit specification appears beginning in Section~\ref{sec:bitpacking}.
-The later sections assume a high-level understanding of the Theora decode
- process, which is provided below.
-
-\subsection{Overview}
-
-Theora is a general purpose, lossy video codec.
-It is based off the VP3.2 video codec produced by On2 Technologies
- (\url{http://www.on2.com/}).
-On2 Technologies has released the VP3.2 source code under a BSD license, along
- with an irrevocable, royalty-free license to any patent claims it might have
- over the software and any derivatives.
-No formal specification exists for the VP3.2 format beyond this source code,
- though Mike Melanson maintains a detailed description \cite{Mel04}.
-Portions of this specification were adopted from his text with permission.
-
-\subsubsection{VP3.2 and Theora}
-
-Theora contains a superset of the features that were available in the original
- VP3.2 codec.
-Content encoded with VP3.2 can be losslessly transcoded into the Theora format.
-Theora content cannot, in general, be losslessly transcoded into the VP3.2
- format.
-If a feature is not available in the original VP3.2 format, this will be
- mentioned when that feature is defined.
-A complete list of these features appears in Appendix~REF.
-
-\subsubsection{Video Formats}
-
-Theora I currently supports progressive video data of arbitrary dimensions in
- one of several 8-bit $Y'C_bC_r$ color spaces.
-The precise definition the color spaces supported appears in Section~REF.
-Three different chroma subsampling formats are supported: 4:2:0, 4:2:2,
- and 4:4:4.
-The precise details of each of these formats and their sampling locations are
- described in Section~REF.
-
-The Theora I format does not support interlaced material, bit-depths larger
- than 8 bits per component, nor alternate color spaces such as RGB or
- arbitrary multi-channel spaces.
-Support for interlaced material is planned for a future version.
-Support for increased bit depths or additional color spaces is not being
- considered as of the time of this writing.
-
-\subsubsection{Classification}
-
-Theora I is a block-based lossy transform codec that utilizes an
- $8\times 8$ Type-II Discrete Cosine Transform and block-based motion
- compensation.
-This places it in the same class of codecs as MPEG-1, -2, -4, and H.263.
-The details of how individual blocks are organized and how DCT coefficients are
- organized in the bitstream differ from these codecs substantially, however.
-Theora supports only intra frames (I frames in MPEG) and inter frames (P frames
- in MPEG).
-In the current version of Theora, there is no equivalent to the bi-predictive
- frames (B frames) found in MPEG codecs.
-
-\subsubsection{Assumptions}
-
-The Theora codec design assumes a complex, psychovisually-aware encoder and a
- simple, low-complexity decoder.
-%TODO: Talk more about implementation complexity.
-
-Theora provides none of its own framing, synchronization, or protection against
- errors; it is solely a method of accepting input video frames and compressing
- these frames into raw, unformatted `packets'.
-The decoder then accepts these raw packets in sequence, decodes them,
- and synthesizes a fascimile of the original video frames from them.
-Theora is a free-form variable bit rate (VBR) codec, and packets have no
- minimum size, maximum size, or fixed/expected size.
-
-Theora packets are thus intended to be used with a transport mechanism that
- provides free-form framing, synchronization, positioning, and error correction
- in accordance with these design assumptions, such as Ogg (for file transport)
- or RTP (for network multicast).
-For the purposes of a few examples in this document, we will assume that Theora
- is to embedded in an Ogg stream specifically, although this is by no means a
- requirement or fundamental assumption in the Theora design.
-
-The specification for embedding Theora into an Ogg transport stream is in
- Appendix~REF.
-
-\subsubsection{Codec Setup and Probability Model}
-
-Theora's heritage is the proprietary commerical codec VP3.2, and thus maintains
- a fair amount of inflexibility when compared to the first Xiph.org codec,
- Vorbis, which was designed as a research codec.
-However, in order to allow some room for encoder improvement, Theora adopts
- some of the configurable aspects of codec setup that are present in Vorbis.
-
-Theora makes the same controversial design decision that Vorbis made to include
- the entire probability model for the DCT coefficients and all the quantization
- parameters in the bitstream headers.
-This is often several hundred fields.
-This makes it impossible to begin decoding at any frame in the stream without
- having previously fetched the codec info and codec setup headers.
-
-\begin{verse}
-{\bf Note:} Theora {\em can} initiate decode at an arbitrary intra-frame packet
- within a bitstream so long as the codec has been initialized with the setup
- headers.
-\end{verse}
-
-Thus, Theora headers are both required for decode to begin and relatively large
- as bitstream headers go.
-The header size is unbouded, although for stream a rule-of-thumb of 16kB or
- less is recommended, and Xiph.org's Theora encoder follows this suggestion.
-%TODO: Is 8kB enough? My setup header is 7.4kB, that doesn't leave much room
-% for comments.
-
-Our own design work indicates that the primary liability of the required header
- is in mindshare; it is an unusual design and thus causes some amount of
- complaint among engineers as this runs against current design trends, and also
- points out limitations in some existing software/interface designs.
-However, we find that it does not fundamentally limit Theora's suitable
- application space.
-
-\subsubsection{Format Specification}
-
-The Theora format is well-defined by its decode specification; any encoder that
- produces packets that are correctly decoded by an implementation following
- this specification may be considered a proper Theora encoder.
-A decoder must faithfully and completely implement the specification defined
- herein %, except where noted,
- to be considered a proper Theora decoder.
-Where appropriate, a non-normative description of encoder processes may be
- described.
-These sections will be marked as such, and a proper Theora encoder is not
- bound to follow them.
-
-%TODO: \subsubsection{Hardware Profile}
-
-\subsection{Decoder Configuration}
-
-Decoder setup consists of configuration of the quantization matrices and the
- Huffman codebooks for the DCT coefficients.
-The remainder of the decoding pipeline is not configurable.
-
-\subsubsection{Global Configuration}
-
-The global codec configuration consists of a few video related fields, such as
- frame rate, frame size, picture size and offset, aspect ratio, color space,
- pixel format, and a version number.
-The version number is divided into a major version, a minor version, amd a
- minor revision number.
-These are `3', `2', and `0', respectively, due to Theora's origin as the VP3.2
- codec.
-
-\subsubsection{Quantization Matrices}
-
-Theora allows up to 384 different quantization matrices to be defined, one for
- each {\em quantization type} (intra or inter), {\em color plane} ($Y'$, $C_b$,
- or $C_r$), and {\em quantization index}, \qi, which ranges from zero to 63,
- inclusive.
-The quantization index generally represents a progressive range of quality
- levels, from low quality near zero to high quality near 63.
-However, the interpretation is arbitrary, and it is possible, for example, to
- partition the scale into two completely separate ranges with 32 levels each
- that are meant to represent different classes of source material.
-
-Each quantization matrix is an $8\times 8$ matrix of 16-bit values, which is
- used to quantize the output of the $8\times 8$ DCT.
-Quantization matrices are specified using three components: a {\em base matrix}
- and two {\em scale values}.
-The first scale value is the {\em DC scale}, which is applied to the DC
- component of the base matrix.
-The second scale value is the {\em AC scale}, which is applied to all the other
- components of the base matrix.
-
-There are 64 DC scale values and 64 AC scale values, one for each \qi value.
-There is a set of base matrices for each quantization type and each color
- plane.
-The bitstream specifies this set by defining a base matrix for a sparse subset
- of the possible \qi values, including at least zero and 63.
-The base matrices for the remainder of the \qi values are computed using linear
- interpolation.
-This configuration allows the quantization matrices to approximate the complex,
- non-linear processes of the human visual system as the \qi value varies.
-
-Finally, because the in-loop deblocking filter strength depends on the strength
- of the quantization matrices defined in this header, a table of 64 {\em loop
- filter limit values} is defined, one for each \qi value.
-
-The precise specification of how all of this information is decoded appears in
- Section~REF.
-
-\subsubsection{Huffman Codebooks}
-
-Theora uses 80 configurable binary Huffman codes to represent the 32 tokens
- used to encode DCT coefficients.
-Each of the 32 token values has a different semantic meaning and is used to
- represent single coefficient values, zero runs, combinations of the two, and
- `end-of-block' markers.
-
-The 80 codes are divided up into five groups of 16, with each group
- corresponding to a set of DCT coefficient indices.
-The first group corresponds to the DC coefficient, while the remaining groups
- correspond to different subsets of the AC coefficients.
-Within each frame, two 4-bit codebook indices are stored.
-The first selects which codebook to use from the DC coefficient group, while
- the second selects which codebook to use from {\em all} of the AC coefficient
- groups.
-
-The precise specification of how the codebooks are decoded appears in
- Section~REF.
-
-\section{Bitpacking Convention}
-\label{sec:bitpacking}
-
-\subsection{Overview}
-
-The Theora codec uses relatively unstructured raw packets containing
- binary integer fields of arbitrary width.
-Logically, each packet is a bitstream in which bits are written one-by-one by
- the encoder and then read one-by-one in the same order by the decoder.
-Most current binary storage arrangements group bits into a native storage unit
- of eight bits (octets), sixteen bits, thirty-two bits, or less commonly other
- fixed sizes.
-The Theora bitpacking convention specifies the correct mapping of the logical
- packet bitstream into an actual representation in fixed-width units.
-
-\subsubsection{Octets and Bytes}
-
-In most contemporary architectures, a `byte' is synonymous with an `octect',
- that is, eight bits.
-This has not always been the case; seven, ten, eleven, and sixteen bit `bytes'
- have been used.
-For purposes of the bitpacking convention, a byte implies the smallest native
- integer storage representation offered by a platform.
-On modern platforms, this is generally assumed to be eight bits, not
- necessarily because of the processor, but because of the file system/memory
- architecture.
-Modern file systems invariably offer bytes as the fundamental atom of storage.
-
-The most ubiquitous architectures today consider a `byte' to be an octet.
-Note, however, that the Theora bitpacking convention is still well defined for
- any native byte size;
-Theora uses the native bit-width of a given storage system.
-This document assumes that a byte is one octet for purposes of example only.
-
-\subsubsection{Words and Byte Order}
-
-A `word' is an integer size that is a grouped multiple of the byte size.
-Most architectures consider a word to be a group of two, four, or eight bytes.
-Each byte in the word can be ranked by order of `significance', e.g. the
- significance of the bits in each byte when storing a binary integer in the
- word.
-Several byte orderings are possible in a word.
-The common ones are
-\begin{itemize}
-\item{Big-endian:}
-in which the most significant byte comes first, e.g. 3-2-1-0,
-\item{Little-endian:}
-in which the least significant byte comes first, e.g. 0-1-2-3, and less
- commonly
-\item{Mixed-endian:}
-e.g. 3-1-2-0 or 0-2-1-3.
-\end{itemize}
-
-The Theora bitpacking convention specifies storage and bitstream manipulation
- at the byte, not word, level.
-Thus host word ordering is of a concern only during optimization, when writing
- code that operates on a word of storage at a time rather than a byte.
-Logically, bytes are always encoded and decoded in order from byte zero through
- byte $n$.
-
-\subsubsection{Bit Order}
-
-A byte has a well-defined `least significant' bit (LSb), which is the only bit
- set when the byte is storing the two's complement integer value $+1$.
-A byte's `most significant' bit (MSb) is at the opposite end of the byte.
-Bits in a byte are numbered from zero at the LSb to $n$ for the MSb, where
- $n=7$ in an octet.
-
-\subsection{Coding Bits into Bytes}
-
-The Theora codec needs to encode arbitrary bit-width integers from zero to 32
- bits wide into packets.
-These integer fields are not aligned to the boundaries of the byte
- representation; the next field is read at the bit position immediately
- after the end of the previous field.
-
-The decoder logically unpacks integers by first reading the MSb of a binary
- integer from the logical bitstream, followed by the next least significant
- bit, etc., until the requested number of bits have been read.
-When unpacking the bytes into bits, the decoder begins by reading the MSb of
- the integer to be read from the most significant unread bit position of the
- source byte, followed by the next-most significant bit position of the
- destination integer, and so on up to the requested number of bits.
-Note that this is a change from the earlier Xiph.org codec, Vorbis I, which
- begins encoding from the LSb of the source integer, reading it from the LSb of
- the source byte.
-When all the bits of the current source byte are read, decoding continues with
- the MSb of the next byte.
-Any unfilled bits in the last byte of the packet must be cleared to zero by the
- encoder.
-
-\subsubsection{Signedness}
-
-The binary integers decoded by the above process may be either signed or
- unsigned.
-This may change from integer to integer, and this document will specify how
- each should be interpreted as it is read.
-That is, depending on context, the three bit binary pattern `b111' can be taken
- to represent either `$7$' as an unsigned integer or `$-1$' as a signed, two's
- complement integer.
-
-\subsubsection{Encoding Example}
-
-The following example shows the state of an (8-bit) byte stream after several
- binary integers are encoded, including the location of the put pointer for the
- next bit to write to and the total length of the stream in bytes.
-
-Encode the 4 bit unsigned integer value `12' (b1100) into an empty byte stream.
-
-\begin{tabular}{rccccccccl}
- & & & & &$\downarrow$&&&& \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 &$\leftarrow$\\
-byte 1 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 2 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 3 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
-byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 1 byte
-\end{tabular}
-\vspace{\baselineskip}
-
-Continue by encoding the 3 bit signed integer value `-1' (b111).
-
-\begin{tabular}{rccccccccl}
- & & & & & & & &$\downarrow$& \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 &$\leftarrow$\\
-byte 1 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 2 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 3 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
-byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 1 byte
-\end{tabular}
-\vspace{\baselineskip}
-
-Continue by encoding the 7 bit integer value `17' (b0010001).
-
-\begin{tabular}{rccccccccl}
- & & & & & & &$\downarrow$&& \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & \\
-byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 &$\leftarrow$\\
-byte 2 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 3 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
-byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 2 bytes
-\end{tabular}
-\vspace{\baselineskip}
-
-Continue by encoding the 13 bit integer value `6969' (b11011 00111001).
-
-\begin{tabular}{rccccccccl}
- & & & &$\downarrow$&&&& & \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & \\
-byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
-byte 2 \hfill\vline& 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
-byte 3 \hfill\vline& 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &$\leftarrow$\\
-\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
-byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 4 bytes
-\end{tabular}
-\vspace{\baselineskip}
-
-\subsubsection{Decoding Example}
-
-The following example shows the state of the (8-bit) byte stream encoded in the
- previous example after several binary integers are decoded, including the
- location of the get pointer for the next bit to read.
-
-Read a two bit unsigned integer from the example encoded above.
-
-\begin{tabular}{rccccccccl}
- & & &$\downarrow$&&&& & & \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 &$\leftarrow$\\
-byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
-byte 2 \hfill\vline& 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
-byte 3 \hfill\vline& 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 4 bytes
-\end{tabular}
-\vspace{\baselineskip}
-
-Value read: 3 (b11).
-
-Read another two bit unsigned integer from the example encoded above.
-
-\begin{tabular}{rccccccccl}
- & & & & &$\downarrow$&&& & \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 &$\leftarrow$\\
-byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
-byte 2 \hfill\vline& 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
-byte 3 \hfill\vline& 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 4 bytes
-\end{tabular}
-\vspace{\baselineskip}
-
-Value read: 0 (b00).
-
-Two things are worth noting here.
-\begin{itemize}
-\item
-Although these four bits were originally written as a single four-bit integer,
- reading some other combination of bit-widths from the bitstream is well
- defined.
-There are no artificial alignment boundaries maintained in the bitstream.
-\item
-The first value is the integer `$3$' only because the context stated we were
- reading an unsigned integer.
-Had the context stated we were reading a signed integer, the returned value
- would have been the integer `$-1$'.
-\end{itemize}
-
-\subsubsection{End-of-Packet Alignment}
-
-The typical use of bitpacking is to produce many independent byte-aligned
- packets which are embedded into a larger byte-aligned container structure,
- such as an Ogg transport bitstream.
-Externally, each bitstream encoded as a byte stream must begin and end on a
- byte boundary.
-Often, the encoded bitstream is not an integer number of bytes, and so there is
- unused space in the last byte of a packet.
-
-Unused space in the last byte of a packet is always zeroed during the encoding
- process.
-Thus, should this unused space be read, it will return binary zeroes.
-There is no marker pattern or stuffing bits that will allow the decoder to
- obtain the exact size, in bits, of the original bitstream.
-This knowledge is not required for decoding.
-
-Attempting to read past the end of an encoded packet results in an
- `end-of-packet' condition.
-Any further read operations after an `end-of-packet' condition shall also
- return `end-of-packet'.
-Unlike Vorbis, Theora does not use truncated packets as a normal mode of
- operation.
-Therefore if a decoder encounters the `end-of-packet' condition during normal
- decoding, it may attempt to use the bits that were read to recover as much of
- encoded data as possible, signal a warning or error, or both.
-
-\subsubsection{Reading Zero Bit Integers}
-
-Reading a zero bit integer returns the value `$0$' and does not increment
- the stream pointer.
-Reading to the end of the packet, but not past the end, so that an
- `end-of-packet' condition is not triggered, and then reading a zero bit
- integer shall succeed, returning `$0$', and not trigger an `end-of-packet'
- condition.
-Reading a zero bit integer after a previous read sets the `end-of-packet'
- condition shall fail, also returning `end-of-packet'.
-
-\bibliography{spec}
-
-\end{document}
+\documentclass[11pt,letterpaper]{article}
+
+\usepackage{latexsym}
+\usepackage{amssymb}
+\usepackage{amsmath}
+\usepackage{graphicx}
+\usepackage[pdfpagemode=None,pdfstartview=FitH,pdfview=FitH,colorlinks=true]%
+ {hyperref}
+
+\newtheorem{theorem}{Theorem}[section]
+\newcommand{\qi}{\ensuremath{\mathit{qi}}}
+
+\pagestyle{headings}
+\bibliographystyle{alpha}
+
+\title{Theora I Specification}
+\author{Xiph.org Foundation}
+\date{\today}
+
+\begin{document}
+
+\maketitle
+\tableofcontents
+\newpage
+
+\section{Introduction and Description}
+
+This section provides a high level description of the Theora codec's
+ construction.
+A bit-by-bit specification appears beginning in Section~\ref{sec:bitpacking}.
+The later sections assume a high-level understanding of the Theora decode
+ process, which is provided below.
+
+\subsection{Overview}
+
+Theora is a general purpose, lossy video codec.
+It is based off the VP3.2 video codec produced by On2 Technologies
+ (\url{http://www.on2.com/}).
+On2 Technologies has released the VP3.2 source code under a BSD license, along
+ with an irrevocable, royalty-free license to any patent claims it might have
+ over the software and any derivatives.
+No formal specification exists for the VP3.2 format beyond this source code,
+ though Mike Melanson maintains a detailed description \cite{Mel04}.
+Portions of this specification were adopted from his text with permission.
+
+\subsubsection{VP3.2 and Theora}
+
+Theora contains a superset of the features that were available in the original
+ VP3.2 codec.
+Content encoded with VP3.2 can be losslessly transcoded into the Theora format.
+Theora content cannot, in general, be losslessly transcoded into the VP3.2
+ format.
+If a feature is not available in the original VP3.2 format, this will be
+ mentioned when that feature is defined.
+A complete list of these features appears in Appendix~REF.
+
+\subsubsection{Video Formats}
+
+Theora I currently supports progressive video data of arbitrary dimensions in
+ one of several 8-bit $Y'C_bC_r$ color spaces.
+The precise definition the color spaces supported appears in Section~REF.
+Three different chroma subsampling formats are supported: 4:2:0, 4:2:2,
+ and 4:4:4.
+The precise details of each of these formats and their sampling locations are
+ described in Section~REF.
+
+The Theora I format does not support interlaced material, bit-depths larger
+ than 8 bits per component, nor alternate color spaces such as RGB or
+ arbitrary multi-channel spaces.
+Support for interlaced material is planned for a future version.
+Support for increased bit depths or additional color spaces is not being
+ considered as of the time of this writing.
+
+\subsubsection{Classification}
+
+Theora I is a block-based lossy transform codec that utilizes an
+ $8\times 8$ Type-II Discrete Cosine Transform and block-based motion
+ compensation.
+This places it in the same class of codecs as MPEG-1, -2, -4, and H.263.
+The details of how individual blocks are organized and how DCT coefficients are
+ organized in the bitstream differ from these codecs substantially, however.
+Theora supports only intra frames (I frames in MPEG) and inter frames (P frames
+ in MPEG).
+In the current version of Theora, there is no equivalent to the bi-predictive
+ frames (B frames) found in MPEG codecs.
+
+\subsubsection{Assumptions}
+
+The Theora codec design assumes a complex, psychovisually-aware encoder and a
+ simple, low-complexity decoder.
+%TODO: Talk more about implementation complexity.
+
+Theora provides none of its own framing, synchronization, or protection against
+ errors; it is solely a method of accepting input video frames and compressing
+ these frames into raw, unformatted `packets'.
+The decoder then accepts these raw packets in sequence, decodes them,
+ and synthesizes a fascimile of the original video frames from them.
+Theora is a free-form variable bit rate (VBR) codec, and packets have no
+ minimum size, maximum size, or fixed/expected size.
+
+Theora packets are thus intended to be used with a transport mechanism that
+ provides free-form framing, synchronization, positioning, and error correction
+ in accordance with these design assumptions, such as Ogg (for file transport)
+ or RTP (for network multicast).
+For the purposes of a few examples in this document, we will assume that Theora
+ is to embedded in an Ogg stream specifically, although this is by no means a
+ requirement or fundamental assumption in the Theora design.
+
+The specification for embedding Theora into an Ogg transport stream is in
+ Appendix~REF.
+
+\subsubsection{Codec Setup and Probability Model}
+
+Theora's heritage is the proprietary commerical codec VP3.2, and thus maintains
+ a fair amount of inflexibility when compared to the first Xiph.org codec,
+ Vorbis, which was designed as a research codec.
+However, in order to allow some room for encoder improvement, Theora adopts
+ some of the configurable aspects of codec setup that are present in Vorbis.
+
+Theora makes the same controversial design decision that Vorbis made to include
+ the entire probability model for the DCT coefficients and all the quantization
+ parameters in the bitstream headers.
+This is often several hundred fields.
+This makes it impossible to begin decoding at any frame in the stream without
+ having previously fetched the codec info and codec setup headers.
+
+\begin{verse}
+{\bf Note:} Theora {\em can} initiate decode at an arbitrary intra-frame packet
+ within a bitstream so long as the codec has been initialized with the setup
+ headers.
+\end{verse}
+
+Thus, Theora headers are both required for decode to begin and relatively large
+ as bitstream headers go.
+The header size is unbouded, although for stream a rule-of-thumb of 16kB or
+ less is recommended, and Xiph.org's Theora encoder follows this suggestion.
+%TODO: Is 8kB enough? My setup header is 7.4kB, that doesn't leave much room
+% for comments.
+
+Our own design work indicates that the primary liability of the required header
+ is in mindshare; it is an unusual design and thus causes some amount of
+ complaint among engineers as this runs against current design trends, and also
+ points out limitations in some existing software/interface designs.
+However, we find that it does not fundamentally limit Theora's suitable
+ application space.
+
+\subsubsection{Format Specification}
+
+The Theora format is well-defined by its decode specification; any encoder that
+ produces packets that are correctly decoded by an implementation following
+ this specification may be considered a proper Theora encoder.
+A decoder must faithfully and completely implement the specification defined
+ herein %, except where noted,
+ to be considered a proper Theora decoder.
+Where appropriate, a non-normative description of encoder processes may be
+ described.
+These sections will be marked as such, and a proper Theora encoder is not
+ bound to follow them.
+
+%TODO: \subsubsection{Hardware Profile}
+
+\subsection{Decoder Configuration}
+
+Decoder setup consists of configuration of the quantization matrices and the
+ Huffman codebooks for the DCT coefficients.
+The remainder of the decoding pipeline is not configurable.
+
+\subsubsection{Global Configuration}
+
+The global codec configuration consists of a few video related fields, such as
+ frame rate, frame size, picture size and offset, aspect ratio, color space,
+ pixel format, and a version number.
+The version number is divided into a major version, a minor version, amd a
+ minor revision number.
+These are `3', `2', and `0', respectively, due to Theora's origin as the VP3.2
+ codec.
+
+\subsubsection{Quantization Matrices}
+
+Theora allows up to 384 different quantization matrices to be defined, one for
+ each {\em quantization type} (intra or inter), {\em color plane} ($Y'$, $C_b$,
+ or $C_r$), and {\em quantization index}, \qi, which ranges from zero to 63,
+ inclusive.
+The quantization index generally represents a progressive range of quality
+ levels, from low quality near zero to high quality near 63.
+However, the interpretation is arbitrary, and it is possible, for example, to
+ partition the scale into two completely separate ranges with 32 levels each
+ that are meant to represent different classes of source material.
+
+Each quantization matrix is an $8\times 8$ matrix of 16-bit values, which is
+ used to quantize the output of the $8\times 8$ DCT.
+Quantization matrices are specified using three components: a {\em base matrix}
+ and two {\em scale values}.
+The first scale value is the {\em DC scale}, which is applied to the DC
+ component of the base matrix.
+The second scale value is the {\em AC scale}, which is applied to all the other
+ components of the base matrix.
+
+There are 64 DC scale values and 64 AC scale values, one for each \qi value.
+There is a set of base matrices for each quantization type and each color
+ plane.
+The bitstream specifies this set by defining a base matrix for a sparse subset
+ of the possible \qi values, including at least zero and 63.
+The base matrices for the remainder of the \qi values are computed using linear
+ interpolation.
+This configuration allows the quantization matrices to approximate the complex,
+ non-linear processes of the human visual system as the \qi value varies.
+
+Finally, because the in-loop deblocking filter strength depends on the strength
+ of the quantization matrices defined in this header, a table of 64 {\em loop
+ filter limit values} is defined, one for each \qi value.
+
+The precise specification of how all of this information is decoded appears in
+ Section~REF.
+
+\subsubsection{Huffman Codebooks}
+
+Theora uses 80 configurable binary Huffman codes to represent the 32 tokens
+ used to encode DCT coefficients.
+Each of the 32 token values has a different semantic meaning and is used to
+ represent single coefficient values, zero runs, combinations of the two, and
+ `end-of-block' markers.
+
+The 80 codes are divided up into five groups of 16, with each group
+ corresponding to a set of DCT coefficient indices.
+The first group corresponds to the DC coefficient, while the remaining groups
+ correspond to different subsets of the AC coefficients.
+Within each frame, two 4-bit codebook indices are stored.
+The first selects which codebook to use from the DC coefficient group, while
+ the second selects which codebook to use from {\em all} of the AC coefficient
+ groups.
+
+The precise specification of how the codebooks are decoded appears in
+ Section~REF.
+
+\section{Bitpacking Convention}
+\label{sec:bitpacking}
+
+\subsection{Overview}
+
+The Theora codec uses relatively unstructured raw packets containing
+ binary integer fields of arbitrary width.
+Logically, each packet is a bitstream in which bits are written one-by-one by
+ the encoder and then read one-by-one in the same order by the decoder.
+Most current binary storage arrangements group bits into a native storage unit
+ of eight bits (octets), sixteen bits, thirty-two bits, or less commonly other
+ fixed sizes.
+The Theora bitpacking convention specifies the correct mapping of the logical
+ packet bitstream into an actual representation in fixed-width units.
+
+\subsubsection{Octets and Bytes}
+
+In most contemporary architectures, a `byte' is synonymous with an `octect',
+ that is, eight bits.
+This has not always been the case; seven, ten, eleven, and sixteen bit `bytes'
+ have been used.
+For purposes of the bitpacking convention, a byte implies the smallest native
+ integer storage representation offered by a platform.
+On modern platforms, this is generally assumed to be eight bits, not
+ necessarily because of the processor, but because of the file system/memory
+ architecture.
+Modern file systems invariably offer bytes as the fundamental atom of storage.
+
+The most ubiquitous architectures today consider a `byte' to be an octet.
+Note, however, that the Theora bitpacking convention is still well defined for
+ any native byte size;
+Theora uses the native bit-width of a given storage system.
+This document assumes that a byte is one octet for purposes of example only.
+
+\subsubsection{Words and Byte Order}
+
+A `word' is an integer size that is a grouped multiple of the byte size.
+Most architectures consider a word to be a group of two, four, or eight bytes.
+Each byte in the word can be ranked by order of `significance', e.g. the
+ significance of the bits in each byte when storing a binary integer in the
+ word.
+Several byte orderings are possible in a word.
+The common ones are
+\begin{itemize}
+\item{Big-endian:}
+in which the most significant byte comes first, e.g. 3-2-1-0,
+\item{Little-endian:}
+in which the least significant byte comes first, e.g. 0-1-2-3, and less
+ commonly
+\item{Mixed-endian:}
+e.g. 3-1-2-0 or 0-2-1-3.
+\end{itemize}
+
+The Theora bitpacking convention specifies storage and bitstream manipulation
+ at the byte, not word, level.
+Thus host word ordering is of a concern only during optimization, when writing
+ code that operates on a word of storage at a time rather than a byte.
+Logically, bytes are always encoded and decoded in order from byte zero through
+ byte $n$.
+
+\subsubsection{Bit Order}
+
+A byte has a well-defined `least significant' bit (LSb), which is the only bit
+ set when the byte is storing the two's complement integer value $+1$.
+A byte's `most significant' bit (MSb) is at the opposite end of the byte.
+Bits in a byte are numbered from zero at the LSb to $n$ for the MSb, where
+ $n=7$ in an octet.
+
+\subsection{Coding Bits into Bytes}
+
+The Theora codec needs to encode arbitrary bit-width integers from zero to 32
+ bits wide into packets.
+These integer fields are not aligned to the boundaries of the byte
+ representation; the next field is read at the bit position immediately
+ after the end of the previous field.
+
+The decoder logically unpacks integers by first reading the MSb of a binary
+ integer from the logical bitstream, followed by the next least significant
+ bit, etc., until the requested number of bits have been read.
+When unpacking the bytes into bits, the decoder begins by reading the MSb of
+ the integer to be read from the most significant unread bit position of the
+ source byte, followed by the next-most significant bit position of the
+ destination integer, and so on up to the requested number of bits.
+Note that this is a change from the earlier Xiph.org codec, Vorbis I, which
+ begins encoding from the LSb of the source integer, reading it from the LSb of
+ the source byte.
+When all the bits of the current source byte are read, decoding continues with
+ the MSb of the next byte.
+Any unfilled bits in the last byte of the packet must be cleared to zero by the
+ encoder.
+
+\subsubsection{Signedness}
+
+The binary integers decoded by the above process may be either signed or
+ unsigned.
+This may change from integer to integer, and this document will specify how
+ each should be interpreted as it is read.
+That is, depending on context, the three bit binary pattern `b111' can be taken
+ to represent either `$7$' as an unsigned integer or `$-1$' as a signed, two's
+ complement integer.
+
+\subsubsection{Encoding Example}
+
+The following example shows the state of an (8-bit) byte stream after several
+ binary integers are encoded, including the location of the put pointer for the
+ next bit to write to and the total length of the stream in bytes.
+
+Encode the 4 bit unsigned integer value `12' (b1100) into an empty byte stream.
+
+\begin{tabular}{rccccccccl}
+ & & & & &$\downarrow$&&&& \\
+ \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 &$\leftarrow$\\
+byte 1 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 2 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 3 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
+byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 1 byte
+\end{tabular}
+\vspace{\baselineskip}
+
+Continue by encoding the 3 bit signed integer value `-1' (b111).
+
+\begin{tabular}{rccccccccl}
+ & & & & & & & &$\downarrow$& \\
+ \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 &$\leftarrow$\\
+byte 1 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 2 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 3 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
+byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 1 byte
+\end{tabular}
+\vspace{\baselineskip}
+
+Continue by encoding the 7 bit integer value `17' (b0010001).
+
+\begin{tabular}{rccccccccl}
+ & & & & & & &$\downarrow$&& \\
+ \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & \\
+byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 &$\leftarrow$\\
+byte 2 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 3 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
+byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 2 bytes
+\end{tabular}
+\vspace{\baselineskip}
+
+Continue by encoding the 13 bit integer value `6969' (b11011 00111001).
+
+\begin{tabular}{rccccccccl}
+ & & & &$\downarrow$&&&& & \\
+ \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & \\
+byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
+byte 2 \hfill\vline& 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
+byte 3 \hfill\vline& 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &$\leftarrow$\\
+\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
+byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 4 bytes
+\end{tabular}
+\vspace{\baselineskip}
+
+\subsubsection{Decoding Example}
+
+The following example shows the state of the (8-bit) byte stream encoded in the
+ previous example after several binary integers are decoded, including the
+ location of the get pointer for the next bit to read.
+
+Read a two bit unsigned integer from the example encoded above.
+
+\begin{tabular}{rccccccccl}
+ & & &$\downarrow$&&&& & & \\
+ \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 &$\leftarrow$\\
+byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
+byte 2 \hfill\vline& 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
+byte 3 \hfill\vline& 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 4 bytes
+\end{tabular}
+\vspace{\baselineskip}
+
+Value read: 3 (b11).
+
+Read another two bit unsigned integer from the example encoded above.
+
+\begin{tabular}{rccccccccl}
+ & & & & &$\downarrow$&&& & \\
+ \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 &$\leftarrow$\\
+byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
+byte 2 \hfill\vline& 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
+byte 3 \hfill\vline& 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 4 bytes
+\end{tabular}
+\vspace{\baselineskip}
+
+Value read: 0 (b00).
+
+Two things are worth noting here.
+\begin{itemize}
+\item
+Although these four bits were originally written as a single four-bit integer,
+ reading some other combination of bit-widths from the bitstream is well
+ defined.
+There are no artificial alignment boundaries maintained in the bitstream.
+\item
+The first value is the integer `$3$' only because the context stated we were
+ reading an unsigned integer.
+Had the context stated we were reading a signed integer, the returned value
+ would have been the integer `$-1$'.
+\end{itemize}
+
+\subsubsection{End-of-Packet Alignment}
+
+The typical use of bitpacking is to produce many independent byte-aligned
+ packets which are embedded into a larger byte-aligned container structure,
+ such as an Ogg transport bitstream.
+Externally, each bitstream encoded as a byte stream must begin and end on a
+ byte boundary.
+Often, the encoded bitstream is not an integer number of bytes, and so there is
+ unused space in the last byte of a packet.
+
+Unused space in the last byte of a packet is always zeroed during the encoding
+ process.
+Thus, should this unused space be read, it will return binary zeroes.
+There is no marker pattern or stuffing bits that will allow the decoder to
+ obtain the exact size, in bits, of the original bitstream.
+This knowledge is not required for decoding.
+
+Attempting to read past the end of an encoded packet results in an
+ `end-of-packet' condition.
+Any further read operations after an `end-of-packet' condition shall also
+ return `end-of-packet'.
+Unlike Vorbis, Theora does not use truncated packets as a normal mode of
+ operation.
+Therefore if a decoder encounters the `end-of-packet' condition during normal
+ decoding, it may attempt to use the bits that were read to recover as much of
+ encoded data as possible, signal a warning or error, or both.
+
+\subsubsection{Reading Zero Bit Integers}
+
+Reading a zero bit integer returns the value `$0$' and does not increment
+ the stream pointer.
+Reading to the end of the packet, but not past the end, so that an
+ `end-of-packet' condition is not triggered, and then reading a zero bit
+ integer shall succeed, returning `$0$', and not trigger an `end-of-packet'
+ condition.
+Reading a zero bit integer after a previous read sets the `end-of-packet'
+ condition shall fail, also returning `end-of-packet'.
+
+\bibliography{spec}
+
+\end{document}
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'cvs-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
More information about the commits
mailing list