[xiph-cvs] r6483 - in theora/trunk/doc: . spec
tterribe at xiph.org
Wed Mar 31 22:17:39 PST 2004
Author: tterribe
Date: 2004-04-01 01:17:38 -0500 (Thu, 01 Apr 2004)
New Revision: 6483
Added:
theora/trunk/doc/spec/
theora/trunk/doc/spec/hilbert-block.fig
theora/trunk/doc/spec/hilbert-block.tex
theora/trunk/doc/spec/hilbert-mb.fig
theora/trunk/doc/spec/hilbert-mb.tex
theora/trunk/doc/spec/spec.bib
theora/trunk/doc/spec/spec.tex
Removed:
theora/trunk/doc/spec.bib
theora/trunk/doc/spec.tex
Log:
Another chunk of text added.
A fair number of revisions were also done, as well as some reorganization.
Some clean-up of the table presentation, and a couple initial figures were
added.
Added: theora/trunk/doc/spec/hilbert-block.fig
===================================================================
--- theora/trunk/doc/spec/hilbert-block.fig 2004-04-01 00:51:39 UTC (rev 6482)
+++ theora/trunk/doc/spec/hilbert-block.fig 2004-04-01 06:17:38 UTC (rev 6483)
@@ -0,0 +1,72 @@
+#FIG 3.2
+Landscape
+Center
+Metric
+A4
+100.00
+Single
+-2
+1200 2
+6 675 645 3825 3795
+4 1 0 50 0 1 12 0.0000 0 135 90 900 3660 0\001
+4 1 0 50 0 1 12 0.0000 0 135 90 1800 3660 1\001
+4 1 0 50 0 1 12 0.0000 0 135 90 1800 2760 2\001
+4 1 0 50 0 1 12 0.0000 0 135 90 900 2760 3\001
+4 1 0 50 0 1 12 0.0000 0 135 90 900 1860 4\001
+4 1 0 50 0 1 12 0.0000 0 135 90 900 960 5\001
+4 1 0 50 0 1 12 0.0000 0 135 90 1800 960 6\001
+4 1 0 50 0 1 12 0.0000 0 135 90 1800 1860 7\001
+4 1 0 50 0 1 12 0.0000 0 135 90 2700 1860 8\001
+4 1 0 50 0 1 12 0.0000 0 135 90 2700 960 9\001
+4 1 0 50 0 1 12 0.0000 0 135 180 3600 960 10\001
+4 1 0 50 0 1 12 0.0000 0 135 180 3600 1860 11\001
+4 1 0 50 0 1 12 0.0000 0 135 180 3600 2760 12\001
+4 1 0 50 0 1 12 0.0000 0 135 180 2700 2760 13\001
+4 1 0 50 0 1 12 0.0000 0 135 180 2700 3660 14\001
+4 1 0 50 0 1 12 0.0000 0 135 180 3600 3660 15\001
+-6
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 1125 3600 1575 3600
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 1800 3375 1800 2925
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 1575 2700 1125 2700
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 900 2475 900 2025
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 900 1575 900 1125
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 1125 900 1575 900
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 1800 1125 1800 1575
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 2025 1800 2475 1800
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 2700 1575 2700 1125
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 2925 900 3375 900
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 3600 1125 3600 1575
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 3600 2025 3600 2475
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 3375 2700 2925 2700
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 2700 2925 2700 3375
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 2925 3600 3375 3600
Added: theora/trunk/doc/spec/hilbert-block.tex
===================================================================
--- theora/trunk/doc/spec/hilbert-block.tex 2004-04-01 00:51:39 UTC (rev 6482)
+++ theora/trunk/doc/spec/hilbert-block.tex 2004-04-01 06:17:38 UTC (rev 6483)
@@ -0,0 +1,88 @@
+\setlength{\unitlength}{4144sp}%
+%
+\begingroup\makeatletter\ifx\SetFigFont\undefined
+% extract first six characters in \fmtname
+\def\x#1#2#3#4#5#6#7\relax{\def\x{#1#2#3#4#5#6}}%
+\expandafter\x\fmtname xxxxxx\relax \def\y{splain}%
+\ifx\x\y % LaTeX or SliTeX?
+\gdef\SetFigFont#1#2#3{%
+ \ifnum #1<17\tiny\else \ifnum #1<20\small\else
+ \ifnum #1<24\normalsize\else \ifnum #1<29\large\else
+ \ifnum #1<34\Large\else \ifnum #1<41\LARGE\else
+ \huge\fi\fi\fi\fi\fi\fi
+ \csname #3\endcsname}%
+\else
+\gdef\SetFigFont#1#2#3{\begingroup
+ \count@#1\relax \ifnum 25<\count@\count@25\fi
+ \def\x{\endgroup\@setsize\SetFigFont{#2pt}}%
+ \expandafter\x
+ \csname \romannumeral\the\count@ pt\expandafter\endcsname
+ \csname @\romannumeral\the\count@ pt\endcsname
+ \csname #3\endcsname}%
+\fi
+\fi\endgroup
+\begin{picture}(2784,2835)(859,-2821)
+\put(901,-2821){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}0}%
+}}}
+\put(1801,-2821){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}1}%
+}}}
+\put(1801,-1921){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}2}%
+}}}
+\put(901,-1921){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}3}%
+}}}
+\put(901,-1021){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}4}%
+}}}
+\put(901,-121){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}5}%
+}}}
+\put(1801,-121){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}6}%
+}}}
+\put(1801,-1021){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}7}%
+}}}
+\put(2701,-1021){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}8}%
+}}}
+\put(2701,-121){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}9}%
+}}}
+\put(3601,-121){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}10}%
+}}}
+\put(3601,-1021){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}11}%
+}}}
+\put(3601,-1921){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}12}%
+}}}
+\put(2701,-1921){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}13}%
+}}}
+\put(2701,-2821){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}14}%
+}}}
+\put(3601,-2821){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}15}%
+}}}
+\thinlines
+{\color[rgb]{0,0,0}\put(1126,-2761){\vector( 1, 0){450}}
+}%
+{\color[rgb]{0,0,0}\put(1801,-2536){\vector( 0, 1){450}}
+}%
+{\color[rgb]{0,0,0}\put(1576,-1861){\vector(-1, 0){450}}
+}%
+{\color[rgb]{0,0,0}\put(901,-1636){\vector( 0, 1){450}}
+}%
+{\color[rgb]{0,0,0}\put(901,-736){\vector( 0, 1){450}}
+}%
+{\color[rgb]{0,0,0}\put(1126,-61){\vector( 1, 0){450}}
+}%
+{\color[rgb]{0,0,0}\put(1801,-286){\vector( 0,-1){450}}
+}%
+{\color[rgb]{0,0,0}\put(2026,-961){\vector( 1, 0){450}}
+}%
+{\color[rgb]{0,0,0}\put(2701,-736){\vector( 0, 1){450}}
+}%
+{\color[rgb]{0,0,0}\put(2926,-61){\vector( 1, 0){450}}
+}%
+{\color[rgb]{0,0,0}\put(3601,-286){\vector( 0,-1){450}}
+}%
+{\color[rgb]{0,0,0}\put(3601,-1186){\vector( 0,-1){450}}
+}%
+{\color[rgb]{0,0,0}\put(3376,-1861){\vector(-1, 0){450}}
+}%
+{\color[rgb]{0,0,0}\put(2701,-2086){\vector( 0,-1){450}}
+}%
+{\color[rgb]{0,0,0}\put(2926,-2761){\vector( 1, 0){450}}
+}%
+\end{picture}
Added: theora/trunk/doc/spec/hilbert-mb.fig
===================================================================
--- theora/trunk/doc/spec/hilbert-mb.fig 2004-04-01 00:51:39 UTC (rev 6482)
+++ theora/trunk/doc/spec/hilbert-mb.fig 2004-04-01 06:17:38 UTC (rev 6483)
@@ -0,0 +1,24 @@
+#FIG 3.2
+Landscape
+Center
+Metric
+A4
+100.00
+Single
+-2
+1200 2
+6 0 60 2700 1860
+4 1 0 50 0 1 12 0.0000 0 135 90 900 1860 0\001
+4 1 0 50 0 1 12 0.0000 0 135 90 900 960 1\001
+4 1 0 50 0 1 12 0.0000 0 135 90 1800 960 2\001
+4 1 0 50 0 1 12 0.0000 0 135 90 1800 1860 3\001
+-6
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 900 1575 900 1125
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 1125 900 1575 900
+2 1 0 1 0 7 50 0 -1 0.000 0 0 -1 1 0 2
+ 1 1 1.00 60.00 120.00
+ 1800 1125 1800 1575
Added: theora/trunk/doc/spec/hilbert-mb.tex
===================================================================
--- theora/trunk/doc/spec/hilbert-mb.tex 2004-04-01 00:51:39 UTC (rev 6482)
+++ theora/trunk/doc/spec/hilbert-mb.tex 2004-04-01 06:17:38 UTC (rev 6483)
@@ -0,0 +1,40 @@
+\setlength{\unitlength}{4144sp}%
+%
+\begingroup\makeatletter\ifx\SetFigFont\undefined
+% extract first six characters in \fmtname
+\def\x#1#2#3#4#5#6#7\relax{\def\x{#1#2#3#4#5#6}}%
+\expandafter\x\fmtname xxxxxx\relax \def\y{splain}%
+\ifx\x\y % LaTeX or SliTeX?
+\gdef\SetFigFont#1#2#3{%
+ \ifnum #1<17\tiny\else \ifnum #1<20\small\else
+ \ifnum #1<24\normalsize\else \ifnum #1<29\large\else
+ \ifnum #1<34\Large\else \ifnum #1<41\LARGE\else
+ \huge\fi\fi\fi\fi\fi\fi
+ \csname #3\endcsname}%
+\else
+\gdef\SetFigFont#1#2#3{\begingroup
+ \count@#1\relax \ifnum 25<\count@\count@25\fi
+ \def\x{\endgroup\@setsize\SetFigFont{#2pt}}%
+ \expandafter\x
+ \csname \romannumeral\the\count@ pt\expandafter\endcsname
+ \csname @\romannumeral\the\count@ pt\endcsname
+ \csname #3\endcsname}%
+\fi
+\fi\endgroup
+\begin{picture}(984,1035)(859,-1021)
+\put(901,-1021){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}0}%
+}}}
+\put(901,-121){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}1}%
+}}}
+\put(1801,-121){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}2}%
+}}}
+\put(1801,-1021){\makebox(0,0)[b]{\smash{\SetFigFont{12}{14.4}{rm}{\color[rgb]{0,0,0}3}%
+}}}
+\thinlines
+{\color[rgb]{0,0,0}\put(901,-736){\vector( 0, 1){450}}
+}%
+{\color[rgb]{0,0,0}\put(1126,-61){\vector( 1, 0){450}}
+}%
+{\color[rgb]{0,0,0}\put(1801,-286){\vector( 0,-1){450}}
+}%
+\end{picture}
Copied: theora/trunk/doc/spec/spec.bib (from rev 6482, theora/trunk/doc/spec.bib)
===================================================================
--- theora/trunk/doc/spec.bib 2004-04-01 00:51:39 UTC (rev 6482)
+++ theora/trunk/doc/spec/spec.bib 2004-04-01 06:17:38 UTC (rev 6483)
@@ -0,0 +1,107 @@
+@MISC{Mel04,
+ author="Mike Melanson",
+ title="{VP3} Bitstream Format and Decoding Process",
+ howpublished="\url{http://home.pcisys.net/~melanson/codecs/vp3-format.txt}",
+ month="Mar.",
+ year=2004
+}
+
+@MISC{Poyn97,
+ author="Charles Poynton",
+ title="Frequently-Asked Questions about Gamma",
+ howpublished="\url{http://www.poynton.com/GammaFAQ.html}",
+ month="Feb.",
+ year=1997
+}
+
+@MANUAL{rec470,
+ key="ITU470",
+ title="Recommendation {ITU-R} {BT}.470-6: Conventional Television Systems",
+ edition="1970, revised",
+ organization="International Telecommunication Union",
+ address="1211 Geneva 20, Switzerland",
+ year=1998
+}
+
+@MANUAL{rec601,
+ key="ITU601",
+ title="Recommendation {ITU-R} {BT}.601-5: Studio Encoding Parameters of
+ Digital Television for Standard 4:3 and Wide-Screen 16:9 Aspect Ratios",
+ edition="1982, revised",
+ organization="International Telecommunication Union",
+ address="1211 Geneva 20, Switzerland",
+ year=1995
+}
+
+@MANUAL{rec709,
+ key="ITU709",
+ title="Recommendation {ITU-R} {BT}.709-5: Parameter values for the {HDTV}
+ standards for production and international programme exchange",
+ edition="1990, revised",
+ organization="International Telecommunication Union",
+ address="1211 Geneva 20, Switzerland",
+ year=2002
+}
+
+@MANUAL{smpte170m,
+ key="SMPTE170M",
+ title="{SMPTE-170M}: Television --- Composite Analog Video Signal --- {NTSC}
+ for Studio Applications",
+ organization="Society of Motion Picture and Television Engineers",
+ year=1994
+}
+
+@MANUAL{smpte240m,
+ key="SMPTE240M",
+ title="{SMPTE-240M}: Television --- Signal Parameters --- 1125-Line
+ High-Definition Production",
+ organization="Society of Motion Picture and Television Engineers",
+ year=1999
+}
+
+@MISC{oggstream,
+ author="Christopher Montgomery",
+ title="{Ogg} logical and physical bitstream overview",
+ howpublished="\url{http://www.xiph.org/ogg/doc/oggstream.html}",
+ month="Jul.",
+ year=2002
+}
+
+@MISC{oggframe,
+ author="Christopher Montgomery",
+ title="{Ogg} logical bitstream framing",
+ howpublished="\url{http://www.xiph.org/ogg/doc/framing.html}",
+ month="Jul.",
+ year=2002
+}
+
+@MANUAL{vorbis,
+ title="{Vorbis~I} specification",
+ organization="{Xiph.org Foundation}",
+ year=2002,
+ note="\url{http://www.xiph.org/ogg/vorbis/doc/}"
+}
+
+@MANUAL{rfc3533,
+ author="Silvia Pfeiffer",
+ title="{RFC} 3533: The {Ogg} Encapsulation Format Version 0",
+ month="May",
+ year=2003,
+ note="\url{http://www.ietf.org/rfc/rfc3533.txt}"
+}
+
+@MANUAL{rfc3534,
+ author="Linus Walleij",
+ title="The {application/ogg} Media Type",
+ month="May",
+ year=2003,
+ note="\url{http://www.ietf.org/rfc/rfc3534.txt}"
+}
+
+@MANUAL{rfc3550,
+ author="H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson",
+ title="RTP: A Transport Protocol for Real-Time Applications",
+ month="Jul.",
+ year=2003,
+ note="\url{http://www.ietf.org/rfc/rfc3550.txt}"
+}
Copied: theora/trunk/doc/spec/spec.tex (from rev 6482, theora/trunk/doc/spec.tex)
===================================================================
--- theora/trunk/doc/spec.tex 2004-04-01 00:51:39 UTC (rev 6482)
+++ theora/trunk/doc/spec/spec.tex 2004-04-01 06:17:38 UTC (rev 6483)
@@ -0,0 +1,1808 @@
+\documentclass[11pt,letterpaper]{article}
+
+\usepackage{latexsym}
+\usepackage{amssymb}
+\usepackage{amsmath}
+\usepackage{graphicx}
+\usepackage{booktabs}
+\usepackage[pdfpagemode=None,pdfstartview=FitH,pdfview=FitH,colorlinks=true]%
+ {hyperref}
+
+\newtheorem{theorem}{Theorem}[section]
+\newcommand{\qi}{\ensuremath{\mathit{qi}} }
+\newcommand{\ti}{\ensuremath{\mathit{ti}} }
+\newcommand{\bitvar}[1]{\ensuremath{\left[\mathrm{#1}\right]}}
+\newcommand{\term}[1]{{\em #1}}
+
+\pagestyle{headings}
+\bibliographystyle{alpha}
+
+\title{Theora I Specification}
+\author{Xiph.org Foundation}
+\date{\today}
+
+\begin{document}
+
+\maketitle
+\tableofcontents
+\newpage
+
+\section{Introduction and Description}
+
+This section provides a high-level description of the Theora codec's
+ construction.
+A bit-by-bit specification appears beginning in Section~\ref{sec:bitpacking}.
+The later sections assume a high-level understanding of the Theora decode
+ process, which is provided below.
+
+\subsection{Overview}
+
+Theora is a general-purpose, lossy video codec.
+It is based on the VP3 video codec produced by On2 Technologies
+ (\url{http://www.on2.com/}).
+On2 Technologies donated the VP3.2 source code to the Xiph.org
+ Foundation and it was released under a BSD-like license.
+On2 also made an irrevocable, royalty-free license grant for any patent claims
+ it might have over the software and any derivatives.
+No formal specification exists for the VP3 format beyond this source code,
+ though Mike Melanson maintains a detailed description \cite{Mel04}.
+Portions of this specification were adopted from his text with permission.
+
+\subsubsection{VP3 and Theora}
+
+Theora contains a superset of the features that were available in the original
+ VP3 codec.
+Content encoded with VP3.2 can be losslessly transcoded into the Theora format.
+%TODO: what about VP3.1 etc? source tables all say 'VP31'
+Theora content cannot, in general, be losslessly transcoded into the VP3
+ format.
+If a feature is not available in the original VP3 format, this is mentioned
+ when that feature is defined.
+A complete list of these features appears in Appendix~REF.
+
+\subsubsection{Video Formats}
+
+Theora I currently supports progressive video data of arbitrary dimensions at a
+ constant frame rate in one of several $Y'C_bC_r$ color spaces.
+The precise definition of the supported color spaces appears in
+ Section~\ref{sec:colorspaces}.
+Three different chroma subsampling formats are supported: 4:2:0, 4:2:2,
+ and 4:4:4.
+The precise details of each of these formats and their sampling locations are
+ described in Section~REF.
+
+The Theora I format does not support interlaced material, variable frame rates,
+ bit-depths larger than 8 bits per component, nor alternate color spaces such
+ as RGB or arbitrary multi-channel spaces.
+Black and white content can be efficiently encoded, however, because the
+ uniform chroma planes compress well.
+Support for interlaced material is planned for a future version.
+Support for infrequently changing frame rates can already be achieved by
+ chaining several Theora streams together.
+Support for increased bit depths or additional color spaces is not planned.
+
+\subsubsection{Classification}
+
+Theora I is a block-based lossy transform codec that utilizes an
+ $8\times 8$ Type-II Discrete Cosine Transform and block-based motion
+ compensation.
+This places it in the same class of codecs as MPEG-1, -2, -4, and H.263.
+The details of how individual blocks are organized and how DCT coefficients are
+ organized in the bitstream differ substantially from these codecs, however.
+Theora supports only intra frames (I frames in MPEG) and inter frames (P frames
+ in MPEG).
+There is no equivalent to the bi-predictive frames (B frames) found in MPEG
+ codecs.
+
+\subsubsection{Assumptions}
+
+The Theora codec design assumes a complex, psychovisually-aware encoder and a
+ simple, low-complexity decoder.
+%TODO: Talk more about implementation complexity.
+
+Theora provides none of its own framing, synchronization, or protection against
+ transmission errors; it is solely a method of accepting input video frames and
+ compressing these frames into raw, unformatted `packets'.
+The decoder then accepts these raw packets in sequence, decodes them, and
+ synthesizes a facsimile of the original video frames.
+Theora is a free-form variable bit rate (VBR) codec, and packets have no
+ minimum size, maximum size, or fixed/expected size.
+
+Theora packets are thus intended to be used with a transport mechanism that
+ provides free-form framing, synchronization, positioning, and error correction
+ in accordance with these design assumptions, such as Ogg (for file transport)
+ or RTP (for network multicast).
+For the purposes of a few examples in this document, we will assume that Theora
+ is embedded in an Ogg stream specifically, although this is by no means a
+ requirement or fundamental assumption in the Theora design.
+
+The specification for embedding Theora into an Ogg transport stream is given in
+ Appendix~\ref{app:oggencapsulation}.
+
+\subsubsection{Codec Setup and Probability Model}
+
+Theora's heritage is the proprietary commercial codec VP3, and it retains a
+ fair amount of inflexibility when compared to Vorbis \cite{vorbis}, the first
+ Xiph.org codec, which began as a research codec.
+However, to provide additional scope for encoder improvement, Theora adopts
+ some of the configurable aspects of decoder setup that are present in Vorbis.
+This configuration data is not available in VP3, which used hardcoded values
+ instead.
+
+Theora makes the same controversial design decision that Vorbis made to include
+ the entire probability model for the DCT coefficients and all the quantization
+ parameters in the bitstream headers.
+This is often several hundred fields.
+This makes it impossible to begin decoding at any frame in the stream without
+ having previously fetched the codec info and codec setup headers.
+
+\begin{verse}
+{\bf Note:} Theora {\em can} initiate decode at an arbitrary intra-frame packet
+ within a bitstream so long as the codec has been initialized with the setup
+ headers.
+\end{verse}
+
+Thus, Theora headers are both required for decode to begin and relatively large
+ as bitstream headers go.
+The header size is unbounded, although as a rule-of-thumb less than 16kB is
+ recommended, and Xiph.org's reference encoder follows this suggestion.
+%TODO: Is 8kB enough? My setup header is 7.4kB, that doesn't leave much room
+% for comments.
+%RG: the lesson from vorbis is that as small as possible is really
+% important in some applications. Practically, what's acceptable
+% depends a great deal on the target bitrate. I'd leave 16 kB in the
+% spec for now. fwiw more than 1k of comments is quite unusual.
+
+Our own design work indicates that the primary liability of the required header
+ is in mindshare; it is an unusual design and thus causes some amount of
+ complaint among engineers as this runs against current design trends and
+ points out limitations in some existing software/interface designs.
+However, we find that it does not fundamentally limit Theora's suitable
+ application space.
+
+\subsubsection{Format Specification}
+
+The Theora format is well-defined by its decode specification; any encoder that
+ produces packets that are correctly decoded by an implementation following
+ this specification may be considered a proper Theora encoder.
+A decoder must faithfully and completely implement the specification defined
+ herein %, except where noted,
+ to be considered a proper Theora decoder.
+Where appropriate, a non-normative description of encoder processes is
+ included.
+These sections will be marked as such, and a proper Theora encoder is not
+ bound to follow them.
+
+%TODO: \subsubsection{Hardware Profile}
+
+\subsection{Coded Video Structure}
+
+Theora is based on $8\times 8$ blocks of pixels.
+This section describes how a video frame is laid out, divided into blocks, and
+ how those blocks are organized.
+
+\subsubsection{Frame Layout}
+
+A video frame in Theora is a two-dimensional array of pixels.
+Theora, like VP3, uses a right-handed coordinate system, with the origin in the
+ lower-left corner of the frame.
+This is contrary to many video formats which use a left-handed coordinate
+ system with the origin in the upper-left corner of the frame.
+%INT: This means that for interlaced material, the definition of `even fields'
+%INT: and `odd fields' may be reversed between Theora and other video codecs.
+%INT: This document will always refer to them as `top fields' and `bottom
+%INT: fields'.
+
+Theora divides the pixel array up into three separate \term{color planes}, one
+ for each of the $Y'$, $C_b$, and $C_r$ components of the pixel.
+The $Y'$ plane is also called the \term{luma plane}, and the $C_b$ and $C_r$
+ planes are also called the \term{chroma planes}.
+In some pixel formats, the chroma planes are decimated by two in one or both
+ directions.
+This means that the width or height of the chroma planes may be half that of
+ the total frame width and height, and thus only a multiple of eight, not
+ sixteen.
+The luma plane is never decimated.
+
+\subsubsection{Picture Region}
+
+A video frame in Theora is required to have a width and height that are
+ multiples of sixteen.
+However, inside a frame a smaller \term{picture region} may be defined.
+The picture region can be offset from the lower-left corner of the frame by up
+ to 255 pixels in each direction, and may have an arbitrary width and height,
+ provided that it is contained entirely within the coded frame.
+It is this picture region that contains the actual video data.
+The portions of the frame which lie outside the picture region may contain
+ arbitrary data, and should be cropped away after decode.
+The picture region plays no other role in the decode process, which operates on
+ the entire video frame.
+
+\subsubsection{Blocks and Super Blocks}
+
+Each color plane is subdivided into $8\times 8$ \term{blocks}.
+Blocks are grouped into $4\times 4$ arrays called \term{super blocks}.
+Each color plane has its own set of blocks and super blocks.
+The boundaries of the luma plane are not necessarily aligned with those of the
+ chroma planes, if the chroma planes have been decimated.
+
+Blocks are accessed in two different orders in the various decoder processes.
+The first is \term{raster order}.
+This indexes each block in row-major order, starting in the lower left and
+ proceeding along the bottom row, followed by the next row up starting on the
+ left, etc.
+The second is \term{coded order}.
+In coded order, blocks are accessed by super block.
+The super blocks themselves are visited in raster order, in the same way that
+ individual blocks are visited in raster order.
+Within each super block, however, blocks are accessed in a Hilbert curve
+ pattern, illustrated in Figure~\ref{fig:hilbert-block}.
+If a color plane does not contain a complete super block on the top or right
+ sides, the same ordering is still used, simply with any blocks outside the
+ frame boundary omitted.
+
+\begin{figure}[htb]
+\begin{center}
+\input{hilbert-block}
+\end{center}
+\caption{Hilbert curve ordering of blocks within a super block}
+\label{fig:hilbert-block}
+\end{figure}
+
+To illustrate these two orderings, consider a frame that is 240 pixels wide and
+ 48 pixels high.
+Each row of the luma plane has 30 blocks and 8 super blocks, and there are 6
+ rows of blocks and two rows of super blocks.
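+
+As a non-normative check of these counts, recall that a block covers
+ $8\times 8$ pixels and a super block covers $32\times 32$ pixels:
+\begin{align*}
+\text{blocks per row} &= 240/8 = 30, &
+\text{rows of blocks} &= 48/8 = 6, \\
+\text{super blocks per row} &= \lceil 240/32\rceil = 8, &
+\text{rows of super blocks} &= \lceil 48/32\rceil = 2.
+\end{align*}
+The rightmost super block of each row is thus only two blocks wide, and the
+ super blocks of the upper row are only two blocks high.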
+
+When accessed in raster order, each block in the luma plane is assigned the
+ following indices:
+
+\vspace{\baselineskip}
+\begin{center}
+\begin{tabular}{|cccc|c|cc|}\hline
+150 & 151 & 152 & 153 & $\ldots$ & 178 & 179 \\
+120 & 121 & 122 & 123 & $\ldots$ & 148 & 149 \\\hline
+ 90 & 91 & 92 & 93 & $\ldots$ & 118 & 119 \\
+ 60 & 61 & 62 & 63 & $\ldots$ & 88 & 89 \\
+ 30 & 31 & 32 & 33 & $\ldots$ & 58 & 59 \\
+ 0 & 1 & 2 & 3 & $\ldots$ & 28 & 29 \\\hline
+\end{tabular}
+\end{center}
+\vspace{\baselineskip}
+
+When accessed in coded order, each block in the luma plane is assigned the
+ following indices:
+
+\vspace{\baselineskip}
+\begin{center}
+\begin{tabular}{|cccc|c|cc|}\hline
+123 & 122 & 125 & 124 & $\ldots$ & 179 & 178 \\
+120 & 121 & 126 & 127 & $\ldots$ & 176 & 177 \\\hline
+ 5 & 6 & 9 & 10 & $\ldots$ & 117 & 118 \\
+ 4 & 7 & 8 & 11 & $\ldots$ & 116 & 119 \\
+ 3 & 2 & 13 & 12 & $\ldots$ & 115 & 114 \\
+ 0 & 1 & 14 & 15 & $\ldots$ & 112 & 113 \\\hline
+\end{tabular}
+\end{center}
+\vspace{\baselineskip}
+
+Blocks in the chroma planes immediately follow those of the luma plane without
+ a break.
+
+\subsubsection{Macro Blocks}
+
+A macro block contains a $2\times 2$ array of blocks in the luma plane
+ {\em and} the co-located blocks in the chroma planes.
+Thus macro blocks can represent anywhere from six to twelve blocks, depending
+ on how the chroma planes are decimated.
+Macro blocks contain information about coding mode and motion vectors for the
+ corresponding blocks in all color planes.
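+
+As an illustrative tally (not part of the normative text), a macro block always
+ holds four luma blocks, and each of the two chroma planes contributes one, two,
+ or four co-located blocks depending on the decimation:
+\begin{align*}
+\text{4:2:0:}\quad & 4 + 2\cdot 1 = 6 \text{ blocks}, \\
+\text{4:2:2:}\quad & 4 + 2\cdot 2 = 8 \text{ blocks}, \\
+\text{4:4:4:}\quad & 4 + 2\cdot 4 = 12 \text{ blocks}.
+\end{align*}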
+
+Macro blocks are also accessed in a \term{coded order}.
+This coded order proceeds by examining each super block in the luma plane in
+ raster order, and traversing the four macro blocks inside using a smaller
+ Hilbert curve, as shown in Figure~\ref{fig:hilbert-mb}.
+If the luma plane does not contain a complete super block on the top or right
+ sides, the same ordering is still used, simply with any macro blocks outside
+ the frame boundary omitted.
+Because the frame size is constrained to be a multiple of 16, there are never
+ any partial macro blocks.
+Unlike blocks, macro blocks need never be accessed in a pure raster order.
+
+\begin{figure}[htb]
+\begin{center}
+\input{hilbert-mb}
+\end{center}
+\caption{Hilbert curve ordering of macro blocks within a super block}
+\label{fig:hilbert-mb}
+\end{figure}
+
+Using the same frame size as the example above, there are 15 macro blocks in
+ each row and 3 rows of macro blocks.
+They are assigned the following indices:
+
+\vspace{\baselineskip}
+\begin{center}
+\begin{tabular}{|cc|cc|c|cc|c|}\hline
+30 & 31 & 32 & 33 & $\cdots$ & 42 & 43 & 44 \\\hline
+ 1 & 2 & 5 & 6 & $\cdots$ & 25 & 26 & 29 \\
+ 0 & 3 & 4 & 7 & $\cdots$ & 24 & 27 & 28 \\\hline
+\end{tabular}
+\end{center}
+\vspace{\baselineskip}
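+
+These indices can be checked by hand (an illustrative calculation only).
+A macro block covers $16\times 16$ luma pixels, so the frame holds
+\[
+ (240/16)\times(48/16) = 15\times 3 = 45
+\]
+ macro blocks, indexed 0 through 44.
+The lower row of super blocks spans two rows of macro blocks and accounts for
+ indices 0 through 29; the upper row of super blocks spans only the remaining
+ row of macro blocks, indices 30 through 44.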
+
+\subsubsection{Predictors}
+
+Each block is coded using one of a small, fixed set of \term{coding modes} that
+ define the \term{predictor} for that block's contents.
+The INTRA mode uses a constant predictor and is the only mode allowed in intra
+ frames.
+The other coding modes use the contents of one of two different \term{reference
+ frames}.
+A reference frame is the fully decoded version of a previous frame in the
+ stream.
+The first available reference frame is the previous frame, whether it was an
+ intra frame or an inter frame.
+The second available reference frame is the previous intra frame, called the
+ \term{golden frame}.
+The most important inter coding mode is INTER\_NOMV, which uses the co-located
+ contents of the block in the previous frame as the predictor with no
+ motion-compensation, i.e., a motion vector of $(0,0)$.
+
+\subsubsection{DCT Coefficients}
+
+To each block's predictor, a \term{residual} is added to form the final
+ contents of the block.
+The residual is stored by first applying an integer approximation of a
+ two-dimensional Type II Discrete Cosine Transform and then quantizing the
+ resulting coefficients.
+The DCT takes an $8\times 8$ array of pixel values as input and returns an
+ $8\times 8$ array of coefficient values.
+The \term{natural ordering} of these coefficients is defined to be row-major
+ order.
+They are also often indexed in \term{zig-zag order}, as shown in
+ Table~\ref{tab:zig-zag}.
+
+\begin{table}[htb]
+\begin{center}
+\begin{tabular}[c]{r|c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c}
+\multicolumn{1}{r}{} &0&&1&&2&&3&&4&&5&&6&&7 \\\cline{2-16}
+0 & 0 &$\rightarrow$& 1 && 5 &$\rightarrow$& 6 && 14 &$\rightarrow$& 15 && 27 &$\rightarrow$& 28 \\[-0.5\defaultaddspace]
+ & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
+1 & 2 & & 4 && 7 & & 13 && 16 & & 26 && 29 & & 42 \\[-0.5\defaultaddspace]
+ &$\downarrow$&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&$\downarrow$ \\
+2 & 3 & & 8 && 12 & & 17 && 25 & & 30 && 41 & & 43 \\[-0.5\defaultaddspace]
+ & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
+3 & 9 & & 11 && 18 & & 24 && 31 & & 40 && 44 & & 53 \\[-0.5\defaultaddspace]
+ &$\downarrow$&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&$\downarrow$ \\
+4 & 10 & & 19 && 23 & & 32 && 39 & & 45 && 52 & & 54 \\[-0.5\defaultaddspace]
+ & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
+5 & 20 & & 22 && 33 & & 38 && 46 & & 51 && 55 & & 60 \\[-0.5\defaultaddspace]
+ &$\downarrow$&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&$\downarrow$ \\
+6 & 21 & & 34 && 37 & & 47 && 50 & & 56 && 59 & & 61 \\[-0.5\defaultaddspace]
+ & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
+7 & 35 &$\rightarrow$& 36 && 48 &$\rightarrow$& 49 && 57 &$\rightarrow$& 58 && 62 &$\rightarrow$& 63
+\end{tabular}
+\end{center}
+\caption{Zig-zag order}
+\label{tab:zig-zag}
+\end{table}
+
+Note that the row and column indices refer to {\em frequency number} and not
+ pixel locations.
+The frequency numbers are defined independently of the memory organization of
+ the pixels.
+They have been written from top to bottom here to follow conventional notation,
+ despite the right-handed coordinate system Theora uses for pixel locations.
+
+Many implementations of the DCT operate `in-place'.
+That is, they return DCT coefficients in the same memory buffer that the
+ initial pixel values were stored in.
+Due to the right-handed coordinate system used for pixel locations in Theora,
+ one must note carefully how both pixel values and DCT coefficients are
+ organized in memory in such a system.
+
+DCT coefficient $(0,0)$ is called the \term{DC coefficient}.
+All the other coefficients are called \term{AC coefficients}.
+
+\subsection{Decoder Configuration}
+
+Decoder setup consists of configuration of the quantization matrices and the
+ Huffman codebooks for the DCT coefficients.
+The remainder of the decoding pipeline is not configurable.
+
+\subsubsection{Global Configuration}
+
+The global codec configuration consists of a few video related fields, such as
+ frame rate, frame size, picture size and offset, aspect ratio, color space,
+ pixel format, and a version number.
+The version number is divided into a major version, a minor version, and a
+ minor revision number.
+For the format defined in this specification, these are `3', `2', and
+ `0', respectively, in reference to Theora's origin as a successor to the VP3.2
+ format.
+
+\subsubsection{Quantization Matrices}
+
+Theora allows up to 384 different quantization matrices to be defined
+ ($2\times 3\times 64$), one for each combination of
+ \term{quantization type} (intra or inter), \term{color plane}
+ ($Y'$, $C_b$, or $C_r$), and \term{quantization index}, \qi, which ranges from
+ zero to 63, inclusive.
+The quantization index generally represents a progressive range of quality
+ levels, from low quality near zero to high quality near 63.
+However, the interpretation is arbitrary, and it is possible, for example, to
+ partition the scale into two completely separate ranges with 32 levels each
+ that are meant to represent different classes of source material.
+
+Each quantization matrix is an $8\times 8$ matrix of 16-bit values, which is
+ used to quantize the output of the $8\times 8$ DCT.
+Quantization matrices are specified using three components: a
+ \term{base matrix} and two \term{scale values}.
+The first scale value is the \term{DC scale}, which is applied to the DC
+ component of the base matrix.
+The second scale value is the \term{AC scale}, which is applied to all the
+ other components of the base matrix.
+There are 64 DC scale values and 64 AC scale values, one for each \qi value.
+
+There are 64 elements in each base matrix, one for each DCT coefficient.
+They are stored in natural order.
+There is a separate set of base matrices for each quantization type and each
+ color plane, with up to 64 possible base matrices in each set, one for each
+ \qi value.
+Typically the bitstream contains matrices for only a sparse subset of the
+ possible \qi values, including at least the first and the last.
+The base matrices for the remainder of the \qi values are computed using linear
+ interpolation.
+This configuration allows the quantization matrices to approximate the complex,
+ non-linear processes of the human visual system as the \qi value varies.
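+
+As an informal sketch of that interpolation, suppose the bitstream supplies
+ base matrices $B_0$ and $B_1$ at two consecutive specified \qi values
+ $q_0<q_1$ (these symbols are introduced here for illustration only).
+Ignoring rounding, element $i$ of the base matrix for an intermediate value $q$
+ is then
+\begin{displaymath}
+B_q[i]\approx\frac{(q_1-q)B_0[i]+(q-q_0)B_1[i]}{q_1-q_0},\quad q_0\le q\le q_1.
+\end{displaymath}
+The exact integer arithmetic, including rounding, is left to the normative
+ description of the setup header.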
+
+Finally, because the in-loop deblocking filter strength depends on the strength
+ of the quantization matrices defined in this header, a table of 64 \term{loop
+ filter limit values} is defined, one for each \qi value.
+
+The precise specification of how all of this information is decoded appears in
+ Section~REF.
+
+\subsubsection{Huffman Codebooks}
+
+Theora uses 80 configurable binary Huffman codes to represent the 32 tokens
+ used to encode DCT coefficients.
+Each of the 32 token values has a different semantic meaning and is used to
+ represent single coefficient values, zero runs, combinations of the two, and
+ \term{End-Of-Block markers}.
+
+The 80 codes are divided up into five groups of 16, with each group
+ corresponding to a set of DCT coefficient indices.
+The first group corresponds to the DC coefficient, while the remaining groups
+ correspond to different subsets of the AC coefficients.
+Within each frame, two pairs of 4-bit codebook indices are stored.
+The first pair selects which codebooks to use from the DC coefficient group for
+ the $Y'$ coefficients and the $C_b$ and $C_r$ coefficients.
+The second pair selects which codebooks to use from {\em all} of the AC
+ coefficient groups for the $Y'$ coefficients and the $C_b$ and $C_r$
+ coefficients.
+
+The precise specification of how the codebooks are decoded appears in
+ Section~REF.
+
+\subsection{High-Level Decode Process}
+
+\subsubsection{Decoder Setup}
+
+Before decoding can begin, a decoder MUST be initialized using the bitstream
+ headers corresponding to the stream to be decoded.
+Theora uses three header packets; all are required, in order, by this
+ specification.
+Once set up, decode may begin at any intra-frame packet---or even inter-frame
+ packets, provided the appropriate reference frames have already been decoded
+ and cached---belonging to the Theora stream.
+In Theora I, all packets after the three initial headers are intra-frame or
+ inter-frame packets.
+
+The header packets are, in order, the identification header, the comment
+ header, and the setup header.
+
+\paragraph{Identification Header}
+
+The identification header identifies the stream as Theora, provides a version
+ number, and defines the characteristics of the video stream such as frame
+ size.
+A complete description of the identification header appears in
+ Section~\ref{sec:idheader}.
+
+\paragraph{Comment Header}
+
+The comment header includes user text comments (`tags') and a vendor string
+ for the application/library that produced the stream.
+The format of the comment header is the same as that used in the Vorbis I and
+ Speex codecs, with slight modifications due to the use of a different bit
+ packing mechanism.
+A complete description of how the comment header is coded appears in
+ Section~\ref{sec:commentheader}, along with a suggested set of tags.
+
+\paragraph{Setup Header}
+
+The setup header includes extensive codec setup information, including the
+ complete set of quantization matrices and Huffman codebooks needed to decode
+ the DCT coefficients.
+A complete description of the setup header appears in Section~REF.
+
+\subsubsection{Decode Procedure}
+
+The decoding and synthesis procedure for all video packets is fundamentally the
+ same, with some steps omitted for intra frames.
+\begin{enumerate}
+\item
+Decode packet type flag.
+\item
+Decode frame header.
+\item
+Decode coded block information (inter frames only).
+\item
+Decode macro block mode information (inter frames only).
+\item
+Decode motion vectors (inter frames only).
+\item
+Decode block-level \qi information.
+\item
+Decode DC coefficient for each coded block.
+\item
+Decode 1st AC coefficient for each coded block.
+\item
+Decode 2nd AC coefficient for each coded block.
+\item
+$\ldots$
+\item
+Decode 63rd AC coefficient for each coded block.
+\item Perform DC coefficient prediction.
+\item Reconstruct coded blocks.
+\item Copy uncoded blocks.
+\item Perform loop filtering.
+\end{enumerate}
+
+Note that clever rearrangement of the steps in this process is possible.
+As an example, in a memory-constrained environment, one can make multiple
+ passes through the DCT coefficients to avoid buffering them all in memory.
+On the first pass, the starting location of each coefficient is identified, and
+ then 64 separate get pointers are used to read in the 64 DCT coefficients
+ required to reconstruct each coded block in sequence.
+This operation produces entirely equivalent output and is naturally perfectly
+ legal.
+It may even be a benefit in non-memory-constrained environments due to a
+ reduced cache footprint.
+The decoder MUST be {\em entirely mathematically equivalent} to the
+ specification; it need not be a literal semantic implementation.
+
+Theora makes equivalence easy to check by defining all decoding operations in
+ terms of exact integer operations.
+No floating-point math is required, and in particular, the implementation of
+ the iDCT transform MUST be followed precisely.
+This prevents the decoder mismatch problem commonly associated with codecs that
+ provide a less rigorous transform specification.
+Such a mismatch problem would be devastating to Theora, since a single rounding
+ error in one frame could propagate throughout the entire succeeding frame due
+ to DC prediction.
+
+\paragraph{Packet Type Decode}
+
+Theora I uses four packet types.
+The first three packet types mark each of the three Theora headers described
+ above.
+The fourth packet type marks a video packet.
+All other packet types are reserved; packets marked with a reserved type should
+ be ignored.
+
+\paragraph{Frame Header Decode}
+
+The frame header contains some global information about the current frame.
+The first is the frame type field, which specifies if this is an intra frame or
+ an inter frame.
+Inter frames predict their contents from previously decoded reference frames.
+Intra frames can be independently decoded with no established reference frames.
+
+The next piece of information in the frame header is the list of \qi values
+ allowed in the frame.
+Theora allows between one and three different \qi values to be used in a single
+ frame, each of which selects a set of six quantization matrices, one for each
+ quantization type (inter or intra), and one for each color plane.
+The first \qi value is {\em always} used when dequantizing DC coefficients.
+The \qi value used when dequantizing AC coefficients, however, can vary from
+ block to block.
+VP3, in contrast, allowed just a single \qi value per frame for both the DC and
+ AC coefficients.
+
+\paragraph{Coded Block Information}
+
+This stage determines which blocks in the frame are coded and which are
+ uncoded.
+A \term{coded block list} is constructed which lists all the coded blocks in
+ coded order.
+For intra frames, every block is coded, and so no data needs to be read from
+ the packet.
+
+\paragraph{Macro Block Mode Information}
+
+For intra frames, every block is coded in INTRA mode, and this stage can be
+ skipped.
+In inter frames a \term{coded macro block list} is constructed from the coded
+ block list.
+Any macro block which has at least one of its luma blocks coded is considered
+ coded; all other macro blocks are uncoded, even if they contain coded chroma
+ blocks.
+A coding mode is decoded for each coded macro block, and assigned to all its
+ constituent coded blocks.
+All coded chroma blocks in uncoded macro blocks are assigned the INTER\_NOMV
+ coding mode.
+
+\paragraph{Motion Vectors}
+
+Intra frames are all coded entirely in INTRA mode, and so this stage can be
+ skipped.
+Some inter coding modes, however, require one or more motion vectors to be
+ specified for each macro block.
+These are decoded in this stage, and an appropriate motion vector is assigned
+ to each coded block in the macro block.
+
+\paragraph{Block-Level \qi Information}
+
+If a frame allows multiple \qi values, the \qi value assigned to each block is
+ decoded here.
+Frames that use only a single \qi value have nothing to decode.
+
+\paragraph{DCT Coefficients}
+
+Finally, the quantized DCT coefficients are decoded.
+A list of DCT coefficients in zig-zag order for a single block is represented
+ by a list of tokens.
+A token can take on one of 32 different values, each with a different semantic
+ meaning.
+A single token can represent a single DCT coefficient, a run of zero
+ coefficients within a single block, a combination of a run of zero
+ coefficients followed by a single non-zero coefficient, an
+ \term{End-Of-Block marker}, or a run of EOB markers.
+EOB markers signify that the remainder of the block is one long zero run.
+Unlike JPEG and MPEG, each block is not required to end with a special marker.
+If non-EOB tokens yield values for all 64 of the coefficients in a block, then
+ no EOB marker is needed.
+
+Each token is associated with a specific \term{token index} in a block.
+For single-coefficient tokens, this index is the zig-zag index of the token in
+ the block.
+For zero-run tokens, this index is the zig-zag index of the {\em first}
+ coefficient in the run.
+For combination tokens, the index is again the zig-zag index of the first
+ coefficient in the zero run.
+For EOB markers, which signify that the remainder of the block is one long zero
+ run, the index is the zig-zag index of the first zero coefficient in that run.
+For EOB runs, the token index is that of the first EOB marker in the run.
+Due to zero runs and EOB markers, a block does not have to have a token for
+ every zig-zag index.
+
+Tokens are grouped in the stream by token index, not by the block they
+ originate from.
+This means that for each zig-zag index in turn, the tokens with that index from
+ {\em all} the coded blocks are coded in coded block order.
+When decoding, a current token index is maintained for each coded block.
+This index is advanced by the number of coefficients that are added to the
+ block as each token is decoded.
+After fully decoding all the tokens with token index \ti, the current token
+ index of every coded block will be \ti or greater.
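+
+As a small illustration, with token choices invented purely for this example,
+ consider three coded blocks $A$, $B$, and $C$, listed in coded block order.
+The tokens would appear in the stream in the following order:
+
+\begin{center}
+\begin{tabular}{clll}
+Token index & Block & Token & Current token index afterwards \\\hline
+0 & $A$ & single coefficient & 1 \\
+0 & $B$ & zero run of length 3 & 3 \\
+0 & $C$ & single coefficient & 1 \\
+1 & $A$ & EOB marker & 64 \\
+1 & $C$ & zero run of length 1 plus one coefficient & 3 \\
+3 & $B$ & single coefficient & 4 \\
+3 & $C$ & EOB marker & 64 \\
+\end{tabular}
+\end{center}
+
+No tokens are coded at token index 2, because once index 1 is complete no
+ block has a current token index of exactly 2.
+Block $B$ would continue at token index 4.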
+
+If an EOB run of $n$ blocks is decoded at token index \ti, then it ends the
+ next $n$ blocks in coded block order whose current token index is equal to
+ \ti, but not greater.
+If there are fewer than $n$ blocks with a current token index of \ti, then the
+ decoder goes through the coded block list again from the start, ending blocks
+ with a current token index of $\ti+1$, and so on, until $n$ blocks have been
+ ended.
+
+Tokens are read by parsing a Huffman code that depends on \ti and the color
+ plane of the next coded block whose current token index is equal to \ti, but
+ not greater.
+The Huffman codebooks are selected on a per-frame basis from the 80 codebooks
+ defined in the setup header.
+Many tokens have a fixed number of \term{extra bits} associated with them.
+These bits are read from the packet immediately after the token is decoded.
+These are used to define things such as coefficient magnitude, sign, and the
+ length of runs.
+
+\paragraph{DC Prediction}
+
+After the coefficients for each block are decoded, the quantized DC value of
+ each block is adjusted based on the DC values of its neighbors.
+This adjustment is performed by scanning the blocks in raster order, not coded
+ block order.
+
+\paragraph{Reconstruction}
+
+Finally, using the coding mode, motion vector (if applicable), quantized
+ coefficient list, and \qi value defined for each block, all the coded blocks
+ are reconstructed.
+The DCT coefficients are dequantized, an inverse DCT transform is applied, and
+ the predictor is formed from the coding mode and motion vector and added to
+ the result.
+
+\paragraph{Loop Filtering}
+
+To complete the reconstructed frame, an in-loop deblocking filter is applied to
+ the edges of all coded blocks.
+
+\section{Video Formats}
+
+This section gives a precise description of the video formats that Theora is
+ capable of storing.
+The Theora bitstream is capable of handling video at any arbitrary resolution
+ up to $1048560\times 1048560$.
+Such video would require almost three terabytes of storage per frame for
+ uncompressed data, so compliant decoders MAY refuse to decode images with
+ sizes beyond their capabilities.
+%TODO: What MUST a "compliant" decoder accept?
+%TODO: What SHOULD a decoder use for an upper bound? (derive from total amount
+%TODO: of memory and memory bandwidth)
+%TODO: Any lower limits?
+%TODO: We really need hardware device profiles, but such things should be
+%TODO: developed with input from the hardware community.
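+
+As a rough check of that storage figure, assume three 8-bit samples are stored
+ for every pixel (one per channel, with no chroma subsampling; this layout is
+ an assumption made only for the estimate):
+\begin{displaymath}
+1048560\times 1048560\times 3 = 3\,298\,434\,220\,800\ \mathrm{bytes},
+\end{displaymath}
+ which is just under $3\times 2^{40}$ bytes.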
+
+The remainder of this section talks about two specific aspects of the video
+ format: the color space and the pixel format.
+The first describes how color is represented and how to transform that color
+ representation into a device independent color space such as CIE $XYZ$ (1931).
+The second describes the various schemes for sampling the color values in time
+ and space.
+
+\subsection{Color Space Conventions}
+
+There are a large number of different color standards used in digital video.
+Since Theora is a lossy codec, it restricts itself to only a few of them to
+ simplify playback.
+Unlike the alternate method of describing all the parameters of the color
+ model, this allows a few dedicated routines for color conversion to be written
+ and heavily optimized in a decoder.
+More flexible conversion functions should instead be specified in an encoder,
+ where additional computational complexity is more easily tolerated.
+The color spaces were selected to give a fair representation of color standards
+ in use around the world today.
+Most of the standards that do not exactly match one of these can be converted
+ to one fairly easily.
+
+All Theora color spaces are $Y'C_bC_r$ color spaces with one luma channel and
+ two chroma channels.
+Each channel contains 8-bit discrete values in the range $0\ldots255$, which
+ represent non-linear gamma pre-corrected signals.
+The Theora identification header contains an 8-bit value that describes the
+ color space.
+This merely selects one of the color spaces available from an enumerated list.
+Currently, only two color spaces are defined, with a third possibility that
+ indicates the color space is `unknown'.
+
+\subsection{Color Space Conversions and Parameters}
+\label{sec:color-xforms}
+
+The parameters which describe the conversions between each color space are
+ listed below.
+These are the parameters needed to map colors from the encoded $Y'C_bC_r$
+ representation to the device-independent color space CIE $XYZ$ (1931).
+These parameters define abstract mathematical conversion functions which are
+ infinitely precise.
+The accuracy and precision with which the conversions are performed in a real
+ system is determined by the quality of output desired and the available
+ processing power.
+Exact decoder output is defined by this specification only in the original
+ $Y'C_bC_r$ space.
+
+\begin{description}
+\item[$Y'C_bC_r$ to $Y'P_bP_r$:]
+\vspace{\baselineskip}\hfill
+
+This conversion takes 8-bit discrete values in the range $[0\ldots255]$ and
+ maps them to real values in the range $[0\ldots1]$ for $Y'$ and
+ $[-\frac{1}{2}\ldots\frac{1}{2}]$ for $P_b$ and $P_r$.
+Because some values may fall outside the offset and excursion defined for each
+ channel in the $Y'C_bC_r$ space, the results may fall outside these ranges in
+ $Y'P_bP_r$ space.
+No clamping should be done at this stage.
+
+\begin{eqnarray*}
+Y'_\mathrm{out} & = &
+ \frac{Y'_\mathrm{in}-\mathrm{Offset}_Y}{\mathrm{Excursion}_Y} \\
+P_b & = &
+ \frac{C_b-\mathrm{Offset}_{C_b}}{\mathrm{Excursion}_{C_b}} \\
+P_r & = &
+ \frac{C_r-\mathrm{Offset}_{C_r}}{\mathrm{Excursion}_{C_r}}
+\end{eqnarray*}
+
+Parameters: $\mathrm{Offset}_{Y,C_b,C_r}$, $\mathrm{Excursion}_{Y,C_b,C_r}$.
+
+\item[$Y'P_bP_r$ to $R'G'B'$:]
+\vspace{\baselineskip}\hfill
+
+This conversion takes the one luma and two chroma channel representation and
+ maps it to the non-linear $R'G'B'$ space used to drive actual output devices.
+Values should be clamped into the range $[0\ldots1]$ after this stage.
+
+\begin{eqnarray*}
+R' & = & Y'+2(1-K_r)P_r \\
+G' & = & Y'-2\frac{(1-K_b)K_b}{1-K_b-K_r}P_b-2\frac{(1-K_r)K_r}{1-K_b-K_r}P_r\\
+B' & = & Y'+2(1-K_b)P_b
+\end{eqnarray*}
+
+Parameters: $K_b,K_r$.
+
+\item[$R'G'B'$ to $RGB$ (Output device gamma correction):]
+\vspace{\baselineskip}\hfill
+
+This conversion takes the non-linear $R'G'B'$ voltage levels and maps them to
+ the linear light levels produced by the actual output device.
+Note that this conversion is only that of the output device, and its inverse is
+ {\em not} that used by the input device.
+Because a dim viewing environment is assumed in most television standards, the
+ overall gamma between the input and output devices is usually around $1.1$ to
+ $1.2$, and not a strict $1.0$.
+
+For calibration with actual output devices, the model
+\begin{displaymath}
+L=(E'+\Delta)^\gamma
+\end{displaymath}
+ should be used, with $\Delta$ the free parameter and $\gamma$ held fixed to
+ the value specified in this document.
+The conversion function presented here is an idealized version with $\Delta=0$.
+
+\begin{eqnarray*}
+R & = & R'^\gamma \\
+G & = & G'^\gamma \\
+B & = & B'^\gamma
+\end{eqnarray*}
+
+Parameters: $\gamma$.
+
+\item[$RGB$ to $R'G'B'$ (Input device gamma correction):]
+\vspace{\baselineskip}\hfill
+
+%TODO: Tag section as non-normative
+
+This conversion takes linear light levels and maps them to the non-linear
+ voltage levels used to drive the actual input device.
+This information is merely informative.
+It is not required for building a decoder or for converting between the various
+ formats and the actual output capabilities of a particular device.
+
+A linear segment is introduced on the low end to reduce noise in dark areas of
+ the image.
+The rest of the scale is adjusted so that the power segment of the curve
+ intersects the linear segment with the proper slope, and so that it still maps
+ 0 to 0 and 1 to 1.
+
+\begin{eqnarray*}
+R' & = & \left\{
+\begin{array}{ll}
+\alpha R, & 0\le R<\delta \\
+(1+\epsilon)R^\beta-\epsilon, & \delta\le R\le1
+\end{array}\right. \\
+G' & = & \left\{
+\begin{array}{ll}
+\alpha G, & 0\le G<\delta \\
+(1+\epsilon)G^\beta-\epsilon, & \delta\le G\le1
+\end{array}\right. \\
+B' & = & \left\{
+\begin{array}{ll}
+\alpha B, & 0\le B<\delta \\
+(1+\epsilon)B^\beta-\epsilon, & \delta\le B\le1
+\end{array}\right.
+\end{eqnarray*}
+
+Parameters: $\beta$, $\alpha$, $\delta$, $\epsilon$.
+
+\item[$RGB$ to CIE $XYZ$ (1931):]
+\vspace{\baselineskip}\hfill
+
+This conversion maps a device-dependent linear RGB space to the
+ device-independent linear CIE $XYZ$ space.
+The parameters are the CIE chromaticity coordinates of the three
+ primaries---red, green, and blue---as well as the chromaticity coordinates
+ of the white point of the device.
+This is how hardware manufacturers and standards typically describe a
+ particular $RGB$ space.
+The math required to convert these parameters into a useful transformation
+ matrix is reproduced below.
+
+\begin{eqnarray*}
+F & = &
+\left[\begin{array}{ccc}
+\frac{x_r}{y_r} & \frac{x_g}{y_g} & \frac{x_b}{y_b} \\
+1 & 1 & 1 \\
+\frac{1-x_r-y_r}{y_r} & \frac{1-x_g-y_g}{y_g} & \frac{1-x_b-y_b}{y_b}
+\end{array}\right] \\
+\left[\begin{array}{c}
+s_r \\
+s_g \\
+s_b
+\end{array}\right] & = &
+F^{-1}\left[\begin{array}{c}
+\frac{x_w}{y_w} \\
+1 \\
+\frac{1-x_w-y_w}{y_w}
+\end{array}\right] \\
+\left[\begin{array}{c}
+X \\
+Y \\
+Z
+\end{array}\right] & = &
+F\left[\begin{array}{c}
+s_rR \\
+s_gG \\
+s_bB
+\end{array}\right]
+\end{eqnarray*}
+Parameters: $x_{r,g,b,w},y_{r,g,b,w}$.
+
+\end{description}
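+
+As a quick consistency check on the input gamma function, take the parameter
+ values shared by the color spaces of Section~\ref{sec:colorspaces}
+ ($\alpha=4.5$, $\beta=0.45$, $\delta=0.018$, $\epsilon=0.099$).
+At the breakpoint $R=\delta$ the two segments agree in both value and slope, to
+ within the precision of the rounded constants:
+\begin{eqnarray*}
+\alpha\delta & = & 0.081 \\
+(1+\epsilon)\delta^\beta-\epsilon & \approx & 1.099\times 0.1640-0.099
+ \approx 0.081 \\
+\beta(1+\epsilon)\delta^{\beta-1} & \approx & 0.45\times 1.099\times 9.11
+ \approx 4.5
+\end{eqnarray*}
+At the upper end, $R=1$ maps to $(1+\epsilon)-\epsilon=1$, as required.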
+
+\subsection{Available Color Spaces}
+\label{sec:colorspaces}
+
+These are the color spaces currently defined for use by Theora video.
+Each one has a short name, with which it is referred to in this document, and
+ a more detailed specification of the standards from which its parameters are
+ derived.
+Some standards do not specify all the parameters necessary.
+For these unspecified parameters, this document serves as the definition of
+ what should be used when encoding or decoding Theora video.
+
+\subsubsection{Rec. 470M (Rec. ITU-R BT.470-6 System M/NTSC with Rec. ITU-R
+ BT.601-5)}
+\label{sec:470m}
+
+This color space is used by broadcast television and DVDs in much of the
+ Americas, Japan, Korea, and the Union of Myanmar \cite{rec470}.
+This color space may also be used for System M/PAL (Brazil), with an
+ appropriate conversion supplied by the encoder to compensate for the
+ different gamma value.
+See Section~\ref{sec:470bg} for an appropriate gamma value to assume for M/PAL
+ input.
+
+In the US, studio monitors are adjusted to a D65 white point
+ ($x_w,y_w=0.313,0.329$).
+In Japan, studio monitors are adjusted to a D white of 9300K
+ ($x_w,y_w=0.285,0.293$).
+
+Rec. 470 does not specify a digital encoding of the color signals.
+For Theora, Rec. ITU-R BT.601-5 \cite{rec601} is used, starting from the
+ $R'G'B'$ signals specified by Rec. 470.
+
+Rec. 470 does not specify an input gamma function.
+For Theora, the Rec. 709 \cite{rec709} input function is assumed.
+This is the same as that specified by SMPTE 170M \cite{smpte170m}, which claims
+ to reflect modern practice in the creation of NTSC signals circa 1994.
+
+The parameters for all the color transformations defined in
+ Section~\ref{sec:color-xforms} are given in Table~\ref{tab:470m}.
+
+\begin{table}[htb]
+\begin{eqnarray*}
+\mathrm{Offset}_{Y,C_b,C_r} & = & (16, 128, 128) \\
+\mathrm{Excursion}_{Y,C_b,C_r} & = & (219, 224, 224) \\
+K_r & = & 0.299 \\
+K_b & = & 0.114 \\
+\gamma & = & 2.2 \\
+\beta & = & 0.45 \\
+\alpha & = & 4.5 \\
+\delta & = & 0.018 \\
+\epsilon & = & 0.099 \\
+x_r,y_r & = & 0.67, 0.33 \\
+x_g,y_g & = & 0.21, 0.71 \\
+x_b,y_b & = & 0.14, 0.08 \\
+\mathrm{(Illuminant C)\ }x_w,y_w & = & 0.310, 0.316 \\
+\end{eqnarray*}
+\caption{Rec. 470M Parameters}
+\label{tab:470m}
+\end{table}
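+
+As a worked example, given with rounded values for illustration only,
+ substituting $K_r=0.299$ and $K_b=0.114$ into the $Y'P_bP_r$ to $R'G'B'$
+ conversion of Section~\ref{sec:color-xforms} gives, to three decimal places,
+\begin{eqnarray*}
+R' & = & Y'+1.402P_r \\
+G' & = & Y'-0.344P_b-0.714P_r \\
+B' & = & Y'+1.772P_b
+\end{eqnarray*}
+For instance, the encoded triple $(Y',C_b,C_r)=(235,128,128)$ first maps to
+ $(Y',P_b,P_r)=(1,0,0)$ and then to $(R',G',B')=(1,1,1)$.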
+
+\subsubsection{Rec. 470BG (Rec. ITU-R BT.470-6 Systems B and G with Rec. ITU-R
+ BT.601-5)}
+\label{sec:470bg}
+
+This color space is used by the PAL and SECAM systems in much of the rest of
+ the world \cite{rec470}.
+This can be used directly by systems (B, B1, D, D1, G, H, I, K, N)/PAL and (B,
+ D, G, H, K, K1, L)/SECAM.
+
+Note that the Rec. 470BG chromaticity values are different from those specified
+ in Rec. 470M.
+When PAL and SECAM systems were first designed, they were based upon the same
+ primaries as NTSC.
+However, as methods of making color picture tubes have changed, the primaries
+ used have changed as well.
+The U.S. recommends using correction circuitry to approximate the existing,
+ standard NTSC primaries.
+Current PAL and SECAM systems have standardized on primaries in accord with
+ more recent technology.
+
+Rec. 470 provisionally permits the use of the NTSC chromaticity values (given
+ in Section~\ref{sec:470m}) with legacy PAL and SECAM equipment.
+In Theora, material must be decoded assuming the new PAL and SECAM primaries.
+Material intended for display on old legacy devices should be converted by the
+ decoder.
+
+The official Rec. 470BG specifies a gamma value of $\gamma=2.8$.
+However, in practice this value is unrealistically high \cite{Poyn97}.
+Rec. 470BG states that the overall system gamma should be approximately
+ $\gamma\beta=1.2$.
+Since most cameras pre-correct with a gamma value of $\beta=0.45$,
+ this suggests an output device gamma of approximately $\gamma=2.67$.
+This is the value recommended for use with PAL systems in Theora.
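+
+The arithmetic behind that figure is simply
+\begin{displaymath}
+\gamma=\frac{1.2}{\beta}=\frac{1.2}{0.45}\approx 2.67.
+\end{displaymath}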
+
+Rec. 470 does not specify a digital encoding of the color signals.
+For Theora, Rec. ITU-R BT.601-5 \cite{rec601} is used, starting from the
+ $R'G'B'$ signals specified by Rec. 470.
+
+Rec. 470 does not specify an input gamma function.
+For Theora, the Rec. 709 \cite{rec709} input function is assumed.
+
+The parameters for all the color transformations defined in
+ Section~\ref{sec:color-xforms} are given in Table~\ref{tab:470bg}.
+
+\begin{table}[htb]
+\begin{eqnarray*}
+\mathrm{Offset}_{Y,C_b,C_r} & = & (16, 128, 128) \\
+\mathrm{Excursion}_{Y,C_b,C_r} & = & (219, 224, 224) \\
+K_r & = & 0.299 \\
+K_b & = & 0.114 \\
+\gamma & = & 2.67 \\
+\beta & = & 0.45 \\
+\alpha & = & 4.5 \\
+\delta & = & 0.018 \\
+\epsilon & = & 0.099 \\
+x_r,y_r & = & 0.64, 0.33 \\
+x_g,y_g & = & 0.29, 0.60 \\
+x_b,y_b & = & 0.15, 0.06 \\
+\mathrm{(D65)\ }x_w,y_w & = & 0.313, 0.329 \\
+\end{eqnarray*}
+\caption{Rec. 470BG Parameters}
+\label{tab:470bg}
+\end{table}
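+
+As a sanity check on the $RGB$ to CIE $XYZ$ conversion of
+ Section~\ref{sec:color-xforms}, note that the scale factors $s_r,s_g,s_b$ are
+ defined so that $R=G=B=1$ maps to the white point.
+With the D65 values above, that is
+\begin{displaymath}
+(X,Y,Z)=\left(\frac{x_w}{y_w},1,\frac{1-x_w-y_w}{y_w}\right)
+ =\left(\frac{0.313}{0.329},1,\frac{0.358}{0.329}\right)
+ \approx(0.951,1,1.088).
+\end{displaymath}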
+
+\subsection{Pixel Formats}
+
+\section{Bitpacking Convention}
+\label{sec:bitpacking}
+
+\subsection{Overview}
+
+The Theora codec uses relatively unstructured raw packets containing
+ binary integer fields of arbitrary width.
+Logically, each packet is a bitstream in which bits are written one-by-one by
+ the encoder and then read one-by-one in the same order by the decoder.
+Most current binary storage arrangements group bits into a native storage unit
+ of eight bits (octets), sixteen bits, thirty-two bits, or less commonly other
+ fixed sizes.
+The Theora bitpacking convention specifies the correct mapping of the logical
+ packet bitstream into an actual representation in fixed-width units.
+
+\subsubsection{Octets and Bytes}
+
+In most contemporary architectures, a `byte' is synonymous with an `octet',
+ that is, eight bits.
+For purposes of the bitpacking convention, a byte implies the smallest native
+ integer storage representation offered by a platform.
+Modern file systems invariably offer bytes as the fundamental atom of storage.
+
+The most ubiquitous architectures today consider a `byte' to be an octet.
+Note, however, that the Theora bitpacking convention is still well defined for
+ any native byte size; an implementation can use the native bit-width of a
+ given storage system.
+This document assumes that a byte is one octet for purposes of example only.
+
+\subsubsection{Words and Byte Order}
+
+A `word' is an integer size that is a grouped multiple of the byte size.
+Most architectures consider a word to be a group of two, four, or eight bytes.
+Each byte in the word can be ranked by order of `significance', e.g. the
+ significance of the bits in each byte when storing a binary integer in the
+ word.
+Several byte orderings are possible in a word.
+The common ones are
+\begin{itemize}
+\item{Big-endian:}
+in which the most significant byte comes first, e.g. 3-2-1-0,
+\item{Little-endian:}
+in which the least significant byte comes first, e.g. 0-1-2-3, and
+\item{Mixed-endian:}
+one of the less-common orderings that cannot be put into the above two
+ categories, e.g. 3-1-2-0 or 0-2-1-3.
+\end{itemize}
+
+The Theora bitpacking convention specifies storage and bitstream manipulation
+ at the byte, not word, level.
+Thus host word ordering is of concern only during optimization, when writing
+ code that operates on a word of storage at a time rather than a byte.
+Logically, bytes are always encoded and decoded in order from byte zero through
+ byte $n$.
+
+\subsubsection{Bit Order}
+
+A byte has a well-defined `least significant' bit (LSb), which is the only bit
+ set when the byte is storing the two's complement integer value $+1$.
+A byte's `most significant' bit (MSb) is at the opposite end.
+Bits in a byte are numbered from zero at the LSb to $n$ for the MSb, where
+ $n=7$ in an octet.
+
+\subsection{Coding Bits into Bytes}
+
+The Theora codec needs to encode arbitrary bit-width integers from zero to 32
+ bits wide into packets.
+These integer fields are not aligned to the boundaries of the byte
+ representation; the next field is read at the bit position immediately
+ after the end of the previous field.
+
+The decoder logically unpacks integers by first reading the MSb of a binary
+ integer from the logical bitstream, followed by the next most significant
+ bit, etc., until the required number of bits have been read.
+When unpacking the bytes into bits, the decoder reads the MSb of the
+ destination integer from the most significant unread bit position of the
+ source byte, then the next most significant bit of the integer from the next
+ most significant unread bit position, and so on up to the requested number of
+ bits.
+Note that this differs from the Vorbis I codec, which
+ begins decoding with the LSb of the source integer, reading it from the
+ LSb of the source byte.
+When all the bits of the current source byte are read, decoding continues with
+ the MSb of the next byte.
+Any unfilled bits in the last byte of the packet MUST be cleared to zero by the
+ encoder.
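+
+Equivalently, assuming octets and numbering the bits of the logical bitstream
+ from $k=0$ (the index $k$ is introduced here only for illustration), bit $k$
+ is stored in
+\begin{displaymath}
+\mathrm{byte}\ \left\lfloor\frac{k}{8}\right\rfloor,\quad
+\mathrm{bit\ position}\ 7-(k\bmod 8).
+\end{displaymath}
+For instance, $k=10$ falls in byte 1 at bit position 5.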
+
+\subsubsection{Signedness}
+
+The binary integers decoded by the above process may be either signed or
+ unsigned.
+This varies from integer to integer, and this specification
+ indicates how each value should be interpreted as it is read.
+That is, depending on context, the three bit binary pattern `b111' can be taken
+ to represent either `$7$' as an unsigned integer or `$-1$' as a signed, two's
+ complement integer.
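+
+In general, an $n$-bit pattern $b_{n-1}\ldots b_1b_0$ has the values
+\begin{eqnarray*}
+\mathrm{unsigned} & = & \sum_{i=0}^{n-1}b_i2^i \\
+\mathrm{signed} & = & -b_{n-1}2^{n-1}+\sum_{i=0}^{n-2}b_i2^i
+\end{eqnarray*}
+ so that `b111' gives $4+2+1=7$ unsigned and $-4+2+1=-1$ signed.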
+
+\subsubsection{Encoding Example}
+
+The following example shows the state of an (8-bit) byte stream after several
+ binary integers are encoded, including the location of the put pointer for the
+ next bit to write to and the total length of the stream in bytes.
+
+Encode the 4 bit unsigned integer value `12' (b1100) into an empty byte stream.
+
+\begin{tabular}{r|ccccccccl}
+\multicolumn{1}{r}{}& &&&&$\downarrow$&&&& \\
+ & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
+ 0 & 0 & 0 & 0 & $\leftarrow$ \\
+byte 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+\multicolumn{1}{c|}{$\vdots$}&\multicolumn{8}{c}{$\vdots$}& \\
+byte $n$ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 1 byte
+\end{tabular}
+\vspace{\baselineskip}
+
+Continue by encoding the 3 bit signed integer value `-1' (b111).
+
+\begin{tabular}{r|ccccccccl}
+\multicolumn{1}{r}{} &&&&&&&&$\downarrow$& \\
+ & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
+ \textbf{1} & \textbf{1} & \textbf{1} & 0 & $\leftarrow$ \\
+byte 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+\multicolumn{1}{c|}{$\vdots$}&\multicolumn{8}{c}{$\vdots$}& \\
+byte $n$ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 1 byte
+\end{tabular}
+\vspace{\baselineskip}
+
+Continue by encoding the 7 bit integer value `17' (b0010001).
+
+\begin{tabular}{r|ccccccccl}
+\multicolumn{1}{r}{} &&&&&&&$\downarrow$&& \\
+ & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
+ \textbf{1} & \textbf{1} & \textbf{1} & \textbf{0} & \\
+byte 1 & \textbf{0} & \textbf{1} & \textbf{0} & \textbf{0} &
+ \textbf{0} & \textbf{1} & 0 & 0 & $\leftarrow$ \\
+byte 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+byte 3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
+\multicolumn{1}{c|}{$\vdots$}&\multicolumn{8}{c}{$\vdots$}& \\
+byte $n$ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 2 bytes
+\end{tabular}
+\vspace{\baselineskip}
+
+Continue by encoding the 13 bit integer value `6969' (b11011 00111001).
+
+\begin{tabular}{r|ccccccccl}
+\multicolumn{1}{r}{} &&&&$\downarrow$&&&&& \\
+ & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
+ \textbf{1} & \textbf{1} & \textbf{1} & \textbf{0} & \\
+byte 1 & \textbf{0} & \textbf{1} & \textbf{0} & \textbf{0} &
+ \textbf{0} & \textbf{1} & \textbf{1} & \textbf{1} & \\
+byte 2 & \textbf{0} & \textbf{1} & \textbf{1} & \textbf{0} &
+ \textbf{0} & \textbf{1} & \textbf{1} & \textbf{1} & \\
+byte 3 & \textbf{0} & \textbf{0} & \textbf{1} &
+ 0 & 0 & 0 & 0 & 0 & $\leftarrow$ \\
+\multicolumn{1}{c|}{$\vdots$}&\multicolumn{8}{c}{$\vdots$}& \\
+byte $n$ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 4 bytes
+\end{tabular}
+\vspace{\baselineskip}
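+
+The encoding example can be reproduced with a small bit writer.
+The following Python sketch is illustrative only; the class {\tt BitWriter}
+ and its members are hypothetical names, not part of any Theora
+ implementation.
```python
class BitWriter:
    """Packs integers MSb-first into a growing byte buffer (illustrative only)."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.nbits = 0  # total number of bits written so far

    def write(self, value: int, width: int) -> None:
        value &= (1 << width) - 1  # keep the low `width` bits (two's complement)
        for shift in range(width - 1, -1, -1):  # most significant bit first
            if self.nbits % 8 == 0:
                self.buf.append(0)  # new bytes start zeroed, so padding is zero
            bit = (value >> shift) & 1
            self.buf[-1] |= bit << (7 - self.nbits % 8)
            self.nbits += 1


w = BitWriter()
w.write(12, 4)     # 4 bit unsigned value 12
w.write(-1, 3)     # 3 bit signed value -1
w.write(17, 7)     # 7 bit value 17
w.write(6969, 13)  # 13 bit value 6969
print(w.buf.hex(), len(w.buf))
```
+The four resulting bytes are 0xCE, 0x47, 0x67, and 0x20, matching the rows of
+ the final table above.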
+
+\subsubsection{Decoding Example}
+
+The following example shows the state of the (8-bit) byte stream encoded in the
+ previous example after several binary integers are decoded, including the
+ location of the get pointer for the next bit to read.
+
+Read a two bit unsigned integer from the example encoded above.
+
+\begin{tabular}{r|ccccccccl}
+\multicolumn{1}{r}{} &&&$\downarrow$&&&&&& \\
+ & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 & \textbf{1} & \textbf{1} & 0 & 0 & 1 & 1 & 1 & 0 & $\leftarrow$ \\
+byte 1 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
+byte 2 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
+byte 3 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 4 bytes
+\end{tabular}
+\vspace{\baselineskip}
+
+Value read: 3 (b11).
+
+Read another two bit unsigned integer from the example encoded above.
+
+\begin{tabular}{r|ccccccccl}
+\multicolumn{1}{r}{} &&&&&$\downarrow$&&&& \\
+ & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
+byte 0 & \textbf{1} & \textbf{1} & \textbf{0} & \textbf{0} &
+ 1 & 1 & 1 & 0 & $\leftarrow$ \\
+byte 1 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
+byte 2 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
+byte 3 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
+byte stream length: 4 bytes
+\end{tabular}
+\vspace{\baselineskip}
+
+Value read: 0 (b00).
+
+Two things are worth noting here.
+\begin{itemize}
+\item
+Although these four bits were originally written as a single four-bit integer,
+ reading some other combination of bit-widths from the bitstream is well
+ defined.
+No artificial alignment boundaries are maintained in the bitstream.
+\item
+The first value is the integer `$3$' only because the context stated we were
+ reading an unsigned integer.
+Had the context stated we were reading a signed integer, the returned value
+ would have been the integer `$-1$'.
+\end{itemize}
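+
+The two reads above can be sketched as follows.
+This is a minimal Python illustration under the assumption that the packet is
+ held in memory as a byte string; the function {\tt read\_bits} is a
+ hypothetical helper and performs no end-of-packet checking.
```python
def read_bits(data: bytes, pos: int, width: int):
    """Read `width` bits MSb-first starting at bit offset `pos`.

    Returns (unsigned value, new bit offset). No bounds checking is done.
    """
    value = 0
    for _ in range(width):
        byte = data[pos >> 3]                  # byte holding the next bit
        bit = (byte >> (7 - (pos & 7))) & 1    # bit 7 of each byte comes first
        value = (value << 1) | bit
        pos += 1
    return value, pos


packet = bytes([0xCE, 0x47, 0x67, 0x20])  # the stream from the encoding example
first, pos = read_bits(packet, 0, 2)
second, pos = read_bits(packet, pos, 2)
print(first, second, pos)
```
+Because the get pointer is a plain bit offset, any sequence of widths can be
+ read back, regardless of the widths used when writing.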
+
+\subsubsection{End-of-Packet Alignment}
+
+The typical use of bitpacking is to produce many independent byte-aligned
+ packets which are embedded into a larger byte-aligned container structure,
+ such as an Ogg transport bitstream.
+Externally, each bitstream encoded as a byte stream MUST begin and end on a
+ byte boundary.
+Often, the encoded packet bitstream is not an integer number of bytes, and so
+ there is unused space in the last byte of a packet.
+
+Unused space in the last byte of a packet is always zeroed during the encoding
+ process.
+Thus, should this unused space be read, it will return binary zeroes.
+There are no marker patterns or stuffing bits that would allow the decoder to
+ obtain the exact size, in bits, of the original bitstream.
+This knowledge is not required for decoding.
+
+Attempting to read past the end of an encoded packet results in an
+ `end-of-packet' condition.
+Any further read operations after an `end-of-packet' condition shall also
+ return `end-of-packet'.
+Unlike Vorbis, Theora does not use truncated packets as a normal mode of
+ operation.
+Therefore if a decoder encounters the `end-of-packet' condition during normal
+ decoding, it may attempt to use the bits that were read to recover as much of
+ the encoded data as possible, signal a warning or error, or both.
+
+\subsubsection{Reading Zero Bit Integers}
+
+Reading a zero bit integer returns the value `$0$' and does not increment
+ the stream pointer.
+Reading to the end of the packet, but not past the end, so that an
+ `end-of-packet' condition is not triggered, and then reading a zero bit
+ integer shall succeed, returning `$0$', and not trigger an `end-of-packet'
+ condition.
+Reading a zero bit integer after a previous read has set the `end-of-packet'
+ condition shall fail, also returning `end-of-packet'.
+
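+The end-of-packet and zero bit rules can be illustrated with a reader that
+ keeps a sticky flag.
+This Python sketch is only one possible arrangement: the class
+ {\tt PacketReader} is hypothetical, it signals `end-of-packet' by returning
+ {\tt None}, and it makes no attempt to hand back partially read bits.
```python
class PacketReader:
    """MSb-first reader with a sticky end-of-packet flag (illustrative only)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0       # bit offset of the next unread bit
        self.eop = False   # set once any read runs past the end of the packet

    def read(self, width: int):
        """Return an unsigned integer, or None to signal end-of-packet."""
        if self.eop or self.pos + width > 8 * len(self.data):
            self.eop = True  # the condition persists for all later reads
            return None
        value = 0
        for _ in range(width):  # a zero-width read skips this loop entirely
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


r = PacketReader(b"\xce")
print(r.read(8), r.read(0))  # read exactly to the end, then a zero bit read
print(r.read(1), r.read(0))  # read past the end, then another zero bit read
```
+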
+\section{Bitstream Headers}
+\label{sec:headers}
+
+A Theora bitstream begins with three header packets.
+The header packets are, in order, the identification header, the comment
+ header, and the setup header.
+All are required for decode compliance.
+An end-of-packet condition encountered while decoding the identification or
+ setup header packets renders the stream undecodable.
+An end-of-packet condition encountered while decoding the comment header is a
+ non-fatal error condition, and MAY be ignored by a decoder.
+
+\subsection{Common Header Decode}
+
+Each header packet begins with the same header fields:
+
+\begin{enumerate}
+\item{\bitvar{packet\_type}:} 8 bit unsigned integer.
+\item{0x74, 0x68, 0x65, 0x6F, 0x72, 0x61:}
+The characters `t', `h', `e', `o', `r', and `a' as 8 bit unsigned integers.
+\end{enumerate}
+
+Decode continues according to packet type.
+The identification header is type 0x80, the comment header is type 0x81, and
+ the setup header is type 0x82.
+These types all have their high bit set, as a packet with its first bit unset
+ is a video data packet.
+These packets must occur in the order: identification, comment, setup.
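+
+Since every field of the common header is a whole number of bytes, a decoder
+ can classify a packet by inspecting its first seven bytes directly.
+The Python sketch below is a hedged illustration; {\tt classify\_packet} and
+ {\tt HEADER\_TYPES} are hypothetical names, and rejecting an empty packet is
+ an assumption of this example rather than something stated here.
```python
HEADER_TYPES = {0x80: "identification", 0x81: "comment", 0x82: "setup"}


def classify_packet(packet: bytes) -> str:
    """Classify a packet by its first byte and, for headers, the signature."""
    if not packet:
        raise ValueError("empty packet")  # assumption made for this sketch
    if packet[0] & 0x80 == 0:
        return "video data"               # first bit unset: not a header
    if packet[1:7] != b"theora":
        raise ValueError("missing 'theora' signature")
    if packet[0] not in HEADER_TYPES:
        raise ValueError("unknown header type 0x%02x" % packet[0])
    return HEADER_TYPES[packet[0]]


print(classify_packet(b"\x81theora"))
```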
+
+\subsection{Identification Header}
+\label{sec:idheader}
+
+The identification header is a short header with only a few fields used to
+ declare the stream definitively as Theora and provide detailed information
+ about the format of the fully decoded video data.
+The identification header is coded as follows:
+
+\begin{enumerate}
+\item{\bitvar{version\_major}:} 8-bit unsigned integer.
+\item{\bitvar{version\_minor}:} 8-bit unsigned integer.
+\item{\bitvar{version\_revision}:} 8-bit unsigned integer.
+\item{\bitvar{frame\_mb\_width}:} 16-bit unsigned integer.
+\item{\bitvar{frame\_mb\_height}:} 16-bit unsigned integer.
+\item{\bitvar{picture\_width}:} 24-bit unsigned integer.
+\item{\bitvar{picture\_height}:} 24-bit unsigned integer.
+\item{\bitvar{picture\_x\_offset}:} 8-bit unsigned integer.
+\item{\bitvar{picture\_y\_offset}:} 8-bit unsigned integer.
+\item{\bitvar{frame\_rate\_numerator}:} 32-bit unsigned integer.
+\item{\bitvar{frame\_rate\_denominator}:} 32-bit unsigned integer.
+\item{\bitvar{pixel\_aspect\_numerator}:} 24-bit unsigned integer.
+\item{\bitvar{pixel\_aspect\_denominator}:} 24-bit unsigned integer.
+\item{\bitvar{color\_space}:} 8-bit unsigned integer.
+\item{\bitvar{nominal\_bitrate}:} 24-bit unsigned integer.
+\item{\bitvar{quality}:} 6-bit unsigned integer.
+\item{\bitvar{keyframe\_granule\_shift}:} 5-bit unsigned integer.
+\item{\bitvar{pixel\_format}:} 2-bit unsigned integer.
+\item{\bitvar{reserved}:} 3-bit unsigned integer.
+\end{enumerate}
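+
+The field list above totals 280 bits, or exactly 35 bytes following the common
+ header.
+As a rough Python sketch of how the fields might be pulled apart, the code
+ below treats those bytes as one big-endian integer, which is equivalent to
+ reading them MSb-first.
+The names {\tt IDENT\_FIELDS} and {\tt parse\_ident\_body} are hypothetical.
```python
IDENT_FIELDS = [  # (name, width in bits), in bitstream order
    ("version_major", 8), ("version_minor", 8), ("version_revision", 8),
    ("frame_mb_width", 16), ("frame_mb_height", 16),
    ("picture_width", 24), ("picture_height", 24),
    ("picture_x_offset", 8), ("picture_y_offset", 8),
    ("frame_rate_numerator", 32), ("frame_rate_denominator", 32),
    ("pixel_aspect_numerator", 24), ("pixel_aspect_denominator", 24),
    ("color_space", 8), ("nominal_bitrate", 24),
    ("quality", 6), ("keyframe_granule_shift", 5),
    ("pixel_format", 2), ("reserved", 3),
]


def parse_ident_body(body: bytes) -> dict:
    """Split the bytes following the common header into named fields."""
    total = sum(width for _, width in IDENT_FIELDS)  # 280 bits
    if len(body) * 8 < total:
        raise ValueError("end-of-packet in identification header")
    # MSb-first packing means the body reads as one big-endian integer.
    bits = int.from_bytes(body[: total // 8], "big")
    fields, remaining = {}, total
    for name, width in IDENT_FIELDS:
        remaining -= width
        fields[name] = (bits >> remaining) & ((1 << width) - 1)
    return fields
```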
+
+\bitvar{version\_major}, \bitvar{version\_minor}, and
+ \bitvar{version\_revision} MUST be $3$, $2$, and $0$, respectively, in order
+ to be compatible with this document.
+
+Both \bitvar{frame\_mb\_width} and \bitvar{frame\_mb\_height} MUST be greater
+ than zero.
+These specify the width and height, respectively, of the coded video frame in
+ macro blocks.
+The actual width of the frame in pixels is $16*\bitvar{frame\_mb\_width}$, and
+ the height in pixels is $16*\bitvar{frame\_mb\_height}$.
+The size of the displayable picture within this coded frame in pixels is
+ \bitvar{picture\_width} by \bitvar{picture\_height}.
+The lower-left corner of the displayable picture is located in position
+ $(\bitvar{picture\_x\_offset},$ $\bitvar{picture\_y\_offset})$.
+These MUST be less than the frame width and frame height in pixels,
+ respectively.
+In addition, $\bitvar{picture\_x\_offset}+\bitvar{picture\_width}$ and
+ $\bitvar{picture\_y\_offset}+\bitvar{picture\_height}$ MUST be no greater than
+ the frame width and frame height in pixels, respectively.
+
+If any of these checks fail, the stream is rendered undecodable.
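+
+These geometry checks might be collected into a single routine, sketched here
+ in Python.
+The function {\tt check\_geometry} is hypothetical and takes a dictionary keyed
+ by the field names listed above.
+The sketch assumes that the sum of an offset and the picture size may equal
+ the frame size, since otherwise the picture could never fill the coded frame.
```python
def check_geometry(f: dict) -> None:
    """Raise ValueError if the frame or picture geometry is invalid."""
    if f["frame_mb_width"] == 0 or f["frame_mb_height"] == 0:
        raise ValueError("frame dimensions must be greater than zero")
    frame_w = 16 * f["frame_mb_width"]   # coded frame size in pixels
    frame_h = 16 * f["frame_mb_height"]
    if f["picture_x_offset"] >= frame_w or f["picture_y_offset"] >= frame_h:
        raise ValueError("picture offset lies outside the coded frame")
    # Assumption: equality is allowed so the picture may fill the whole frame.
    if f["picture_x_offset"] + f["picture_width"] > frame_w:
        raise ValueError("picture extends past the right edge of the frame")
    if f["picture_y_offset"] + f["picture_height"] > frame_h:
        raise ValueError("picture extends past the top edge of the frame")


# A 320x240 frame (20x15 macro blocks) with a 318x236 picture offset by (1, 2).
check_geometry({
    "frame_mb_width": 20, "frame_mb_height": 15,
    "picture_width": 318, "picture_height": 236,
    "picture_x_offset": 1, "picture_y_offset": 2,
})
```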
+
+Theora is a fixed frame rate video codec.
+Frames are sampled at the constant rate of
+ $\frac{\bitvar{frame\_rate\_numerator}}{\bitvar{frame\_rate\_denominator}}$
+ frames per second.
+Both of these fields MUST be greater than zero, or the stream is rendered
+ undecodable.
+
+The aspect ratio of the pixels within a frame, defined as the ratio of the
+ physical width of the pixel to its physical height, is specified by the ratio
+ $\bitvar{pixel\_aspect\_numerator}:\bitvar{pixel\_aspect\_denominator}$.
+Either of these fields MAY be zero, in which case the pixel aspect ratio
+ defaults to $1:1$.
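+
+The two ratios are handled differently when a zero appears, which the
+ following Python sketch tries to make explicit.
+It uses the standard library {\tt fractions} module; the function names are
+ hypothetical.
```python
from fractions import Fraction


def frame_rate(numerator: int, denominator: int) -> Fraction:
    """Frames per second; a zero in either field makes the stream undecodable."""
    if numerator == 0 or denominator == 0:
        raise ValueError("frame rate fields must be greater than zero")
    return Fraction(numerator, denominator)


def pixel_aspect(numerator: int, denominator: int) -> Fraction:
    """Pixel width : height; a zero in either field falls back to 1:1."""
    if numerator == 0 or denominator == 0:
        return Fraction(1, 1)
    return Fraction(numerator, denominator)


print(float(frame_rate(30000, 1001)), pixel_aspect(0, 0))
```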
+
+The \bitvar{nominal\_bitrate} field is used only as a hint.
+For pure VBR streams, this value may be considerably off.
+The field MAY be set to zero to indicate that the encoder did not care to
+ speculate.
+%TODO: Quality values... this is also a hint, but of what?
+%TODO: ideally, it should be semantically distinct from the \qi values.
+
+The \bitvar{keyframe\_granule\_shift} is used to partition the granule
+ position associated with each packet into two different parts.
+The frame number of the last keyframe, starting from zero, is stored in the
+ upper $64-\bitvar{keyframe\_granule\_shift}$ bits, while the lower
+ \bitvar{keyframe\_granule\_shift} bits contain the number of frames since the
+ last keyframe.
+Complete details on the granule position mapping are specified in Section~REF.
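+
+The partition itself is a shift and a mask.
+The Python sketch below covers only the split described in this paragraph, not
+ the complete granule position mapping; both function names are hypothetical.
```python
def split_granulepos(granulepos: int, shift: int):
    """Return (frame number of last keyframe, frames since that keyframe)."""
    keyframe = granulepos >> shift                # upper 64 - shift bits
    since_key = granulepos & ((1 << shift) - 1)   # lower shift bits
    return keyframe, since_key


def join_granulepos(keyframe: int, since_key: int, shift: int) -> int:
    """Inverse of split_granulepos for values that fit their bit ranges."""
    return (keyframe << shift) | since_key


gp = join_granulepos(128, 5, 6)   # keyframe at frame 128, 5 frames later
print(gp, split_granulepos(gp, 6))
```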
+
+The \bitvar{color\_space} field contains a value from an enumerated list of
+ the available color spaces, given in Table~\ref{tab:colorspaces}.
+The `Undefined' value indicates that color space information was not
+ available to the encoder.
+It MAY be specified by the application via an external means.
+If a reserved value is given, a decoder MAY refuse to decode the stream.
+
+\begin{table}[htb]
+\begin{center}
+\begin{tabular*}{215pt}{cl@{\extracolsep{\fill}}}\toprule
+Value & Color Space \\\midrule
+$0$ & Undefined. \\
+$1$ & Rec. 470M (see Section~\ref{sec:470m}). \\
+$2$ & Rec. 470BG (see Section~\ref{sec:470bg}). \\
+$3\ldots255$ & Reserved. \\\bottomrule
+\end{tabular*}
+\end{center}
+\caption{Enumerated List of Color Spaces}
+\label{tab:colorspaces}
+\end{table}
+
+The \bitvar{pixel\_format} field contains a value from an enumerated list of
+ the available pixel formats, given in Table~\ref{tab:pixel-formats}.
+If the reserved value $1$ is given, the stream is rendered undecodable.
+
+\begin{table}[htb]
+\begin{center}
+\begin{tabular*}{215pt}{cl@{\extracolsep{\fill}}}\toprule
+Value & Pixel Format \\\midrule
+$0$ & 4:2:0 (see Section~REF). \\
+$1$ & Reserved. \\
+$2$ & 4:2:2 (see Section~REF). \\
+$3$ & 4:4:4 (see Section~REF). \\\bottomrule
+\end{tabular*}
+\end{center}
+\caption{Enumerated List of Pixel Formats}
+\label{tab:pixel-formats}
+\end{table}
+
+Finally, the bits in the \bitvar{reserved} field MUST be zero, or the stream
+ is rendered undecodable.
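+
+The two enumerations and the reserved field check could be expressed as simple
+ lookups, as in this Python sketch.
+The names used are hypothetical, and the sketch chooses to report reserved
+ color space values rather than reject them, which is only one of the
+ behaviors permitted above.
```python
COLOR_SPACES = {0: "Undefined", 1: "Rec. 470M", 2: "Rec. 470BG"}
PIXEL_FORMATS = {0: "4:2:0", 2: "4:2:2", 3: "4:4:4"}  # value 1 is reserved


def check_formats(color_space: int, pixel_format: int, reserved: int):
    """Return (color space name, pixel format name) or raise ValueError."""
    if pixel_format not in PIXEL_FORMATS:
        raise ValueError("reserved pixel format: stream is undecodable")
    if reserved != 0:
        raise ValueError("reserved bits are non-zero: stream is undecodable")
    # A decoder MAY instead refuse reserved color spaces; this sketch does not.
    return COLOR_SPACES.get(color_space, "Reserved"), PIXEL_FORMATS[pixel_format]


print(check_formats(2, 0, 0))
```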
+
+\subsection{Comment Header}
+\label{sec:commentheader}
+
+The Theora comment header is the second of three header packets that begin a
+ Theora stream.
+It is meant for short text comments, not arbitrary metadata; arbitrary metadata
+ belongs in a separate logical stream that provides greater structure and
+ machine parseability.
+
+The comment field is meant to be used much like someone jotting a quick note on
+ the bottom of a CDR.
+It should be a little information to remember the disc by and explain it to
+ others; a short, to-the-point text note that may run longer than a couple of
+ words, but isn't going to be more than a short paragraph.
+The essentials, in other words, whatever they turn out to be, e.g.:
+
+%TODO: Example
+
+\subsubsection{Comment Header Coding}
+
+The comment header is stored as a logical list of eight-bit clean vectors; the
+ number of vectors is bounded at $2^{32}-1$ and the length of each vector is
+ limited to $2^{32}-1$ bytes.
+The vector length is encoded; the vector contents themselves are not null
+ terminated.
+In addition to the vector list, there is a single vector for a vendor name,
+ also eight-bit clean with a length encoded in 32 bits.
+%TODO: The 1.0 release of libtheora sets the vendor string to ...
+
+The comment header is decoded as follows:
+\begin{enumerate}
+\item{\bitvar{vendor\_length\_0}:} 8-bit unsigned integer.
+\item{\bitvar{vendor\_length\_1}:} 8-bit unsigned integer.
+\item{\bitvar{vendor\_length\_2}:} 8-bit unsigned integer.
+\item{\bitvar{vendor\_length\_3}:} 8-bit unsigned integer.
+\item{\bitvar{vendor\_string}:} \bitvar{vendor\_length} 8-bit unsigned
+ integers.
+\item{\bitvar{user\_comment\_list\_length\_0}:} 8-bit unsigned integer.
+\item{\bitvar{user\_comment\_list\_length\_1}:} 8-bit unsigned integer.
+\item{\bitvar{user\_comment\_list\_length\_2}:} 8-bit unsigned integer.
+\item{\bitvar{user\_comment\_list\_length\_3}:} 8-bit unsigned integer.
+\item{\bitvar{user\_comment\_list}:} \bitvar{user\_comment\_list\_length}
+ user comments.
+\end{enumerate}
+
+Here \bitvar{vendor\_length} and \bitvar{user\_comment\_list\_length} are
+ formed by arranging their constituent octets in little-endian order.
+\begin{eqnarray*}
+\bitvar{vendor\_length} & = &
+\bitvar{vendor\_length\_0} + \\
+&& \bitvar{vendor\_length\_1}*2^8 + \\
+&& \bitvar{vendor\_length\_2}*2^{16} + \\
+&& \bitvar{vendor\_length\_3}*2^{24} \\
+\bitvar{user\_comment\_list\_length} & = &
+\bitvar{user\_comment\_list\_length\_0} + \\
+&& \bitvar{user\_comment\_list\_length\_1}*2^8 + \\
+&& \bitvar{user\_comment\_list\_length\_2}*2^{16} + \\
+&& \bitvar{user\_comment\_list\_length\_3}*2^{24}
+\end{eqnarray*}
+This construction is used so that on platforms with 8-bit bytes, the memory
+ organization of the comment header is identical with that of Vorbis I,
+ allowing for common parsing code despite the different bit packing
+ conventions.
+
+Each user comment is similarly decoded as:
+\begin{enumerate}
+\item{$\bitvar{comment\_length\_0}[i]$:} 8-bit unsigned integer.
+\item{$\bitvar{comment\_length\_1}[i]$:} 8-bit unsigned integer.
+\item{$\bitvar{comment\_length\_2}[i]$:} 8-bit unsigned integer.
+\item{$\bitvar{comment\_length\_3}[i]$:} 8-bit unsigned integer.
+\item{$\bitvar{comment\_string}[i]$:} $\bitvar{comment\_length}[i]$ 8-bit
+ unsigned integers.
+\end{enumerate}
+
+Again, $\bitvar{comment\_length}[i]$ is formed as follows:
+\begin{eqnarray*}
+\bitvar{comment\_length}[i] & = &
+\bitvar{comment\_length\_0}[i] + \\
+&& \bitvar{comment\_length\_1}[i]*2^8 + \\
+&& \bitvar{comment\_length\_2}[i]*2^{16} + \\
+&& \bitvar{comment\_length\_3}[i]*2^{24}
+\end{eqnarray*}
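+
+Because every length is four whole octets in little-endian order, the comment
+ header can be walked with ordinary byte operations.
+The following Python sketch assumes it is given the bytes that follow the
+ common header fields; {\tt parse\_comment\_body} is a hypothetical name, and
+ raising {\tt EOFError} is simply this example's way of reporting an
+ end-of-packet condition.
```python
def parse_comment_body(body: bytes):
    """Return (vendor string, list of user comment vectors) as raw bytes."""
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(body):
            raise EOFError("end-of-packet in comment header")
        chunk = body[pos:pos + n]
        pos += n
        return chunk

    def take_length() -> int:
        # Four octets, least significant first:
        # b0 + b1 * 2**8 + b2 * 2**16 + b3 * 2**24.
        return int.from_bytes(take(4), "little")

    vendor = take(take_length())
    count = take_length()
    comments = [take(take_length()) for _ in range(count)]
    return vendor, comments


example = (b"\x04\x00\x00\x00Xiph"            # vendor_length = 4
           b"\x01\x00\x00\x00"                # one user comment
           b"\x0b\x00\x00\x00DIRECTOR=me")    # comment_length[0] = 11
print(parse_comment_body(example))
```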
+
+The comment header comprises the entirety of the second header packet.
+Unlike the first header packet, it is not generally the only packet on the
+ second page and may span multiple pages.
+The length of the comment header packet is (practically) unbounded.
+The comment header packet is not optional; it must be present in the stream
+ even if it is logically empty.
+
+\subsubsection{User Comment Format}
+
+The user comment vectors are structured similarly to a UNIX environment
+ variable.
+That is, comment fields consist of a field name and a corresponding value and
+ look like:
+\begin{center}
+\begin{tabular}{rcl}
+$\bitvar{comment\_string}[0]$ & = & ``TITLE=the look of Theora'' \\
+$\bitvar{comment\_string}[1]$ & = & ``DIRECTOR=me''
+\end{tabular}
+\end{center}
+
+The field name is case-insensitive and MUST consist of ASCII characters 0x20
+ through 0x7D, 0x3D (`=') excluded.
+ASCII 0x41 through 0x5A inclusive (characters `A'--`Z') are to be considered
+ equivalent to ASCII 0x61 through 0x7A inclusive (characters `a'--`z').
+%TODO: Is an empty field-name permitted?
+
+The field name is immediately followed by ASCII 0x3D (`='); this equals sign is
+ used to terminate the field name.
+
+The data immediately after 0x3D until the end of the vector is the eight-bit
+ clean value of the field contents encoded as a UTF-8 string.
+%TODO: Cite UTF-8 standard.
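+
+Splitting one comment vector into its field name and value might look like the
+ Python sketch below.
+The function {\tt split\_comment} is hypothetical; folding the name to upper
+ case is just one way of honoring case-insensitivity, and the sketch does not
+ take a position on whether an empty field name is permitted.
```python
def split_comment(vector: bytes):
    """Return (upper-cased field name, value decoded from UTF-8)."""
    name, sep, value = vector.partition(b"=")  # splits at the first 0x3D
    if not sep:
        raise ValueError("comment has no '=' terminating the field name")
    if any(c < 0x20 or c > 0x7D for c in name):
        raise ValueError("field name contains a character outside 0x20-0x7D")
    return name.decode("ascii").upper(), value.decode("utf-8")


print(split_comment(b"title=the look of Theora"))
```
+Only the first equals sign is significant, so a value may itself contain
+ `=' characters.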
+
+Field names MUST NOT be `internationalized'; this is a concession to
+ simplicity, not an attempt to exclude the majority of the world that doesn't
+ speak English.
+Applications MAY wish to present internationalized versions of the standard
+ field names listed below to the user, but they are not to be stored in the
+ bitstream.
+Field {\em contents}, however, use the UTF-8 character encoding to allow easy
+ representation of any language.
+
+Individual `vendors' MAY use non-standard field names within reason.
+The proper use of comment fields as human-readable notes has already been
+ explained.
+Abuse will be discouraged.
+
+There is no vendor-specific prefix to `non-standard' field names.
+Vendors SHOULD make some effort to avoid arbitrarily polluting the common
+ namespace.
+We will generally collect the more useful tags here to help with
+ standardization.
+
+Field names are not restricted to occur only once within a comment header.
+%TODO: Example
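+
+Since a field name may repeat, an application that wants keyed access has to
+ keep a list of values per name.
+A minimal Python sketch, using a hypothetical {\tt group\_comments} helper that
+ takes already split (name, value) pairs:
```python
from collections import defaultdict


def group_comments(pairs):
    """Group (name, value) pairs by case-folded name, preserving order."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name.upper()].append(value)  # names compare case-insensitively
    return dict(grouped)


print(group_comments([("DIRECTOR", "me"), ("director", "you"), ("TITLE", "x")]))
```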
+
+\paragraph{Field Names}
+
+Below is a proposed, minimal list of standard field names with a description of
+ their intended use.
+No field names are mandatory; a comment header may contain one or more, all, or
+ none of the names in this list.
+
+\begin{description}
+\item{TITLE:} Video name.
+%TODO: Complete list
+\end{description}
+
+\appendix
+
+\section{Ogg Bitstream Encapsulation}
+\label{app:oggencapsulation}
+
+\subsection{Overview}
+
+This document specifies the embedding or encapsulation of Theora packets
+ in an Ogg transport stream.
+
+Ogg is a stream oriented wrapper for coded, linear time-based data.
+It provides synchronization, multiplexing, framing, error detection and
+ seeking landmarks for the decoder and complements the raw packet format
+ used by the Theora codec.
+
+This document assumes familiarity with the details of the Ogg standard.
+The Xiph.org documentation provides an overview of the Ogg transport stream
+ format \cite{oggstream} and a detailed description \cite{oggframe}.
+%TODO: Maybe we should just put these links in-line, instead of as references.
+The format is also defined in RFC~3533 \cite{rfc3533}.
+While Theora packets can be embedded in a wide variety of media
+ containers and streaming mechanisms, the Xiph.org Foundation
+ recommends Ogg as the native format for Theora video in file-oriented
+ storage and transmission contexts.
+
+\subsubsection{MIME type}
+
+The correct MIME type of any Ogg file is {\tt application/ogg}.
+Outside of an encapsulation, the MIME type {\tt video/x-theora} may
+ be used to refer specifically to the Theora compressed video stream.
+
+\subsection{Embedding in a logical bitstream}
+
+Ogg separates a {\em logical bitstream} consisting of the framing of
+ a particular sequence of packets and complete within itself from
+ the {\em physical bitstream} which may consist either of a single
+ logical bitstream or a number of logical bitstreams multiplexed
+ together.
+This section specifies the embedding of Theora packets in a logical Ogg
+ bitstream.
+The mapping of Ogg Theora logical bitstreams into a multiplexed physical Ogg
+ stream is described in the next section.
+
+\subsubsection{Headers}
+
+The initial info header packet appears by itself in a single Ogg page.
+This page defines the start of the logical stream and MUST have
+ the `beginning of stream' flag set.
+
+The second and third header packets (metadata comments and decoder
+ setup data) can together span one or more Ogg pages.
+If there are additional non-normative header packets, they MUST be
+ included in this sequence of pages as well.
+The comment header packet MUST begin the second Ogg page in the logical
+ bitstream, and there MUST be a page break between the last header
+ packet and the first frame data packet.
+
+These two page break requirements facilitate stream identification and
+ simplify header acquisition for seeking and live streaming applications.
+
+All header pages MUST have their granule position field set to zero.
+%TODO: or -1?
+%TBT: What are we doing now?
+
+\subsubsection{Frame data}
+
+The first frame data packet in a logical bitstream MUST begin a fresh page.
+All other data packets are placed one at a time into Ogg pages
+ until the end of the stream.
+Packets can span pages and multiple packets can be placed within any
+ one page.
+The last page in the logical bitstream MUST have its `end of stream'
+ flag set.
+
+Frame data pages MUST be marked with a granule index corresponding to
+ the display time of the last frame/packet that finishes in that page.
+
+{\bf Note:}
+This scheme is still under discussion.
+It has also been proposed that pages be labeled with a granule corresponding to
+ the first frame that begins on that page.
+This simplifies seeking and mux, but is different from the published
+ definition of the Ogg granule field.
+This document will be updated when the issue is settled.
+
+%TODO: \subsubsection{Granule position}
+
+\subsection{Multiplexed stream mapping}
+
+Applications supporting Ogg Theora I must support Theora bitstreams
+ multiplexed with compressed audio data in the Vorbis I and Speex
+ formats, and should support Ogg-encapsulated MNG graphics for overlays.
+% and the Writ format for text-based titling.
+%TBT: That's great... do these things have specifications?
+
+Multiple audio and video bitstreams may be multiplexed together.
+How playback of multiple/alternate streams is handled is up to the
+ application.
+Some conventions based on included metadata aid interoperability
+ in this respect.
+%TODO: describe multiple vs. alternate streams, language mapping
+% and reference metadata descriptions.
+
+\subsubsection{Chained streams}
+
+Ogg Theora decoders and playback applications MUST support both grouped
+ streams (multiplexed concurrent logical streams) and chained streams
+ (sequential concatenation of independent physical bitstreams).
+
+The number and codec data types of multiplexed streams and the decoder
+ parameters for those stream types that re-occur can all change at a
+ chaining boundary.
+A playback application MUST be prepared to handle such changes and
+ SHOULD do so smoothly with the minimum possible visible disruption.
+The specification of grouped streams below applies independently to each
+ segment of a chained bitstream.
+
+\subsubsection{Grouped streams}
+
+At the beginning of a multiplexed stream, the `beginning of stream'
+ pages for each logical bitstream will be grouped together.
+Within these, the first page to occur MUST be the Theora page.
+This facilitates identification of Ogg Theora files among other
+ Ogg-encapsulated content.
+A playback application must nevertheless handle streams where this
+ arrangement is not correct.
+%TBT: Then what's the point of requiring it in the spec?
+
+If there is more than one Theora logical stream, the first page should
+ be from the primary stream.
+That is, it should be the best choice for the stream a generic player should
+ begin displaying without special user direction.
+If there is more than one stream of any other type, such as audio, the
+ identification page of the primary stream of that type should be placed
+ before the others.
+%TBT: That's all pretty vague.
+
+After the `beginning of stream' pages, the header pages of each of
+ the logical streams should be grouped together before any data pages
+ occur.
+%TBT: should or must?
+
+After all the header pages have been placed,
+ the data pages are multiplexed together.
+They should be placed in the stream in increasing order by the playback
+ time equivalents of their granule fields.
+This facilitates seeking while limiting the buffering requirements of the
+ playback demultiplexer.
+%TODO: A lot of this language is encoder-oriented.
+%TODO: We define a decoder-oriented specification.
+%TODO: The language should be changed to match.
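+
+For illustration, ordering data pages by playback time amounts to merging
+ several already sorted sequences.
+The Python sketch below uses the standard library function
+ {\tt heapq.merge}; representing a page as a (time, stream name) pair is an
+ assumption made purely for this example, and {\tt interleave} is a
+ hypothetical name.
```python
import heapq


def interleave(*streams):
    """Merge per-stream page lists, each sorted by time, into one sequence."""
    return list(heapq.merge(*streams, key=lambda page: page[0]))


video = [(0.00, "theora"), (0.04, "theora"), (0.08, "theora")]
audio = [(0.02, "vorbis"), (0.06, "vorbis")]
for page in interleave(video, audio):
    print(page)
```
+A muxer working this way only ever needs the next pending page from each
+ logical stream.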
+
+\section{Colophon}
+
+%TODO: Logo
+
+Ogg is a \href{http://www.xiph.org}{Xiph.org Foundation} effort to protect
+ essential tenets of Internet multimedia from corporate hostage-taking; Open
+ Source is the net's greatest tool to keep everyone honest.
+See \href{http://www.xiph.org/about.html}{About the Xiph.org Foundation} for
+ details.
+
+Ogg Theora is the first Ogg video codec.
+Anyone may freely use and distribute the Ogg and Theora specification, whether
+ in private, public, or corporate capacity.
+However, the Xiph.org Foundation and the Ogg project reserve the right to set
+ the Ogg Theora specification and certify specification compliance.
+
+Xiph.org's Theora software codec implementation is distributed under a BSD-like
+ license.
+This does not restrict third parties from distributing independent
+ implementations of Theora software under other licenses.
+
+Ogg, Theora, Vorbis, Xiph.org Foundation and their logos are trademarks (tm) of
+ the \href{http://www.xiph.org}{Xiph.org Foundation}.
+These pages are copyright \copyright{} 2004 Xiph.org Foundation.
+All rights reserved.
+
+This document is set in \LaTeX.
+
+\bibliography{spec}
+
+\end{document}
Deleted: theora/trunk/doc/spec.bib
===================================================================
--- theora/trunk/doc/spec.bib 2004-04-01 00:51:39 UTC (rev 6482)
+++ theora/trunk/doc/spec.bib 2004-04-01 06:17:38 UTC (rev 6483)
@@ -1,54 +0,0 @@
-@MISC{Mel04,
- author="Mike Melanson",
- title="{VP3} Bitstream Format and Decoding Process",
- howpublished="\url{http://home.pcisys.net/~melanson/codecs/vp3-format.txt}",
- month="Mar.",
- year=2004
-}
-
-@MISC{vorbis,
- author="{Xiph.org Foundation}",
- title="{Vorbis~I} specification",
- howpublished="\url{http://www.xiph.org/ogg/vorbis/doc/}",
- year=2002
-}
-
-@MISC{oggstream,
- author="Christopher Montgomery",
- title="{Ogg} logical and physical bitstream overview",
- howpublished="\url{http://www.xiph.org/ogg/doc/oggstream.html}",
- month="Jul.",
- year=2002
-}
-
-@MISC{oggframe,
- author="Christopher Montgomery",
- title="{Ogg} logical bitstream framing",
- howpublished="\url{http://www.xiph.org/ogg/doc/framing.html}",
- month="Jul.",
- year=2002
-}
-
-@MISC{rfc3533,
- author="Silvia Pfeiffer",
- title="{RFC} 3533: The {Ogg} Encapsulation Format Version 0",
- howpublished="\url{http://www.ietf.org/rfc/rfc3533.txt}",
- month="May",
- year=2003
-}
-
-@MISC{rfc3534,
- author="Linus Walleij",
- title="The {application/ogg} Media Type",
- howpublished="\url{http://www.ietf.org/rfc/rfc3534.txt}",
- month="May",
- year=2003
-}
-
-@MISC{rfc3550,
- author="H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson",
- title="RTP: A Transport Protocol for Real-Time Applications",
- howpublished="\url{http://www.ietf.org/rfc/rfc3550.txt}",
- month="Jul.",
- year=2003
-}
Deleted: theora/trunk/doc/spec.tex
===================================================================
--- theora/trunk/doc/spec.tex 2004-04-01 00:51:39 UTC (rev 6482)
+++ theora/trunk/doc/spec.tex 2004-04-01 06:17:38 UTC (rev 6483)
@@ -1,1059 +0,0 @@
-\documentclass[11pt,letterpaper]{article}
-
-\usepackage{latexsym}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage{graphicx}
-\usepackage[pdfpagemode=None,pdfstartview=FitH,pdfview=FitH,colorlinks=true]%
- {hyperref}
-
-\newtheorem{theorem}{Theorem}[section]
-\newcommand{\qi}{\ensuremath{\mathit{qi}}}
-\newcommand{\ti}{\ensuremath{\mathit{ti}}}
-\newcommand{\term}[1]{{\em #1}}
-
-\pagestyle{headings}
-\bibliographystyle{alpha}
-
-\title{Theora I Specification}
-\author{Xiph.org Foundation}
-\date{\today}
-
-\begin{document}
-
-\maketitle
-\tableofcontents
-\newpage
-
-\section{Introduction and Description}
-
-This section provides a high level description of the Theora codec's
- construction.
-A bit-by-bit specification appears beginning in Section~\ref{sec:bitpacking}.
-The later sections assume a high-level understanding of the Theora decode
- process, which is provided below.
-
-\subsection{Overview}
-
-Theora is a general purpose, lossy video codec.
-It is based on the VP3 video codec produced by On2 Technologies
- (\url{http://www.on2.com/}).
-On2 Technologies donated the VP3.2 source code to the Xiph.org
- Foundation and it was released under a BSD license.
-On2 also made an irrevocable, royalty-free license grant for any patent claims
- it might have over the software and any derivatives.
-No formal specification exists for the VP3 format beyond this source code,
- though Mike Melanson maintains a detailed description \cite{Mel04}.
-Portions of this specification were adopted from his text with permission.
-
-\subsubsection{VP3 and Theora}
-
-Theora contains a superset of the features that were available in the original
- VP3 codec.
-Content encoded with VP3.2 can be losslessly transcoded into the Theora format.
-%TODO: what about VP3.1 etc? source tables all say 'VP31'
-Theora content cannot, in general, be losslessly transcoded into the VP3
- format.
-If a feature is not available in the original VP3 format, this is mentioned
- when that feature is defined.
-A complete list of these features appears in
- Appendix~\ref{app:oggencapsulation}.
-
-\subsubsection{Video Formats}
-
-Theora I currently supports progressive video data of arbitrary dimensions in
- one of several $Y'C_bC_r$ color spaces.
-The precise definition of the color spaces supported appears in Section~REF.
-Three different chroma subsampling formats are supported: 4:2:0, 4:2:2,
- and 4:4:4.
-The precise details of each of these formats and their sampling locations are
- described in Section~REF.
-
-The Theora I format does not support interlaced material, bit-depths larger
- than 8 bits per component, nor alternate color spaces such as RGB or
- arbitrary multi-channel spaces.
-Black and white content can be efficiently encoded, however, because the
- uniform chroma planes compress well.
-Support for interlaced material is planned for a future version.
-Support for increased bit depths or additional color spaces is not planned.
-
-\subsubsection{Classification}
-
-Theora I is a block-based lossy transform codec that utilizes an
- $8\times 8$ Type-II Discrete Cosine Transform and block-based motion
- compensation.
-This places it in the same class of codecs as MPEG-1, -2, -4, and H.263.
-The details of how individual blocks are organized and how DCT coefficients are
- organized in the bitstream differ substantially from these codecs, however.
-Theora supports only intra frames (I frames in MPEG) and inter frames (P frames
- in MPEG).
-There is no equivalent to the bi-predictive frames (B frames) found in MPEG
- codecs.
-
-\subsubsection{Assumptions}
-
-The Theora codec design assumes a complex, psychovisually-aware encoder and a
- simple, low-complexity decoder.
-%TODO: Talk more about implementation complexity.
-
-Theora provides none of its own framing, synchronization, or protection against
- transmission errors; it is solely a method of accepting input video frames and
- compressing these frames into raw, unformatted `packets'.
-The decoder then accepts these raw packets in sequence, decodes them, and
- synthesizes a facsimile of the original video frames.
-Theora is a free-form variable bit rate (VBR) codec, and packets have no
- minimum size, maximum size, or fixed/expected size.
-
-Theora packets are thus intended to be used with a transport mechanism that
- provides free-form framing, synchronization, positioning, and error correction
- in accordance with these design assumptions, such as Ogg (for file transport)
- or RTP (for network multicast).
-For the purposes of a few examples in this document, we will assume that Theora
- is embedded in an Ogg stream specifically, although this is by no means a
- requirement or fundamental assumption in the Theora design.
-
-The specification for embedding Theora into an Ogg transport stream is given in
- Appendix~REF.
-
-\subsubsection{Codec Setup and Probability Model}
-
-Theora's heritage is the proprietary commercial codec VP3, and it retains a
- fair amount of inflexibility when compared to Vorbis \cite{vorbis}, the first
- Xiph.org codec, which began as a research codec.
-However, to provide additional scope for encoder improvement, Theora adopts
- some of the configurable aspects of decoder setup that are present in Vorbis.
-This configuration data is not available in VP3, which used hardcoded values
- instead.
-
-Theora makes the same controversial design decision that Vorbis made to include
- the entire probability model for the DCT coefficients and all the quantization
- parameters in the bitstream headers.
-This is often several hundred fields.
-This makes it impossible to begin decoding at any frame in the stream without
- having previously fetched the codec info and codec setup headers.
-
-\begin{verse}
-{\bf Note:} Theora {\em can} initiate decode at an arbitrary intra-frame packet
- within a bitstream so long as the codec has been initialized with the setup
- headers.
-\end{verse}
-
-Thus, Theora headers are both required for decode to begin and relatively large
- as bitstream headers go.
-The header size is unbounded, although as a rule of thumb less than 16~kB is
- recommended, and Xiph.org's reference encoder follows this suggestion.
-%TODO: Is 8kB enough? My setup header is 7.4kB, that doesn't leave much room
-% for comments.
-%RG: the lesson from vorbis is that as small as possible is really
-% important in some applications. Practically, what's acceptable
-% depends a great deal on the target bitrate. I'd leave 16 kB in the
-% spec for now. fwiw more than 1k of comments is quite unusual.
-
-Our own design work indicates that the primary liability of the required header
- is in mindshare: it is an unusual design, and it draws complaints from
- engineers because it runs against current design trends and exposes
- limitations in some existing software and interface designs.
-However, we find that it does not fundamentally limit Theora's suitable
- application space.
-
-\subsubsection{Format Specification}
-
-The Theora format is well-defined by its decode specification; any encoder that
- produces packets that are correctly decoded by an implementation following
- this specification may be considered a proper Theora encoder.
-A decoder must faithfully and completely implement the specification defined
- herein %, except where noted,
- to be considered a proper Theora decoder.
-Where appropriate, a non-normative description of encoder processes is
- included.
-These sections will be marked as such, and a proper Theora encoder is not
- bound to follow them.
-
-%TODO: \subsubsection{Hardware Profile}
-
-\subsection{Decoder Configuration}
-
-Decoder setup consists of configuration of the quantization matrices and the
- Huffman codebooks for the DCT coefficients.
-The remainder of the decoding pipeline is not configurable.
-
-\subsubsection{Global Configuration}
-
-The global codec configuration consists of a few video-related fields, such as
- frame rate, frame size, picture size and offset, aspect ratio, color space,
- pixel format, and a version number.
-The version number is divided into a major version, a minor version, and a
- minor revision number.
-For the format defined in this specification, these are `3', `2', and
- `0', respectively, in reference to Theora's origin as a successor to the VP3.2
- format.
-
-\subsubsection{Quantization Matrices}
-
-Theora allows up to 384 different quantization matrices to be defined, one for
- each \term{quantization type} (intra or inter), \term{color plane}
- ($Y'$, $C_b$, or $C_r$), and \term{quantization index}, \qi, which ranges from
- zero to 63, inclusive.
-The quantization index generally represents a progressive range of quality
- levels, from low quality near zero to high quality near 63.
-However, the interpretation is arbitrary, and it is possible, for example, to
- partition the scale into two completely separate ranges with 32 levels each
- that are meant to represent different classes of source material.
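-
-The limit of 384 follows directly from the sizes of the three index ranges:
-\[
-  2~\mbox{quantization types} \times 3~\mbox{color planes} \times
-  64~\mbox{quantization indices} = 384~\mbox{matrices}.
-\]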
-
-Each quantization matrix is an $8\times 8$ matrix of 16-bit values, which is
- used to quantize the output of the $8\times 8$ DCT.
-Quantization matrices are specified using three components: a
- \term{base matrix} and two \term{scale values}.
-The first scale value is the \term{DC scale}, which is applied to the DC
- component of the base matrix.
-The second scale value is the \term{AC scale}, which is applied to all the
- other components of the base matrix.
-
-There are 64 DC scale values and 64 AC scale values, one for each \qi value.
-There is a set of base matrices for each quantization type and each color
- plane.
-The bitstream specifies this set by defining a base matrix for a sparse subset
- of the possible \qi values, including at least zero and 63.
-The base matrices for the remainder of the \qi values are computed using linear
- interpolation.
-This configuration allows the quantization matrices to approximate the complex,
- non-linear processes of the human visual system as the \qi value varies.
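-
-As an informal sketch of the interpolation only (the symbols here are
- introduced for illustration, and the exact integer rounding used by the
- decoder is not reproduced), suppose base matrices $B_a$ and $B_b$ are
- specified at indices $a<b$ with no base matrix specified in between.
-Then for $a\le q\le b$ each entry of the interpolated base matrix is
- approximately
-\[
-  B_q[i] \approx \frac{(b-q)\,B_a[i] + (q-a)\,B_b[i]}{b-a},
-  \qquad i = 0,\ldots,63,
-\]
- so that $B_q = B_a$ at $q=a$ and $B_q = B_b$ at $q=b$.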
-
-Finally, because the in-loop deblocking filter strength depends on the strength
- of the quantization matrices defined in this header, a table of 64 \term{loop
- filter limit values} is defined, one for each \qi value.
-
-The precise specification of how all of this information is decoded appears in
- Section~REF.
-
-\subsubsection{Huffman Codebooks}
-
-Theora uses 80 configurable binary Huffman codes to represent the 32 tokens
- used to encode DCT coefficients.
-Each of the 32 token values has a different semantic meaning and is used to
- represent single coefficient values, zero runs, combinations of the two, and
- \term{End-Of-Block} markers.
-
-The 80 codes are divided up into five groups of 16, with each group
- corresponding to a set of DCT coefficient indices.
-The first group corresponds to the DC coefficient, while the remaining groups
- correspond to different subsets of the AC coefficients.
-Within each frame, two pairs of 4-bit codebook indices are stored.
-The first pair selects which codebooks to use from the DC coefficient group for
- the $Y'$ coefficients and the $C_b$ and $C_r$ coefficients.
-The second pair selects which codebooks to use from {\em all} of the AC
- coefficient groups for the $Y'$ coefficients and the $C_b$ and $C_r$
- coefficients.
-
-The precise specification of how the codebooks are decoded appears in
- Section~REF.
-
-\subsection{Coded Video Structure}
-
-Theora is based on $8\times 8$ blocks of pixels.
-This section describes how a video frame is laid out, divided into blocks, and
- how those blocks are organized.
-
-\subsubsection{Frame Layout}
-
-A video frame in Theora is a two-dimensional array of pixels.
-Theora, like VP3, uses a right-handed coordinate system, with the origin in the
- lower-left corner of the frame.
-This is contrary to many video formats which use a left-handed coordinate
- system with the origin in the upper-left corner of the frame.
-%INT: This means that for interlaced material, the definition of ``even fields"
-%INT: and ``odd fields" may be reversed between Theora and other video codecs.
-%INT: This document will always refer to them as ``top fields" and ``bottom
-%INT: fields".
-
-Theora divides the pixel array up into three separate \term{color planes}, one
- for each of the $Y'$, $C_b$, and $C_r$ components of the pixel.
-The $Y'$ plane is also called the \term{luma plane}, and the $C_b$ and $C_r$
- planes are also called the \term{chroma planes}.
-In some pixel formats, the chroma planes are decimated by two in one or both
- directions.
-This means that the width or height of the chroma planes may be half that of
- the total frame width and height, and thus only a multiple of eight, not
- sixteen.
-The luma plane is never decimated.
-
-\subsubsection{Picture Region}
-
-A video frame in Theora is required to have a width and height that are
- multiples of sixteen.
-However, inside a frame a smaller \term{picture region} may be defined.
-The picture region can be offset from the lower-left corner of the frame by up
- to 255 pixels in each direction, and may have an arbitrary width and height,
- provided that it is contained entirely within the coded frame.
-It is this picture region that contains the actual video data.
-The portions of the frame which lie outside the picture region may contain
- arbitrary data, and should be cropped away after decode.
-The picture region plays no other role in the decode process, which operates on
- the entire video frame.
-
-\subsubsection{Blocks and Super Blocks}
-
-Each color plane is subdivided into $8\times 8$ \term{blocks}.
-Blocks are grouped into $4\times 4$ arrays called \term{super blocks}.
-Each color plane has its own set of blocks and super blocks.
-The boundaries of the luma plane are not necessarily aligned with those of the
- chroma planes, if the chroma planes have been decimated.
-
-Blocks are accessed in two different orders in the various decoder processes.
-The first is \term{raster order}.
-This indexes each block in row-major order, starting in the lower left and
- proceeding along the bottom row, followed by the next row up starting on the
- left, etc.
-The second is \term{coded order}.
-In coded order, blocks are accessed by super block.
-Super blocks are visited in raster order, in the same manner as raster order
- for blocks.
-Within each super block, however, blocks are accessed in a Hilbert curve
- pattern, illustrated in Figure~REF.
-If a color plane does not contain a complete super block on the top or right
- sides, the same ordering is still used, simply with any blocks outside the
- frame boundary omitted.
-
-%TODO: Figure
-% X -> X X -> X
-% | ^
-% v |
-% X <- X X <- X
-% | ^
-% v |
-% X X -> X X
-% | ^ | ^
-% v | v |
-% X -> X X -> X
-%But upside down.
-
-To illustrate these two orderings, consider a frame that is 240 pixels wide and
- 48 pixels high.
-Thus each row of the luma plane has 30 blocks and 8 super blocks, and there are
- 6 rows of blocks and two rows of super blocks, the upper of which is only
- partially filled.
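-
-These counts come from dividing the frame dimensions by the block size and
- rounding the super block count up to cover any partial super block:
-\[
-  \frac{240}{8} = 30 \mbox{ blocks per row}, \qquad
-  \left\lceil \frac{30}{4} \right\rceil = 8 \mbox{ super blocks per row}, \qquad
-  \frac{48}{8} = 6 \mbox{ rows of blocks},
-\]
- for a total of $30\times 6 = 180$ luma blocks, indexed 0 through 179 in the
- tables below.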
-
-When accessed in raster order, each block in the luma plane is assigned the
- following indices:
-
-\vspace{\baselineskip}
-\begin{tabular}{|l|l|l|l|c|l|l|}\hline
-150 & 151 & 152 & 153 & $\ldots$ & 178 & 179 \\\hline
-120 & 121 & 122 & 123 & $\ldots$ & 148 & 149 \\\hline
- 90 & 91 & 92 & 93 & $\ldots$ & 118 & 119 \\\hline
- 60 & 61 & 62 & 63 & $\ldots$ & 88 & 89 \\\hline
- 30 & 31 & 32 & 33 & $\ldots$ & 58 & 59 \\\hline
- 0 & 1 & 2 & 3 & $\ldots$ & 28 & 29 \\\hline
-\end{tabular}
-\vspace{\baselineskip}
-
-When accessed in coded order, each block in the luma plane is assigned the
- following indices:
-
-\vspace{\baselineskip}
-\begin{tabular}{|l|l|l|l|c|l|l|}\hline
-123 & 122 & 125 & 124 & $\ldots$ & 179 & 178 \\\hline
-120 & 121 & 126 & 127 & $\ldots$ & 176 & 177 \\\hline
- 5 & 6 & 9 & 10 & $\ldots$ & 117 & 118 \\\hline
- 4 & 7 & 8 & 11 & $\ldots$ & 116 & 119 \\\hline
- 3 & 2 & 13 & 12 & $\ldots$ & 115 & 114 \\\hline
- 0 & 1 & 14 & 15 & $\ldots$ & 112 & 113 \\\hline
-\end{tabular}
-\vspace{\baselineskip}
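-
-The coded order above can be reproduced with a short program.
-The following C sketch is illustrative only: the function and table names are
- invented for this example, and the Hilbert offsets are transcribed from the
- pattern visible in the first super block of the table (indices 0 through 15).
-It visits super blocks in raster order, walks the Hilbert pattern inside each
- one, and skips positions that fall outside the plane.

```c
#include <assert.h>

/* Column and row offsets of the 16 blocks inside a 4x4 super block, listed in
   Hilbert traversal order.  Row 0 is the bottom row, as in the frame layout.
   These tables are an illustration transcribed from the example table. */
static const int HILBERT_X[16] = {0, 1, 1, 0, 0, 0, 1, 1, 2, 2, 3, 3, 3, 2, 2, 3};
static const int HILBERT_Y[16] = {0, 0, 1, 1, 2, 3, 3, 2, 2, 3, 3, 2, 1, 1, 0, 0};

/* Fill coded[by * bw + bx] with the coded-order index of the block in column
   bx and row by of a plane that is bw blocks wide and bh blocks high.
   (Hypothetical helper; not part of any Theora API.) */
void coded_order(int bw, int bh, int *coded)
{
    int next = 0;
    assert(bw > 0 && bh > 0);
    for (int sby = 0; sby < bh; sby += 4) {          /* super block rows, bottom up */
        for (int sbx = 0; sbx < bw; sbx += 4) {      /* super blocks, left to right */
            for (int i = 0; i < 16; i++) {           /* Hilbert walk inside one     */
                int bx = sbx + HILBERT_X[i];
                int by = sby + HILBERT_Y[i];
                if (bx < bw && by < bh)              /* omit blocks outside plane   */
                    coded[by * bw + bx] = next++;
            }
        }
    }
}
```

-For the $30\times 6$ block luma plane of the example,
- \verb|coded_order(30, 6, coded)| assigns index 112 to the block in column 28
- of the bottom row and index 179 to the block in column 28 of the top row,
- matching the table.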
-
-Blocks in the chroma planes immediately follow those of the luma plane without
- a break.
-
-\subsubsection{Macro Blocks}
-
-A macro block contains a $2\times 2$ array of blocks in the luma plane
- {\em and} the co-located blocks in the chroma planes.
-Thus macro blocks can represent anywhere from six to twelve blocks, depending
- on how the chroma planes are decimated.
-Macro blocks contain information about coding mode and motion vectors for the
- corresponding blocks in all color planes.
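-
-For the three supported chroma subsampling formats, counting the four luma
- blocks plus the co-located blocks in each of the $C_b$ and $C_r$ planes gives
-\[
-  \mbox{4:2:0: } 4+1+1 = 6, \qquad
-  \mbox{4:2:2: } 4+2+2 = 8, \qquad
-  \mbox{4:4:4: } 4+4+4 = 12 .
-\]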
-
-Macro blocks are also accessed in a \term{coded order}.
-This coded order proceeds by examining each super block in the luma plane in
- raster order, and traversing the four macro blocks inside using a smaller
- Hilbert curve, as shown in Figure~REF.
-If the luma plane does not contain a complete super block on the top or right
- sides, the same ordering is still used, simply with any macro blocks outside
- the frame boundary omitted.
-Because the frame size is constrained to be a multiple of 16, there are never
- any partial macro blocks.
-Unlike blocks, macro blocks need never be accessed in a pure raster order.
-
-%TODO: Figure
-% X -> X
-% ^ |
-% | v
-% X X
-
-Using the same frame size as the example above, there are 15 macro blocks in
- each row and 3 rows of macro blocks.
-They are assigned the following indices:
-
-\vspace{\baselineskip}
-\begin{tabular}{|l|l|c|l|}\hline
-30 & 31 & $\cdots$ & 44 \\\hline
- 1 & 2 & $\cdots$ & 29 \\\hline
- 0 & 3 & $\cdots$ & 28 \\\hline
-\end{tabular}
-\vspace{\baselineskip}
-
-\subsubsection{Coding Modes}
-
-Each block is coded using one of a small, fixed set of \term{coding modes} that
- defines how its contents are predicted.
-The INTRA mode uses no inter-frame prediction, and is the only mode allowed in
- intra frames.
-The other coding modes use the contents of one of two different \term{reference
- frames}.
-A reference frame is the fully decoded version of a previous frame in the
- stream.
-The first available reference frame is the previous frame, whether it was an
- intra frame or an inter frame.
-The second available reference frame is the previous intra frame, called the
- \term{golden frame}.
-The most important inter coding mode is INTER\_NOMV, which uses the co-located
- contents of the block in the previous frame as the predictor with no
- motion-compensated prediction.
-
-\subsection{High-Level Decode Process}
-
-\subsubsection{Decoder Setup}
-
-Before decoding can begin, a decoder must be initialized using the bitstream
- headers corresponding to the stream to be decoded.
-Theora uses three header packets; all are required, in order, by this
- specification.
-Once set up, decode may begin at any intra-frame packet---or even inter-frame
- packets, provided the appropriate decoded reference frames have been
- cached---belonging to the Theora stream.
-In Theora I, all packets after the three initial headers are intra-frame or
- inter-frame packets.
-
-The header packets are, in order, the identification header, the comment
- header, and the setup header.
-
-\paragraph{Identification Header}
-
-The identification header identifies the stream as Theora, provides a version
- number, and defines the characteristics of the video stream such as frame
- size.
-A complete description of the identification header appears in Section~REF.
-
-\paragraph{Comment Header}
-
-The comment header includes user text comments (``tags'') and a vendor string
- for the application/library that produced the stream.
-The format of the comment header is the same as that used in the Vorbis I and
- Speex codecs, with slight modifications due to the use of a different bit
- packing mechanism.
-A complete description of how the comment header is coded appears in
- Section~REF, along with a suggested set of tags.
-
-\paragraph{Setup Header}
-
-The setup header includes extensive codec setup information, including the
- complete set of quantization matrices and Huffman codebooks needed to decode
- the DCT coefficients.
-
-\subsubsection{Decode Procedure}
-
-The decoding and synthesis procedure for all video packets is fundamentally the
- same, with some steps omitted for intra frames.
-\begin{enumerate}
-\item
-Decode packet type flag.
-\item
-Decode frame header.
-\item
-Decode coded block information (inter frames only).
-\item
-Decode macro block mode information (inter frames only).
-\item
-Decode motion vectors (inter frames only).
-\item
-Decode block-level \qi information.
-\item
-Decode DC coefficient for each coded block.
-\item
-Decode 1st AC coefficient for each coded block.
-\item
-Decode 2nd AC coefficient for each coded block.
-\item
-$\ldots$
-\item
-Decode 63rd AC coefficient for each coded block.
-\item Perform DC coefficient prediction.
-\item Reconstruct coded blocks.
-\item Copy uncoded blocks.
-\item Perform loop filtering.
-\end{enumerate}
-
-Note that clever rearrangement of the steps in this process is possible.
-As an example, in a memory-constrained environment, one can make multiple
- passes through the DCT coefficients to avoid buffering them all in memory.
-On the first pass, the starting location of each coefficient is identified, and
- then 64 separate get pointers are used to read in the 64 DCT coefficients
- required to reconstruct each coded block in sequence.
-This operation produces entirely equivalent output and is naturally perfectly
- legal.
-It may even be a benefit in non-memory-constrained environments due to a
- reduced cache footprint.
-The decoder must be {\em entirely mathematically equivalent} to the
- specification; it need not be a literal semantic implementation.
-
-Theora makes equivalence easy to check by defining all decoding operations in
- terms of exact integer operations.
-No floating-point math is required, and in particular, the definition of the
- inverse DCT must be followed precisely.
-This prevents the decoder mismatch problem commonly associated with codecs that
- provide a less rigorous transform specification.
-Such a mismatch problem would be devastating to Theora, since a single rounding
- error in one frame could propagate throughout the entire succeeding frame due
- to DC prediction.
-
-\paragraph{Packet Type Decode}
-
-Theora I uses four packet types.
-The first three packet types mark each of the three Theora headers described
- above.
-The fourth packet type marks a video packet.
-All other packet types are reserved; packets marked with a reserved type should
- be ignored.
-
-\paragraph{Frame Header Decode}
-
-The frame header contains some global information about the current frame.
-The first is the frame type field, which specifies if this is an intra frame or
- an inter frame.
-Inter frames predict their contents from previously decoded reference frames.
-Intra frames can be independently decoded with no established reference frames.
-
-The next piece of information in the frame header is the list of \qi values
- allowed in the frame.
-Theora allows between one and three different \qi values to be used in a single
- frame, each of which selects a set of six quantization matrices, one for each
- combination of quantization type (inter or intra) and color plane.
-The first \qi value is {\em always} used when dequantizing DC coefficients.
-The \qi value used when dequantizing AC coefficients, however, can vary from
- block to block.
-VP3, in contrast, allowed just a single \qi value per frame for both the DC and
- AC coefficients.
-
-\paragraph{Coded Block Information}
-
-This stage determines which blocks in the frame are coded and which are
- uncoded.
-A \term{coded block list} is constructed which lists all the coded blocks in
- coded order.
-For intra frames, every block is coded, and so no data needs to be read from
- the packet.
-
-\paragraph{Macro Block Mode Information}
-
-For intra frames, every block is coded in INTRA mode, and this stage can be
- skipped.
-In inter frames a \term{coded macro block list} is constructed from the coded
- block list.
-Any macro block which has at least one of its luma blocks coded is considered
- coded; all other macro blocks are uncoded, even if they contain coded chroma
- blocks.
-A coding mode is decoded for each coded macro block, and assigned to all its
- constituent coded blocks.
-All coded chroma blocks in uncoded macro blocks are assigned the INTER\_NOMV
- coding mode.
-
-\paragraph{Motion Vectors}
-
-Intra frames are all coded entirely in INTRA mode, and so this stage can be
- skipped.
-Some inter coding modes, however, require one or more motion vectors to be
- specified for each macro block.
-These are decoded in this stage, and an appropriate motion vector is assigned
- to each coded block in the macro block.
-
-\paragraph{Block-Level \qi Information}
-
-If a frame allows multiple \qi values, the \qi value assigned to each block is
- decoded here.
-Frames that use only a single \qi value have nothing to decode.
-
-\paragraph{DCT Coefficients}
-
-Finally, the quantized DCT coefficients are decoded.
-DCT coefficients are represented by a list of tokens.
-Each token can take on one of 32 different values, each with a different
- semantic meaning.
-A single token can represent a single DCT coefficient, a run of zero
- coefficients within a single block, a combination of a run of zero
- coefficients followed by a single non-zero coefficient, an
- \term{End-Of-Block} marker, or a run of EOB markers.
-EOB markers signify that the remainder of the block is one long zero run.
-Unlike JPEG and MPEG, each block is not required to end with a special marker.
-If non-EOB tokens yield values for all 64 of the coefficients in a block, then
- no EOB marker is needed.
-
-Each token is associated with a specific \term{token index} in a block.
-For single-coefficient tokens, this index is the index of the coefficient in
- the block.
-For zero-run tokens, this index is the index of the {\em first} coefficient in
- the run.
-For combination tokens, the index is again the index of the first coefficient
- in the zero run.
-For EOB markers, which signify that the remainder of the block is one long zero
- run, the index is that of the first zero coefficient in that run.
-For EOB runs, the token index is that of the first EOB marker in the run.
-Due to zero runs and EOB markers, a block does not have to have a token for
- every token index.
-
-Tokens are grouped in the stream by token index, not by the block they
- originate from.
-This means that for each token index in turn, the tokens with that index from
- {\em all} the coded blocks are coded in coded block order.
-When decoding, a current token index is maintained for each coded block.
-This index is advanced by the number of coefficients that are added to the
- block as each token is decoded.
-After fully decoding all the tokens with token index \ti, the current token
- index of every coded block will be \ti or greater.
-
-If an EOB run of $n$ blocks is decoded at token index \ti, then it ends the
- next $n$ blocks in coded block order whose current token index is equal to
- \ti, but not greater.
-If there are fewer than $n$ blocks with a current token index of \ti, then the
- decoder goes through the coded block list again from the start, ending blocks
- with a current token index of $\ti+1$, and so on, until $n$ blocks have been
- ended or the current token index of every block is 64.
-
-Tokens are read by parsing a Huffman code that depends on \ti and the color
- plane of the next coded block whose current token index is equal to \ti, but
- not greater.
-The Huffman codebooks are selected on a per-frame basis from the 80 codebooks
- defined in the setup header.
-Many tokens have a fixed number of \term{extra bits} associated with them.
-These bits are read directly after the token is decoded.
-These are used to define things such as coefficient magnitude, sign, and the
- length of runs.
-
-\paragraph{DC Prediction}
-
-After the coefficients for each block are decoded, the quantized DC value of
- each block is adjusted based on the DC values of its neighbors.
-This adjustment is performed by scanning the blocks in raster order, not coded
- order.
-
-\paragraph{Reconstruction}
-
-Finally, using the coding mode, motion vector (if applicable), quantized
- coefficient list, and \qi value defined for each block, all the coded blocks
- are reconstructed.
-The DCT coefficients are dequantized, an inverse DCT transform is applied, and
- a predictor is formed from the coding mode and motion vector and added to the
- result.
-
-\paragraph{Loop Filtering}
-
-To complete the reconstructed frame, an in-loop deblocking filter is applied to
- the edges of all coded blocks.
-
-\section{Bitpacking Convention}
-\label{sec:bitpacking}
-
-\subsection{Overview}
-
-The Theora codec uses relatively unstructured raw packets containing
- binary integer fields of arbitrary width.
-Logically, each packet is a bitstream in which bits are written one-by-one by
- the encoder and then read one-by-one in the same order by the decoder.
-Most current binary storage arrangements group bits into a native storage unit
- of eight bits (octets), sixteen bits, thirty-two bits, or less commonly other
- fixed sizes.
-The Theora bitpacking convention specifies the correct mapping of the logical
- packet bitstream into an actual representation in fixed-width units.
-
-\subsubsection{Octets and Bytes}
-
-In most contemporary architectures, a `byte' is synonymous with an `octet',
- that is, eight bits.
-For purposes of the bitpacking convention, a byte implies the smallest native
- integer storage representation offered by a platform.
-Modern file systems invariably offer bytes as the fundamental atom of storage.
-
-The most ubiquitous architectures today consider a `byte' to be an octet.
-Note, however, that the Theora bitpacking convention is still well defined for
- any native byte size;
-an implementation can use the native bit-width of a given storage
-system.
-This document assumes that a byte is one octet for purposes of example only.
-
-\subsubsection{Words and Byte Order}
-
-A `word' is an integer size that is a grouped multiple of the byte size.
-Most architectures consider a word to be a group of two, four, or eight bytes.
-Each byte in the word can be ranked by order of `significance', that is, by the
- significance of the bits it holds when a binary integer is stored in the
- word.
-Several byte orderings are possible in a word.
-The common ones are
-\begin{itemize}
-\item{Big-endian:}
-in which the most significant byte comes first, e.g. 3-2-1-0,
-\item{Little-endian:}
-in which the least significant byte comes first, e.g. 0-1-2-3, and less
- commonly
-\item{Mixed-endian:}
-e.g. 3-1-2-0 or 0-2-1-3.
-\end{itemize}
-
-The Theora bitpacking convention specifies storage and bitstream manipulation
- at the byte, not word, level.
-Thus host word ordering is of concern only during optimization, when writing
- code that operates on a word of storage at a time rather than a byte.
-Logically, bytes are always encoded and decoded in order from byte zero through
- byte $n$.
-
-\subsubsection{Bit Order}
-
-A byte has a well-defined `least significant' bit (LSb), which is the only bit
- set when the byte is storing the two's complement integer value $+1$.
-A byte's `most significant' bit (MSb) is at the opposite end of the byte.
-Bits in a byte are numbered from zero at the LSb to $n$ for the MSb, where
- $n=7$ in an octet.
-
-\subsection{Coding Bits into Bytes}
-
-The Theora codec needs to encode arbitrary bit-width integers from zero to 32
- bits wide into packets.
-These integer fields are not aligned to the boundaries of the byte
- representation; the next field is read at the bit position immediately
- after the end of the previous field.
-
-The decoder logically unpacks integers by first reading the MSb of a binary
- integer from the logical bitstream, followed by the next most significant
- bit, etc., until the required number of bits have been read.
-When unpacking the bytes into bits, the decoder begins by reading the MSb of
- the integer to be read from the most significant unread bit position of the
- source byte, followed by the next-most significant bit of the integer from the
- next-most significant unread bit position of the source byte, and so on up to
- the requested number of bits.
-Note that this differs from the Vorbis I codec, which
- begins decoding with the LSb of the source integer, reading it from the
- LSb of the source byte.
-When all the bits of the current source byte are read, decoding continues with
- the MSb of the next byte.
-Any unfilled bits in the last byte of the packet must be cleared to zero by the
- encoder.
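-
-As a minimal sketch of this unpacking rule, assuming 8-bit bytes, the
- following C fragment reads fields most significant bit first.
-The structure and function names are hypothetical and are not taken from any
- reference implementation, and returning zero for bits past the end of the
- packet is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state for illustration. */
typedef struct {
    const uint8_t *buf;  /* packet data, one octet per byte        */
    size_t         len;  /* packet length in bytes                 */
    size_t         pos;  /* get pointer: number of bits consumed   */
} bit_reader;

/* Read an unsigned field of nbits (0..32) bits, MSb first.  Fields are not
   byte aligned: reading resumes at the bit after the previous field. */
uint32_t br_read(bit_reader *br, int nbits)
{
    uint32_t value = 0;
    assert(nbits >= 0 && nbits <= 32);
    for (int i = 0; i < nbits; i++) {
        size_t   byte = br->pos >> 3;            /* current source byte        */
        int      bit  = 7 - (int)(br->pos & 7);  /* bit 7 (MSb) is read first  */
        uint32_t b    = 0;                       /* sketch: zero past the end  */
        if (byte < br->len)
            b = (uint32_t)(br->buf[byte] >> bit) & 1u;
        value = (value << 1) | b;                /* earlier bits are more significant */
        br->pos++;
    }
    return value;
}
```

-Reading fields of 4, 3, 7, and 13 bits from the four bytes 0xCE, 0x47, 0x67,
- 0x20 with this function yields 12, 7, 17, and 6969.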
-
-\subsubsection{Signedness}
-
-The binary integers decoded by the above process may be either signed or
- unsigned.
-This varies from integer to integer, and this specification
- indicates how each value should be interpreted as it is read.
-That is, depending on context, the three-bit binary pattern `b111' can be taken
- to represent either `$7$' as an unsigned integer or `$-1$' as a signed, two's
- complement integer.
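-
-One way to apply the signed interpretation after a field has been read as an
- unsigned value is sketched below in C.
-The function name is hypothetical; the sketch assumes only that the field
- width is between 1 and 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low nbits (1..32) bits of raw as a two's complement integer.
   Hypothetical helper for illustration. */
int32_t sign_extend(uint32_t raw, int nbits)
{
    assert(nbits >= 1 && nbits <= 32);
    uint32_t sign = UINT32_C(1) << (nbits - 1);  /* the field's sign bit        */
    uint32_t mask = (sign << 1) - 1u;            /* low nbits set (all 32 if nbits == 32) */
    raw &= mask;
    /* Flipping the sign bit and subtracting it maps patterns with the sign
       bit set onto negative values; 64-bit arithmetic avoids overflow. */
    return (int32_t)((int64_t)(raw ^ sign) - (int64_t)sign);
}
```

-With this helper, the pattern b111 read as a three-bit field gives
- \verb|sign_extend(7, 3) == -1|, while b011 gives 3 under either
- interpretation.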
-
-\subsubsection{Encoding Example}
-
-The following example shows the state of an (8-bit) byte stream after several
- binary integers are encoded, including the location of the put pointer for the
- next bit to write to and the total length of the stream in bytes.
-
-Encode the 4-bit unsigned integer value `12' (b1100) into an empty byte stream.
-
-\begin{tabular}{rccccccccl}
- & & & & &$\downarrow$&&&& \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 &$\leftarrow$\\
-byte 1 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 2 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 3 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
-byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 1 byte
-\end{tabular}
-\vspace{\baselineskip}
-
-Continue by encoding the 3-bit signed integer value `$-1$' (b111).
-
-\begin{tabular}{rccccccccl}
- & & & & & & & &$\downarrow$& \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 &$\leftarrow$\\
-byte 1 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 2 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 3 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
-byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 1 byte
-\end{tabular}
-\vspace{\baselineskip}
-
-Continue by encoding the 7-bit integer value `17' (b0010001).
-
-\begin{tabular}{rccccccccl}
- & & & & & & &$\downarrow$&& \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & \\
-byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 &$\leftarrow$\\
-byte 2 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-byte 3 \hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \\
-\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
-byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 2 bytes
-\end{tabular}
-\vspace{\baselineskip}
-
-Continue by encoding the 13-bit integer value `6969' (b11011 00111001).
-
-\begin{tabular}{rccccccccl}
- & & & &$\downarrow$&&&& & \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & \\
-byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
-byte 2 \hfill\vline& 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
-byte 3 \hfill\vline& 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &$\leftarrow$\\
-\hfill$\vdots$\hfill\vline&\multicolumn{8}{c}{$\vdots$}& \\
-byte $n$\hfill\vline& 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 4 bytes
-\end{tabular}
-\vspace{\baselineskip}
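-
-The encoding steps above can be traced with a small writer.
-This C sketch assumes 8-bit bytes and a caller-supplied buffer; its type and
- function names are invented for illustration.
-The buffer is cleared up front so that any unfilled bits in the last byte are
- zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical writer state for illustration. */
typedef struct {
    uint8_t *buf;  /* destination buffer                      */
    size_t   cap;  /* buffer capacity in bytes                */
    size_t   pos;  /* put pointer: number of bits written     */
} bit_writer;

void bw_init(bit_writer *bw, uint8_t *buf, size_t cap)
{
    memset(buf, 0, cap);  /* unwritten bits stay zero */
    bw->buf = buf;
    bw->cap = cap;
    bw->pos = 0;
}

/* Append the low nbits (0..32) bits of value, MSb first. */
void bw_write(bit_writer *bw, uint32_t value, int nbits)
{
    assert(nbits >= 0 && nbits <= 32);
    assert(bw->pos + (size_t)nbits <= bw->cap * 8);
    for (int i = nbits - 1; i >= 0; i--) {
        if ((value >> i) & 1u)  /* set the next free bit, starting from bit 7 */
            bw->buf[bw->pos >> 3] |= (uint8_t)(0x80u >> (bw->pos & 7));
        bw->pos++;
    }
}

/* Length of the stream in bytes, counting a partially filled last byte. */
size_t bw_bytes(const bit_writer *bw)
{
    return (bw->pos + 7) / 8;
}
```

-Writing 12 in 4 bits, $-1$ in 3 bits, 17 in 7 bits, and 6969 in 13 bits
- produces the bytes 0xCE, 0x47, 0x67, 0x20, the same contents as the final
- table of the example.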
-
-\subsubsection{Decoding Example}
-
-The following example shows the state of the (8-bit) byte stream encoded in the
- previous example after several binary integers are decoded, including the
- location of the get pointer for the next bit to read.
-
-Read a two bit unsigned integer from the example encoded above.
-
-\begin{tabular}{rccccccccl}
- & & &$\downarrow$&&&& & & \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 &$\leftarrow$\\
-byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
-byte 2 \hfill\vline& 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
-byte 3 \hfill\vline& 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 4 bytes
-\end{tabular}
-\vspace{\baselineskip}
-
-Value read: 3 (b11).
-
-Read another two bit unsigned integer from the example encoded above.
-
-\begin{tabular}{rccccccccl}
- & & & & &$\downarrow$&&& & \\
- \hfill\vline& 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 & \\\cline{1-9}
-byte 0 \hfill\vline& 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 &$\leftarrow$\\
-byte 1 \hfill\vline& 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & \\
-byte 2 \hfill\vline& 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \\
-byte 3 \hfill\vline& 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 &
-byte stream length: 4 bytes
-\end{tabular}
-\vspace{\baselineskip}
-
-Value read: 0 (b00).
-
-Two things are worth noting here.
-\begin{itemize}
-\item
-Although these four bits were originally written as a single four-bit integer,
- reading some other combination of bit-widths from the bitstream is well
- defined.
-No artificial alignment boundaries are maintained in the bitstream.
-\item
-The first value is the integer `$3$' only because the context stated we were
- reading an unsigned integer.
-Had the context stated we were reading a signed integer, the returned value
- would have been the integer `$-1$'.
-\end{itemize}
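-
-The two readings of the bits b11 can be worked out explicitly, assuming the
- two's complement convention implied by the encoding example (where the 3-bit
- signed value `$-1$' is written as b111):
-\[
-\mbox{unsigned: } 1\cdot2^{1}+1\cdot2^{0}=3,
-\qquad
-\mbox{signed: } -1\cdot2^{1}+1\cdot2^{0}=-1.
-\]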
-
-\subsubsection{End-of-Packet Alignment}
-
-The typical use of bitpacking is to produce many independent byte-aligned
- packets which are embedded into a larger byte-aligned container structure,
- such as an Ogg transport bitstream.
-Externally, each bitstream encoded as a byte stream must begin and end on a
- byte boundary.
-Often, the encoded bitstream is not an integer number of bytes, and so there is
- unused space in the last byte of a packet.
-
-Unused space in the last byte of a packet is always zeroed during the encoding
- process.
-Thus, should this unused space be read, it will return binary zeroes.
-No marker pattern or stuffing bits allow the decoder to
- obtain the exact size, in bits, of the original bitstream.
-This knowledge is not required for decoding.
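-
-For instance, counting the initial four-bit value, the encoding example above
- writes $4+3+7+13=27$ bits, so the packet occupies
-\[
-\left\lceil 27/8 \right\rceil = 4 \mbox{ bytes}
-\quad\mbox{with}\quad
-4\cdot 8-27 = 5 \mbox{ zeroed padding bits,}
-\]
- which are bits 4 through 0 of byte 3.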
-
-Attempting to read past the end of an encoded packet results in an
- `end-of-packet' condition.
-Any further read operations after an `end-of-packet' condition shall also
- return `end-of-packet'.
-Unlike Vorbis, Theora does not use truncated packets as a normal mode of
- operation.
-Therefore, if a decoder encounters the `end-of-packet' condition during normal
- decoding, it may attempt to use the bits that were read to recover as much of
- the encoded data as possible, signal a warning or error, or both.
-
-\subsubsection{Reading Zero Bit Integers}
-
-Reading a zero bit integer returns the value `$0$' and does not increment
- the stream pointer.
-If a decoder has read exactly to the end of the packet without reading past
- it, so that no `end-of-packet' condition has been triggered, a subsequent
- read of a zero bit integer shall succeed, returning `$0$' without triggering
- an `end-of-packet' condition.
-If a previous read has already set the `end-of-packet' condition, reading a
- zero bit integer shall fail, also returning `end-of-packet'.
-
-\appendix
-
-\section{Ogg Bitstream Encapsulation}
-\label{app:oggencapsulation}
-
-\subsection{Overview}
-
-This document specifies the embedding or encapsulation of Theora packets
- in an Ogg transport stream.
-
-Ogg is a stream oriented wrapper for coded, linear time-based data.
-It provides synchronization, multiplexing, framing, error detection and
- seeking landmarks for the decoder and complements the raw packet format
- used by the Theora codec.
-
-This document assumes familiarity with the details of the Ogg standard.
-An overview of the Ogg transport stream format is given in the Xiph.org
- documentation \cite{oggstream}; detailed descriptions are given in
- \cite{oggframe} and in RFC~3533 \cite{rfc3533}.
-While Theora packets can be embedded in a wide variety of media
- containers and streaming mechanisms, the Xiph.org Foundation
- recommends Ogg as the native format for Theora video in file-oriented
- storage and transmission contexts.
-
-\subsubsection{MIME type}
-
-The correct MIME type of any Ogg file is {\tt application/ogg}.
-Outside of an encapsulation, the MIME type {\tt video/x-theora} may
- be used to refer specifically to the Theora compressed video stream.
-
-\subsection{Embedding in a logical bitstream}
-
-Ogg separates a {\em logical bitstream} consisting of the framing of
- a particular sequence of packets and complete within itself from
- the {\em physical bitstream} which may consist either of a single
- logical bitstream or a number of logical bitstreams multiplexed
- together.
-This section specifies the embedding of Theora packets in a logical Ogg
- bitstream.
-The mapping of Ogg Theora logical bitstreams into a multiplexed physical Ogg
- stream is described in the next section.
-
-\subsubsection{Headers}
-
-The initial info header packet appears by itself in a single Ogg page.
-This page defines the start of the logical stream and must have
- the `beginning of stream' flag set.
-
-The second and third header packets (metadata comments and decoder
- setup data) can together span one or more Ogg pages.
-If there are additional non-normative header packets, they must be
- included in this sequence of pages as well.
-The comment header packet must begin the second Ogg page in the logical
- bitstream, and there must be a page break between the last header
- packet and the first frame data packet.
-
-These two page break requirements facilitate stream identification and
- simplify header acquisition for seeking and live streaming applications.
-
-All header pages must have their granule position field set to zero.
-%TODO: or -1?
-%TBT: What are we doing now?
-
-\subsubsection{Frame data}
-
-The first frame data packet in a logical bitstream must begin a fresh page.
-All other data packets are placed one at a time into Ogg pages
- until the end of the stream.
-Packets can span pages and multiple packets can be placed within any
- one page.
-The last page in the logical bitstream must have its `end of stream'
- flag set.
-
-Frame data pages must be marked with a granule index corresponding to
- the display time of the last frame/packet that finishes in that page.
-
-{\bf Note:}
-This scheme is still under discussion.
-It has also been proposed that pages be labeled with a granule corresponding to
- the first frame that begins on that page.
-This simplifies seeking and multiplexing, but is different from the published
- definition of the Ogg granule field.
-This document will be updated when the issue is settled.
-
-%TODO: \subsubsection{Granule position}
-
-\subsection{Multiplexed stream mapping}
-
-Applications supporting Ogg Theora I must support Theora bitstreams
- multiplexed with compressed audio data in the Vorbis I and Speex
- formats, and should support Ogg-encapsulated MNG graphics for overlays.
-% and the Writ format for text-based titling.
-%TBT: That's great... do these things have specifications?
-
-Multiple audio and video bitstreams may be multiplexed together.
-How playback of multiple/alternate streams is handled is up to the
- application.
-Some conventions based on included metadata aid interoperability
- in this respect.
-%TODO: describe multiple vs. alternate streams, language mapping
-% and reference metadata descriptions.
-
-\subsubsection{Chained streams}
-
-Ogg Theora decoders and playback applications must support both grouped
- streams (multiplexed concurrent logical streams) and chained streams
- (sequential concatenation of independent physical bitstreams).
-
-The number and codec data types of multiplexed streams and the decoder
- parameters for those stream types that re-occur can all change at a
- chaining boundary.
-A playback application must be prepared to handle such changes and
- should do so smoothly with the minimum possible visible disruption.
-The specification of grouped streams below applies independently to each
- segment of a chained bitstream.
-
-\subsubsection{Grouped streams}
-
-At the beginning of a multiplexed stream, the `beginning of stream'
- pages for each logical bitstream will be grouped together.
-Within these, the first page to occur must be the Theora page.
-This facilitates identification of Ogg Theora files among other
- Ogg-encapsulated content.
-A playback application must nevertheless handle streams where this
- arrangement is not correct.
-
-If there is more than one Theora logical stream, the first page should
- be from the primary stream.
-That is, it should be the stream that a generic player would best begin
- displaying without special user direction.
-If there is more than one audio stream, or more than one stream of any other
- type, the identification page of the primary stream of that type
- must be placed before the others.
-
-After the `beginning of stream' pages, the header pages of each of
- the logical streams should be grouped together before any data pages
- occur.
-
-After all the header pages have been placed,
- the data pages are multiplexed together.
-They should be placed in the stream in increasing order by the playback
- time equivalents of their granule fields.
-This facilitates seeking while limiting the buffering requirements of the
- playback demultiplexer.
-
-\bibliography{spec}
-
-\end{document}