[Theora-dev] Changing the IDCT spec
Timothy B. Terriberry
tterribe at email.unc.edu
Fri Feb 11 11:28:33 PST 2005
So, in preparation for some decoder optimization work planned by Rudolf
Marek, the subject of the size of the registers needed in the IDCT came up.
The current spec language ensures that the result is exactly compatible
with the C code for VP3. This language requires that some of the
arguments to the multiplies be 17 or 18 bits, because they need to hold
the sum or difference of two 16-bit numbers. This necessitates using
32-bit registers, which greatly reduces potential parallelism for SIMD
instructions (not to mention making an implementation much more
complicated on embedded chipsets with 16-bit registers).
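To make the bit-width argument concrete, here is a minimal C sketch. The helper names bits_needed and wrap16 are hypothetical, made up for this illustration, and are not taken from the spec or from the VP3 code. It counts how many bits a signed value needs and shows what a 16-bit register would hold after the same addition.

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to represent v as a two's-complement signed integer,
 * counting the sign bit. (Hypothetical helper for illustration.) */
static int bits_needed(int32_t v) {
    /* Fold negatives onto non-negatives: -32768 needs as many bits as 32767. */
    uint32_t m = (v < 0) ? ~(uint32_t)v : (uint32_t)v;
    int bits = 1; /* the sign bit */
    while (m != 0) {
        bits++;
        m >>= 1;
    }
    assert(bits <= 32);
    return bits;
}

/* Keep only the low 16 bits of v and reinterpret them as signed,
 * which is what a 16-bit register would hold. (Hypothetical helper.) */
static int32_t wrap16(int32_t v) {
    uint32_t u = (uint32_t)v & 0xFFFFu;
    return (u & 0x8000u) ? (int32_t)u - 0x10000 : (int32_t)u;
}
```

For example, bits_needed(32767 + 32767) is 17, while wrap16(32767 + 32767) is -2: the sum of two in-range 16-bit values no longer fits, and a 16-bit register silently wraps it.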
However, upon reviewing VP3's own MMX routines, I discovered that they
used 16-bit registers anyway. Thus, in VP3 where the code IS the spec,
the code still doesn't match the spec.
Now, I want to emphasize, in practical terms, the differences have very
little real effect. Given normal pixel values, the resulting DCT
coefficients should not even come close to overflowing the registers
during the IDCT (there are about 3 bits to spare). Even with some pretty
severe quantization errors, that seems to be enough headroom.
However, the specification does not specify the encoder's operation; it
specifies the decoder's. It is possible to store coefficients in the
bitstream that would cause overflow, and we need to standardize what to
do in such cases. When I wrote the section in the spec, I took the
approach of "do what the code does, no matter how much it hurts
optimization", but knowing now that the code does two different things,
we have a choice.
However, the spec has now been included in an official release (alpha4),
and I know several people have begun or completed independent
implementations (e.g., Andrey Filippov's FPGA encoder, Robert
Brautigam's Java port, for sure, and I remember some talk of a DSP stamp
from either Aaron Colwell or the Fluendo folks). So I don't want to
make such a significant change to the language of the spec without
soliciting input from the people it will affect.
So, to summarize, there are two choices:
1) Truncate the result of each intermediate step in the IDCT to 16 bits,
providing for better SIMD and 16-bit architecture optimization, but
requiring slightly more work in a 32-bit C implementation, or
2) Keep the current language, allowing some intermediate results to grow
to 17 or 18 bits, requiring 32-bit registers.
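The difference between the two choices can be sketched on a single add-then-multiply step. This is only an illustration under stated assumptions, not the spec's exact wording: the function names step_wide and step_trunc16 and the helper wrap16 are hypothetical, and the constant C4 is assumed to be cos(pi/4) scaled by 2^16.

```c
#include <stdint.h>

/* Assumed fixed-point constant: round(cos(pi/4) * 65536). */
#define C4 46341

/* Keep the low 16 bits of v, reinterpreted as signed. (Hypothetical helper.) */
static int32_t wrap16(int32_t v) {
    uint32_t u = (uint32_t)v & 0xFFFFu;
    return (u & 0x8000u) ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* Choice 2: the sum keeps its full width (up to 17 bits) before the
 * multiply, so the step needs registers wider than 16 bits.
 * Note: >> on a negative value is assumed to be an arithmetic shift,
 * which holds on common compilers but is implementation-defined in C. */
static int32_t step_wide(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + (int32_t)b;
    return (int32_t)(((int64_t)C4 * sum) >> 16);
}

/* Choice 1: every intermediate result is truncated to 16 bits, as a
 * 16-bit SIMD lane would do. A plain C version pays for the extra
 * wrap16 calls. */
static int32_t step_trunc16(int16_t a, int16_t b) {
    int32_t sum = wrap16((int32_t)a + (int32_t)b);
    return wrap16((int32_t)(((int64_t)C4 * sum) >> 16));
}
```

With small inputs such as 1000 and 2000, both functions return 2121. With 20000 and 20000, step_wide returns 28284, while step_trunc16 gives a negative result because the sum 40000 wraps to -25536 before the multiply. The two choices only disagree on inputs like these, which overflow 16 bits.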
Either choice should have no real effect on content encoded with any of
the encoders I am aware of. Both are equally compatible with existing
VP3 content, as different VP3 codepaths follow both approaches. If
anything, the first approach is probably used more often since most PCs
from the last 9 years have had some kind of MMX support.