[Theora-dev] Changing the IDCT spec
acolwell at real.com
Wed Feb 16 09:30:59 PST 2005
Option 1 sounds good to me. I haven't implemented the IDCT for Blackfin yet,
but I'm sure that this will help with that effort.
On Fri, Feb 11, 2005 at 02:28:33PM -0500, Timothy B. Terriberry wrote:
> So, in preparation for some decoder optimization work planned by Rudolf
> Marek, the subject of the size of the registers needed in the IDCT
> came up.
> The current spec language ensures that the result is exactly compatible
> with the C code for VP3. This language requires that some of the
> arguments to the multiplies be 17 or 18 bits, because they need to hold
> the sum or difference of two 16-bit numbers. This necessitates using
> 32-bit registers, which greatly reduces potential parallelism for SIMD
> instructions (not to mention making an implementation much more
> complicated on embedded chipsets with 16-bit registers).
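To make the bit-width argument above concrete, here is a minimal sketch in C that counts how many bits a signed value needs. The helper name `signed_bits_needed` is hypothetical and is not taken from the spec or from any Theora or VP3 code; it only illustrates why the sum or difference of two 16-bit values needs 17 bits, and why a sum of two such results needs 18.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): number of bits needed to
 * store v as a two's-complement signed integer, sign bit included. */
int signed_bits_needed(int32_t v)
{
    /* For negative v, ~v equals -v - 1, which is non-negative and has
     * the same magnitude width as v in two's complement. */
    uint32_t mag = (v < 0) ? (uint32_t)~v : (uint32_t)v;
    int bits = 1; /* the sign bit */

    while (mag != 0) {
        bits++;
        mag >>= 1;
    }
    return bits;
}
```

Under this sketch, 32767 needs 16 bits, 32767 + 32767 needs 17, and the sum of two 17-bit results such as 65534 + 65534 needs 18, which matches the "17 or 18 bits" mentioned above.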
> However, upon reviewing VP3's own MMX routines, I discovered that they
> used 16-bit registers anyway. Thus, in VP3 where the code IS the spec,
> the code still doesn't match the spec.
> Now, I want to emphasize, in practical terms, the differences have very
> little real effect. Given normal pixel values, the resulting DCT
> coefficients should not even come close to overflowing the registers
> during the IDCT (there are about 3 bits to spare). Even with some pretty
> severe quantization errors, that seems to be enough headroom.
> However, the specification does not specify the encoder's operation; it
> specifies the decoder's. It is possible to store coefficients in the
> bitstream that would cause overflow, and we need to standardize what to
> do in such cases. When I wrote the section in the spec, I took the
> approach of "do what the code does, no matter how much it hurts
> optimization", but knowing now that the code does two different things,
> we have a choice.
> However, the spec has now been included in an official release (alpha4),
> and I know several people have begun or completed independent
> implementations (e.g., Andrey Filippov's FPGA encoder, Robert
> Brautigam's Java port, for sure, and I remember some talk of a DSP stamp
> involving either Aaron Colwell or the Fluendo folks). So I don't want to
> make such a significant change to the language of the spec without
> soliciting input from the people it will affect.
> So, to summarize, there are two choices:
> 1) Truncate the result of each intermediate step in the IDCT to 16 bits,
> providing for better SIMD and 16-bit architecture optimization, but
> requiring slightly more work in a 32-bit C implementation, or
> 2) Keep the current language, allowing some intermediate results to grow
> to 17 or 18 bits, requiring 32-bit registers.
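The two choices above could be sketched roughly as follows for a single "subtract, then multiply by a constant" step. This is only an illustration under stated assumptions, not the spec text or the VP3 code: the function names are hypothetical, the constant 46341 is cos(pi/4) scaled by 2^16 and rounded, and a 64-bit product is used here purely to keep the C well-defined.

```c
#include <stdint.h>

/* cos(pi/4) * 65536, rounded. The macro name is illustrative. */
#define C4 46341

/* Wrap a value to 16 bits, two's-complement style, without relying on
 * implementation-defined narrowing conversions. */
int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v; /* well-defined: reduced modulo 2^16 */
    return (int16_t)(u >= 0x8000 ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* floor((c * x) / 2^16), written with division so that it does not
 * depend on how the compiler right-shifts negative numbers. */
int32_t mul_shift16(int32_t c, int32_t x)
{
    int64_t p = (int64_t)c * x;
    int64_t q = p / 65536;

    if (p % 65536 < 0)
        q--; /* C division truncates toward zero; adjust to floor */
    return (int32_t)q;
}

/* Choice 2: let the difference grow to 17 bits before the multiply. */
int32_t step_wide(int16_t a, int16_t b)
{
    return mul_shift16(C4, (int32_t)a - (int32_t)b);
}

/* Choice 1: truncate every intermediate result to 16 bits. */
int16_t step_trunc16(int16_t a, int16_t b)
{
    int16_t d = wrap16((int32_t)a - (int32_t)b);
    return wrap16(mul_shift16(C4, d));
}
```

For ordinary inputs such as a = 100, b = 50 the two functions agree. They diverge only when the difference overflows 16 bits: with a = 32767 and b = -32768, `step_wide` gives 46340 while `step_trunc16` gives -1, which is the kind of case the spec has to pin down.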
> Either choice should have no real effect on content encoded with any of
> the encoders I am aware of. Both are equally compatible with existing
> VP3 content, as different VP3 codepaths follow both approaches. If
> anything, the first approach is probably used more often since most PCs
> from the last 9 years have had some kind of MMX support.
> Thoughts? Opinions?