[Theora-dev] Changing the IDCT spec

Wed Feb 16 09:30:59 PST 2005

Option 1 sounds good to me. I haven't implemented the IDCT for Blackfin yet,
but I'm sure that this will help with that effort.

Aaron

On Fri, Feb 11, 2005 at 02:28:33PM -0500, Timothy B. Terriberry wrote:
> So, in preparation for some decoder optimization work planned by Rudolf
> Marek, the subject of the size of the registers needed in the IDCT
> came up.
> 
> The current spec language ensures that the result is exactly compatible 
> with the C code for VP3. This language requires that some of the 
> arguments to the multiplies be 17 or 18 bits, because they need to hold 
> the sum or difference of two 16-bit numbers. This necessitates using 
> 32-bit registers, which greatly reduces potential parallelism for SIMD 
> instructions (not to mention making an implementation much more 
> complicated on embedded chipsets with 16-bit registers).
> 
> However, upon reviewing VP3's own MMX routines, I discovered that they 
> used 16-bit registers anyway. Thus, in VP3 where the code IS the spec, 
> the code still doesn't match the spec.
> 
> Now, I want to emphasize, in practical terms, the differences have very 
> little real effect. Given normal pixel values, the resulting DCT 
> coefficients should not even come close to overflowing the registers 
> during the IDCT (there are about 3 bits to spare). Even with some pretty 
> severe quantization errors, that seems to be enough headroom.
> 
> However, the specification does not specify the encoder's operation; it 
> specifies the decoder's. It is possible to store coefficients in the 
> bitstream that would cause overflow, and we need to standardize what to 
> do in such cases. When I wrote the section in the spec, I took the 
> approach of "do what the code does, no matter how much it hurts 
> optimization", but knowing now that the code does two different things, 
> we have a choice.
> 
> However, the spec has now been included in an official release (alpha4), 
> and I know several people have begun or completed independent 
> implementations (e.g., Andrey Filippov's FPGA encoder, Robert 
> Brautigam's Java port, for sure, and I remember some talk of a DSP stamp 
> between either Aaron Colwell and the Fluendo folks). So I don't want to 
> make such a significant change to the language of the spec without 
> soliciting input from the people it will affect.
> 
> So, to summarize, there are two choices:
> 1) Truncate the result of each intermediate step in the IDCT to 16 bits, 
> providing for better SIMD and 16-bit architecture optimization, but 
> requiring slightly more work in a 32-bit C implementation, or
> 2) Keep the current language, allowing some intermediate results to grow 
> to 17 or 18 bits, requiring 32-bit registers.
> 
> Either choice should have no real effect on content encoded with any of 
> the encoders I am aware of. Both are equally compatible with existing 
> VP3 content, as different VP3 codepaths follow both approaches. If 
> anything, the first approach is probably used more often since most PCs 
> from the last 9 years have had some kind of MMX support.
> 
> Thoughts? Opinions?
>
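
As an illustration of the two options discussed above, here is a minimal C sketch of a single multiply step whose argument is the sum of two 16-bit values. It is not taken from the VP3 or Theora sources: the function names (wrap16, idct_mul_wide, idct_mul_trunc16) are hypothetical, and the constant C4 = 46341 (roughly cos(pi/4) scaled by 65536) is assumed here only for the example.

```c
#include <stdint.h>

/* Assumed for illustration: cos(pi/4) scaled by 2^16. */
#define C4 46341

/* Hypothetical helper: keep only the low 16 bits of v and
 * reinterpret them as a signed value, without relying on
 * implementation-defined narrowing conversions. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v; /* reduction modulo 2^16 is well-defined */
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Option 2 style: the sum may need 17 bits, so it is held in a
 * 32-bit variable before the multiply.
 * Note: >> on a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
static int32_t idct_mul_wide(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    return (int32_t)(((int64_t)C4 * sum) >> 16);
}

/* Option 1 style: the sum is truncated to 16 bits first, which is
 * what a 16-bit SIMD lane or a 16-bit register would hold. */
static int32_t idct_mul_trunc16(int16_t a, int16_t b)
{
    int16_t sum = wrap16((int32_t)a + (int32_t)b);
    return (int32_t)(((int64_t)C4 * sum) >> 16);
}
```

For in-range inputs such as 1000 and 2000 both functions return 2121. For 20000 and 20000 the sum (40000) no longer fits in 16 bits: idct_mul_wide returns 28284, while idct_mul_trunc16 first wraps the sum to -25536 and so returns a negative value. Only bitstreams that push intermediate values out of the 16-bit range can tell the two behaviours apart.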