[Theora-dev] 16 bits, cast on idct function

Wed May 31 12:26:50 PDT 2006

YEAAAAAAAAHHHHH

IDCT_SLOW VHDL model is working,
but I need to optimize it to consume fewer FPGA resources, such as multipliers.

I will commit it to SVN tonight.

On 5/31/06, Felipe Portavales Goldstein <portavales at gmail.com> wrote:
> On 5/31/06, Timothy B. Terriberry <tterribe at vt.edu> wrote:
> > Remembering to CC: the list this time.
>
> :-)
> my mistake
>
> >
> > Felipe Portavales Goldstein wrote:
> > > On 5/31/06, Timothy B. Terriberry <tterribe at vt.edu> wrote:
> > >
> > >> Felipe Portavales Goldstein wrote:
> > >> > My question is:
> > >> >
> > >> > Can the result of (_Gd + _Cd) be a number with more than 16 bits?
> > >> > (Yes, it can, because they are int32, but maybe the algorithm
> > >> > guarantees something about that... I don't know...)
> > >>
> > >> With normal input, certainly this would never occur. However, due to
> > >> quantization error, rounding error, etc., it is theoretically possible
> > >> to generate a number with more than 16 bits here.
> > >
> > >
> > > Good :-)
> > >
> > >>
> > >> > If it can, the cast (ogg_int16_t) will truncate the number to its 16 least
> > >> > significant bits, and we will get a wrong result...
> > >> >
> > >> > ip[0] is 32 bits, so why truncate to 16 bits?
> > >>
> > >> The main answer is, "To make SIMD/hardware implementations easier."
> > >> These will generally use 16-bit registers, and so will automatically
> > >> have done the truncation.
> > >
> > >
> > > You're right, it's better to use 16-bit registers. And using 16-bit
> > > adders and multipliers we get shorter critical paths, allowing a
> > > higher clock rate.
> > >
> > > Then I have another question:
> > >
> > > If the result is truncated to 16 bits, why was IntermediateData
> > > declared as 32 bits?
> > >
> > >  ogg_int32_t IntermediateData[64];
> > >  ogg_int32_t * ip = IntermediateData;
> > >
> > > I think this is because the dequant_slow result is 32 bits, and is
> > > stored in IntermediateData.
> > >
> > > But this dequant result is multiplied by a 16-bit cosine
> > > factor, and the product is shifted right by 16 bits and stored in
> > > IntermediateData.
> > >
> > > I'm wondering if I could use a 16-bit IntermediateData array.
> > >
> > > The dequant specification says:
> > > Output parameters:
> > > DQC - integer array - size = 14 bits
> > >
> > > I think that I can use IntermediateData as 16-bit integers.
> > > What do you think?
> >
> > Yes, you certainly can. On modern 32-bit CPUs, 16-bit instructions are
> > very, very slow, so we avoid them when we can. The only real reason to
> > use 16-bit operands on a 32-bit CPU is to save memory bandwidth, which
> > is the primary bottleneck in video processing. Since IntermediateData is
> > local, and likely to be entirely in cache, there's no reason to make it
> > 16 bits.
> >
> > If you are implementing the iDCT for a different instruction
> > set/architecture, I highly suggest working from Section 7.9.3 of the
> > spec directly. The spec can be obtained from:
> > http://www.theora.org/doc/Theora_I_spec.pdf
>
> I'm working on a Theora decoder on an FPGA. I'm writing the
> hardware directly in VHDL.
>
> I'm preparing to put the VHDL files in SVN and post a description
> of this work to this list as soon as possible.
>
> Yes, I'm reading the spec.
> But sometimes the libtheora software can help.
>
>
> >
> > >> The important thing is not that the iDCT gives you valid values that
> > >> make sense in such situations, but that it gives you the _same_ values
> > >> across all implementations, even when the input is invalid. If that were
> > >> not the case, then the decoded frame would not be the same as what the
> > >> encoder _thought_ the decoded frame was going to be, and so the next
> > >> subsequent frame would also be wrong, etc., all the way until the next
> > >> keyframe.
> > >>
> > >> Think of it this way: you can never generate a _wrong_ result so long as
> > >> you follow the specification. The specification tells you what result
> > >> you're going to get for any input. If the encoder chose an input that
> > >> caused overflow, well, that's the encoder's problem, not the decoder's.
> > >>
> > >> > But I'm really confused by the >> 0.
> > >> > Does this shift right by zero do anything, or did someone just forget
> > >> > to delete it?
> > >>
> > >> I assume the original author was playing around with dividing up the >>4
> > >> in the op[] stage between the two. It doesn't matter; any compiler worth
> > >> its salt will optimize the useless operation away.
> >
>
>
> --
> ________________________________________
> Felipe Portavales <portavales at gmail.com>
> Undergraduate Student - IC-UNICAMP
> Computer Systems Laboratory
> http://www.lsc.ic.unicamp.br
>