[Theora-dev] 16 bits, cast on idct function

Wed May 31 12:31:21 PDT 2006

http://elphel.cvs.sourceforge.net/elphel/camera333/fpga/x333/ - you may find
a _working_ (for Theora) IDCT implementation in Verilog for Xilinx Spartan 3
there.

Andrey

On 5/31/06, Felipe Portavales Goldstein <portavales at gmail.com> wrote:
>
> YEAAAAAAAAHHHHH
>
> The IDCT_SLOW VHDL model is working,
> but I need to optimize it to consume fewer FPGA resources, such as multipliers.
>
> I will commit it to SVN tonight.
>
>
> On 5/31/06, Felipe Portavales Goldstein <portavales at gmail.com> wrote:
> > On 5/31/06, Timothy B. Terriberry <tterribe at vt.edu> wrote:
> > > Remembering to CC: the list this time.
> >
> > :-)
> > my mistake
> >
> > >
> > > Felipe Portavales Goldstein wrote:
> > > > On 5/31/06, Timothy B. Terriberry <tterribe at vt.edu> wrote:
> > > >
> > > >> Felipe Portavales Goldstein wrote:
> > > >> > My question is:
> > > >> >
> > > >> > Can the result of (_Gd + _Cd) be a number with more than 16 bits?
> > > >> > (Yes, it can, because they are int32, but the algorithm could
> > > >> > guarantee something about that... I don't know...)
> > > >>
> > > >> With normal input, certainly this would never occur. However, due to
> > > >> quantization error, rounding error, etc., it is theoretically possible
> > > >> to generate a number with more than 16 bits here.
> > > >
> > > >
> > > > Good :-)
> > > >
> > > >>
> > > >> > If it can, the cast (ogg_int16_t) will truncate the number to the 16
> > > >> > least significant bits, and will get a wrong result...
> > > >> >
> > > >> > ip[0] is 32 bits, so why truncate to 16 bits?
> > > >>
> > > >> The main answer is, "To make SIMD/hardware implementations easier."
> > > >> These will generally use 16-bit registers, and so will automatically
> > > >> have done the truncation.
> > > >
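To illustrate the truncation discussed above, here is a minimal C sketch, assuming the two's-complement wraparound that a 16-bit hardware register would produce. The helper names wrap16 and add16 are hypothetical and do not come from libtheora; the operand values are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the 16 least significant bits of a
 * 32-bit value and reinterpret them as a signed two's-complement number,
 * as a 16-bit register would. */
static int16_t wrap16(int32_t x) {
    uint16_t low = (uint16_t)((uint32_t)x & 0xFFFFu);
    /* Reinterpret by hand so the result does not depend on the
     * implementation-defined behaviour of a narrowing signed cast. */
    if (low & 0x8000u) {
        return (int16_t)((int32_t)low - 65536);
    }
    return (int16_t)low;
}

/* Hypothetical helper: a 32-bit add whose result is stored in 16 bits,
 * similar in spirit to casting a sum such as (_Gd + _Cd). */
static int16_t add16(int32_t a, int32_t b) {
    return wrap16(a + b);
}
```

With these definitions, add16(20000, 20000) gives -25536 instead of 40000: the value is "wrong" arithmetically, but every implementation that truncates to 16 bits the same way produces the same number.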
> > > >
> > > > You're right, it's better to use 16-bit registers. And using 16-bit
> > > > adders and multipliers we can get shorter critical paths, allowing a
> > > > higher clock rate.
> > > >
> > > > Then, I have another question:
> > > >
> > > > If the result is truncated to 16 bits, why was IntermediateData
> > > > declared as 32 bits?
> > > >
> > > >  ogg_int32_t IntermediateData[64];
> > > >  ogg_int32_t * ip = IntermediateData;
> > > >
> > > > I think this is because the dequant_slow result is 32 bits, and is
> > > > stored in IntermediateData.
> > > >
> > > > But this dequant result is multiplied by a cosine factor defined as
> > > > 16 bits, and this new result is shifted right 16 bits and stored in
> > > > IntermediateData.
> > > >
> > > > I'm wondering whether I could use a 16-bit IntermediateData array.
> > > >
> > > > The dequant specification says:
> > > > Output parameters:
> > > > DQC - integer array - size = 14 bits
> > > >
> > > > I think that I can use IntermediateData as 16-bit integers.
> > > > What do you think?
> > >
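The range argument above can be sketched in C. This is only an illustration under stated assumptions: the helper names mul_q16 and fits_int16 are hypothetical, the constant is simply round(cos(pi/4) * 65536), and the code assumes that >> on a negative signed value is an arithmetic shift (true for common compilers, but implementation-defined in C).

```c
#include <stdint.h>

/* round(cos(pi/4) * 65536); used here only as an example 16-bit factor. */
#define COS_PI_4_Q16 46341u

/* Hypothetical helper: multiply a 16-bit coefficient by an unsigned
 * 16-bit fixed-point factor, then shift right by 16 bits.
 * |x| <= 32768 and c <= 65535, so the product fits in int32_t. */
static int32_t mul_q16(int16_t x, uint16_t c) {
    int32_t product = (int32_t)x * (int32_t)c;
    /* Assumes arithmetic (sign-preserving) right shift. */
    return product >> 16;
}

/* Hypothetical helper: does the value fit in a signed 16-bit word? */
static int fits_int16(int32_t v) {
    return v >= INT16_MIN && v <= INT16_MAX;
}
```

Because the factor is below 2^16, the shifted product is never larger in magnitude than the input, so a single multiply stage like this one can store its result in 16 bits. Sums of several such terms can still exceed 16 bits, which is the overflow case discussed elsewhere in the thread.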
> > > Yes, you certainly can. On modern 32-bit CPUs, 16-bit instructions are
> > > very, very slow, so we avoid them when we can. The only real reason to
> > > use 16-bit operands on a 32-bit CPU is to save memory bandwidth, which
> > > is the primary bottleneck in video processing. Since IntermediateData is
> > > local, and likely to be entirely in cache, there's no reason to make it
> > > 16 bits.
> > >
> > > If you are implementing the iDCT for a different instruction
> > > set/architecture, I highly suggest working from Section 7.9.3 of the
> > > spec directly. The spec can be obtained from:
> > > http://www.theora.org/doc/Theora_I_spec.pdf
> >
> > I'm working on a Theora decoder on an FPGA. I'm writing the hardware
> > directly in VHDL.
> >
> > I'm preparing to put the VHDL files in SVN and will post a description
> > of this work to this list as soon as possible.
> >
> > Yes, I'm reading the spec.
> > But sometimes the libtheora software can help.
> >
> >
> > >
> > > >> The important thing is not that the iDCT gives you valid values that
> > > >> make sense in such situations, but that it gives you the _same_ values
> > > >> across all implementations, even when the input is invalid. If that were
> > > >> not the case, then the decoded frame would not be the same as what the
> > > >> encoder _thought_ the decoded frame was going to be, and so the next
> > > >> subsequent frame would also be wrong, etc., all the way until the next
> > > >> keyframe.
> > > >>
> > > >> Think of it this way: you can never generate a _wrong_ result so long as
> > > >> you follow the specification. The specification tells you what result
> > > >> you're going to get for any input. If the encoder chose an input that
> > > >> caused overflow, well, that's the encoder's problem, not the decoder's.
> > > >>
> > > >> > But I'm really confused by the >> 0.
> > > >> > Does this shift right by zero do something, or did someone just forget
> > > >> > to delete it?
> > > >>
> > > >> I assume the original author was playing around with dividing up the >>4
> > > >> in the op[] stage between the two. It doesn't matter; any compiler worth
> > > >> its salt will optimize the useless operation away.
> > >
> >
> >
> > --
> > ________________________________________
> > Felipe Portavales <portavales at gmail.com>
> > Undergraduate Student - IC-UNICAMP
> > Computer Systems Laboratory
> > http://www.lsc.ic.unicamp.br
> >
>
>