<a href="http://elphel.cvs.sourceforge.net/elphel/camera333/fpga/x333/">http://elphel.cvs.sourceforge.net/elphel/camera333/fpga/x333/</a> - you can find a _working_ (for Theora) IDCT implementation in Verilog for the Xilinx Spartan 3 there.

<br><br>Andrey<br><br><div><span class="gmail_quote">On 5/31/06, <b class="gmail_sendername">Felipe Portavales Goldstein</b> &lt;<a href="mailto:portavales@gmail.com">portavales@gmail.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

YEAAAAAAAAHHHHH<br><br>The IDCT_SLOW VHDL model is working,<br>but I need to optimize it to consume fewer FPGA resources, such as multipliers.<br><br>I will commit it to SVN tonight.<br><br><br>On 5/31/06, Felipe Portavales Goldstein &lt;

<a href="mailto:portavales@gmail.com">portavales@gmail.com</a>&gt; wrote:<br>&gt; On 5/31/06, Timothy B. Terriberry &lt;<a href="mailto:tterribe@vt.edu">tterribe@vt.edu</a>&gt; wrote:<br>&gt; &gt; Remembering to CC: the list this time.

<br>&gt;<br>&gt; :-)<br>&gt; my mistake<br>&gt;<br>&gt; &gt;<br>&gt; &gt; Felipe Portavales Goldstein wrote:<br>&gt; &gt; &gt; On 5/31/06, Timothy B. Terriberry &lt;<a href="mailto:tterribe@vt.edu">tterribe@vt.edu</a>&gt; wrote:

<br>&gt; &gt; &gt;<br>&gt; &gt; &gt;&gt; Felipe Portavales Goldstein wrote:<br>&gt; &gt; &gt;&gt; &gt; My question is:<br>&gt; &gt; &gt;&gt; &gt;<br>&gt; &gt; &gt;&gt; &gt; Can the result of (_Gd + _Cd) be a number with more than 16 bits?

<br>&gt; &gt; &gt;&gt; &gt; (Yes, it can, because they are int32, but the algorithm might<br>&gt; &gt; &gt;&gt; &gt; guarantee something about that... I don't know...)<br>&gt; &gt; &gt;&gt;<br>&gt; &gt; &gt;&gt; With normal input, certainly this would never occur. However, due to

<br>&gt; &gt; &gt;&gt; quantization error, rounding error, etc., it is theoretically possible<br>&gt; &gt; &gt;&gt; to generate a number with more than 16 bits here.<br>&gt; &gt; &gt;<br>&gt; &gt; &gt;<br>&gt; &gt; &gt; Good :-)

<br>&gt; &gt; &gt;<br>&gt; &gt; &gt;&gt;<br>&gt; &gt; &gt;&gt; &gt; If it can, the cast (ogg_int16_t) will truncate the number to the 16 least<br>&gt; &gt; &gt;&gt; &gt; significant bits, and will give a wrong result...<br>&gt; &gt; &gt;&gt; &gt;

<br>&gt; &gt; &gt;&gt; &gt; ip[0] is 32 bits, so why truncate to 16 bits?<br>&gt; &gt; &gt;&gt;<br>&gt; &gt; &gt;&gt; The main answer is, &quot;To make SIMD/hardware implementations easier.&quot;<br>&gt; &gt; &gt;&gt; These will generally use 16-bit registers, and so will automatically

<br>&gt; &gt; &gt;&gt; have done the truncation.<br>&gt; &gt; &gt;<br>&gt; &gt; &gt;<br>&gt; &gt; &gt; You're right, it's better to use 16-bit registers. And using 16-bit<br>&gt; &gt; &gt; adders and multipliers we can get shorter critical paths, allowing a

<br>&gt; &gt; &gt; higher clock rate.<br>&gt; &gt; &gt;<br>&gt; &gt; &gt; Then I have another question:<br>&gt; &gt; &gt;<br>&gt; &gt; &gt; If the result is truncated to 16 bits, why was IntermediateData<br>&gt; &gt; &gt; declared as 32 bits?

<br>&gt; &gt; &gt;<br>&gt; &gt; &gt;&nbsp;&nbsp;ogg_int32_t IntermediateData[64];<br>&gt; &gt; &gt;&nbsp;&nbsp;ogg_int32_t * ip = IntermediateData;<br>&gt; &gt; &gt;<br>&gt; &gt; &gt; I think this is because the dequant_slow result is 32 bits, and is

<br>&gt; &gt; &gt; stored in IntermediateData.<br>&gt; &gt; &gt;<br>&gt; &gt; &gt; But this dequant result is multiplied by a cosine factor defined<br>&gt; &gt; &gt; as 16 bits, and this new result is shifted right 16 bits and stored in

<br>&gt; &gt; &gt; IntermediateData.<br>&gt; &gt; &gt;<br>&gt; &gt; &gt; I'm wondering whether I could use a 16-bit IntermediateData array.<br>&gt; &gt; &gt;<br>&gt; &gt; &gt; The dequant specification says:<br>&gt; &gt; &gt; Output parameters:

<br>&gt; &gt; &gt; DQC - integer array - size = 14 bits<br>&gt; &gt; &gt;<br>&gt; &gt; &gt; I think that I can use IntermediateData as 16-bit integers.<br>&gt; &gt; &gt; What do you think?<br>&gt; &gt;<br>&gt; &gt; Yes, you certainly can. On modern 32-bit CPUs, 16-bit instructions are

<br>&gt; &gt; very, very slow, so we avoid them when we can. The only real reason to<br>&gt; &gt; use 16-bit operands on a 32-bit CPU is to save memory bandwidth, which<br>&gt; &gt; is the primary bottleneck in video processing. Since IntermediateData is

<br>&gt; &gt; local, and likely to be entirely in cache, there's no reason to make it<br>&gt; &gt; 16 bits.<br>&gt; &gt;<br>&gt; &gt; If you are implementing the iDCT for a different instruction<br>&gt; &gt; set/architecture, I highly suggest working from Section 

7.9.3 of the<br>&gt; &gt; spec directly. The spec can be obtained from:<br>&gt; &gt; <a href="http://www.theora.org/doc/Theora_I_spec.pdf">http://www.theora.org/doc/Theora_I_spec.pdf</a><br>&gt;<br>&gt; I'm working on a Theora decoder on an FPGA. I'm writing the hardware directly

<br>&gt; in VHDL.<br>&gt;<br>&gt; I'm preparing to put the VHDL files in SVN and to post a description of<br>&gt; this work to this list as soon as possible.<br>&gt;<br>&gt; Yes, I'm reading the spec.<br>&gt; But sometimes the libtheora software can help.

<br>&gt;<br>&gt;<br>&gt; &gt;<br>&gt; &gt; &gt;&gt; The important thing is not that the iDCT gives you valid values that<br>&gt; &gt; &gt;&gt; make sense in such situations, but that it gives you the _same_ values<br>&gt; &gt; &gt;&gt; across all implementations, even when the input is invalid. If that were

<br>&gt; &gt; &gt;&gt; not the case, then the decoded frame would not be the same as what the<br>&gt; &gt; &gt;&gt; encoder _thought_ the decoded frame was going to be, and so the next<br>&gt; &gt; &gt;&gt; subsequent frame would also be wrong, etc., all the way until the next

<br>&gt; &gt; &gt;&gt; keyframe.<br>&gt; &gt; &gt;&gt;<br>&gt; &gt; &gt;&gt; Think of it this way: you can never generate a _wrong_ result so long as<br>&gt; &gt; &gt;&gt; you follow the specification. The specification tells you what result

<br>&gt; &gt; &gt;&gt; you're going to get for any input. If the encoder chose an input that<br>&gt; &gt; &gt;&gt; caused overflow, well, that's the encoder's problem, not the decoder's.<br>&gt; &gt; &gt;&gt;<br>&gt; &gt; &gt;&gt; &gt; But I'm really confused by the &gt;&gt; 0.

<br>&gt; &gt; &gt;&gt; &gt; Does this shift right by zero do anything, or did someone just forget to delete<br>&gt; &gt; &gt;&gt; &gt; it?<br>&gt; &gt; &gt;&gt;<br>&gt; &gt; &gt;&gt; I assume the original author was playing around with dividing up the &gt;&gt;4

<br>&gt; &gt; &gt;&gt; in the op[] stage between the two. It doesn't matter; any compiler worth<br>&gt; &gt; &gt;&gt; its salt will optimize the useless operation away.<br>&gt; &gt;<br>&gt;<br>&gt;<br>&gt; --<br>&gt; ________________________________________

<br>&gt; Felipe Portavales &lt;<a href="mailto:portavales@gmail.com">portavales@gmail.com</a>&gt;<br>&gt; Undergraduate Student - IC-UNICAMP<br>&gt; Computer Systems Laboratory<br>&gt; <a href="http://www.lsc.ic.unicamp.br">

http://www.lsc.ic.unicamp.br</a><br>&gt;<br><br><br>--<br>________________________________________<br>Felipe Portavales &lt;<a href="mailto:portavales@gmail.com">portavales@gmail.com</a>&gt;<br>Undergraduate Student - IC-UNICAMP

<br>Computer Systems Laboratory<br><a href="http://www.lsc.ic.unicamp.br">http://www.lsc.ic.unicamp.br</a><br>_______________________________________________<br>Theora-dev mailing list<br><a href="mailto:Theora-dev@xiph.org">

Theora-dev@xiph.org</a><br><a href="http://lists.xiph.org/mailman/listinfo/theora-dev">http://lists.xiph.org/mailman/listinfo/theora-dev</a><br></blockquote></div><br>
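<br>For reference, the 16-bit truncation discussed in the quoted thread can be sketched in C roughly as follows. This is only an illustration, not code from libtheora: the names wrap16 and add_wrap16 are hypothetical, and add_wrap16 merely stands in for a sum such as (_Gd + _Cd). It assumes the intended behaviour is to keep the 16 least significant bits and reinterpret them as a two's-complement value, as a 16-bit register would.<br>

```c
#include <stdint.h>

/* Hypothetical helper: keep only the 16 least significant bits of x and
   reinterpret them as a two's-complement signed value, which is what a
   16-bit hardware register does implicitly. It avoids an out-of-range
   signed cast, so the result does not rely on implementation-defined
   conversion behaviour. */
static int16_t wrap16(int32_t x)
{
    uint16_t low = (uint16_t)x;   /* reduction modulo 2^16, well defined */
    /* Flip the sign bit, then subtract 2^15 to map 0..65535 onto
       -32768..32767 without overflow. */
    return (int16_t)((int32_t)(low ^ 0x8000u) - 0x8000);
}

/* Hypothetical stand-in for a butterfly sum such as (_Gd + _Cd):
   the sum is formed in 32 bits and then truncated to 16 bits. */
static int16_t add_wrap16(int32_t gd, int32_t cd)
{
    return wrap16(gd + cd);
}
```

With in-range inputs the truncation changes nothing; a sum such as 30000 + 10000 wraps around to a negative value, and every implementation that truncates in the same way wraps to the same value.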