[theora-dev] GSoC - Theora multithread decoder

Sun Jul 6 17:43:32 PDT 2008

This week I will work on the pipeline, and by the end of the week I will
send a report.

On Sun, Jul 6, 2008 at 9:39 PM, Leonardo de Paula Rosa Piga <
lpiga at terra.com.br> wrote:

> Hi all,
>
> I apologize for not keeping you up to date on my project. Portavales works
> at the desk behind me, and we talk about the project when we go for coffee.
> Also, I did not know we had to report weekly. That was my fault; I should
> have read the rules. Sorry.
>
> During the first month, I studied the Theora Beta implementation. The code
> is completely different from Alpha, so I had to familiarize myself with it.
> After that I started running tests with OpenMP.
>
> A first test was 40% faster, but unfortunately it did not decode the frame
> correctly: three quarters of it was green.
>
> I have one implementation that decodes the Y, Cb and Cr planes in parallel.
> The OpenMP version was about 5% faster. That is still worthwhile, since it
> does not require any major modifications.
>
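
A minimal sketch of the per-plane OpenMP idea described above. It is not the
code in the branch: plane_t, decode_plane and decode_frame_planes are
hypothetical names, and the "decoding" is only a placeholder fill.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plane descriptor; not libtheora's real structure. */
typedef struct {
    unsigned char *data;
    size_t         len;
} plane_t;

/* Placeholder for the real per-plane reconstruction work (assumption). */
static void decode_plane(plane_t *p, unsigned char fill)
{
    for (size_t i = 0; i < p->len; i++)
        p->data[i] = fill;
}

/* Process Y, Cb and Cr concurrently: one loop iteration per plane.
 * Built without OpenMP support, the pragma is ignored and the loop
 * simply runs serially. */
void decode_frame_planes(plane_t planes[3])
{
    int pli;
    assert(planes != NULL);
#pragma omp parallel for num_threads(3) schedule(static, 1)
    for (pli = 0; pli < 3; pli++)
        decode_plane(&planes[pli], (unsigned char)(pli + 1));
}
```

The only change to serial code is the pragma, which matches the remark that
this approach needs few modifications.
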
> I looked at Ralph's implementation and merged it into the current code. The
> speed-up was about 10%, but the code has to be modified in many places.
>
> Extracting parallelism from the current implementation is very difficult.
> Coarse-grained functions are the best candidates for parallelization, because
> they make the threading overhead worthwhile, but the current implementation
> has one, at most two. The parts that I suggested in my initial plan are
> fine-grained functions: they account for a lot of CPU time, but they are
> called too many times. The time spent creating and synchronizing threads is
> greater than the speed-up gained. We need functions that are called only a
> few times and that consume a lot of CPU time per call. Data dependencies
> should also be as low as possible.
>
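
The granularity argument above can be put into a rough cost model. This is
only a sketch under the assumption that every parallelized call pays a fixed
fork/join overhead; the function names and all numbers are hypothetical, not
measurements from the decoder.

```c
#include <assert.h>

/* Estimated wall time of a parallelized function.
 *   calls         : number of invocations
 *   work_per_call : serial CPU time of one invocation, in seconds
 *   overhead      : cost to start and synchronize threads per invocation,
 *                   in seconds (assumed constant)
 *   threads       : number of threads sharing the work evenly */
double parallel_time(long calls, double work_per_call,
                     double overhead, int threads)
{
    assert(calls >= 0 && threads > 0);
    return (double)calls * (work_per_call / threads + overhead);
}

/* Wall time of the same work done serially. */
double serial_time(long calls, double work_per_call)
{
    assert(calls >= 0);
    return (double)calls * work_per_call;
}
```

With illustrative numbers, 1,000,000 calls of 10 microseconds each (10 s in
total) on 3 threads with 20 microseconds of overhead per call come out slower
than the serial run, while 100 calls of 0.1 s each with the same overhead take
roughly a third of the serial time.
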
> According to the model that I made (
> http://lampiao.lsc.ic.unicamp.br/~piga/gsoc_2008/implementation.pdf)
> the decoding time should be reduced by 33%, but the reduction was only 10%
> for pthread and 5% for OpenMP.
>
> I used a 1440x1080 video. The pthread implementation has 3 threads, and the
> OpenMP version was run with the environment variable OMP_NUM_THREADS=3. The
> results are:
>
>              Real(s)    User(s)    System(s)    Speed up(%)
> OpenMP        25.2       29.2        1.8            4
> PThread       23.8       28.3        1.0           10
> Current       26.2       26.0        0.3            0
>
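
The speed-up column in the table above appears consistent with
(T_current / T_new - 1) * 100 computed from the real times. That is an
inference from the numbers, not something stated in the post. A small sketch
with hypothetical helper names, next to the alternative "reduction in
decoding time" measure:

```c
#include <assert.h>

/* Speed-up in percent: (t_base / t_new - 1) * 100. */
double speedup_percent(double t_base, double t_new)
{
    assert(t_new > 0.0);
    return (t_base / t_new - 1.0) * 100.0;
}

/* Reduction in wall time in percent: (1 - t_new / t_base) * 100. */
double time_reduction_percent(double t_base, double t_new)
{
    assert(t_base > 0.0);
    return (1.0 - t_new / t_base) * 100.0;
}
```

For the real times above, speedup_percent gives about 4.0 for OpenMP and 10.1
for PThread, while time_reduction_percent gives about 3.8 and 9.2. The 33%
figure from the model is a reduction in decoding time, so the second measure
is the one to compare against it.
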
> I used an Intel(R) Core(TM)2 Quad CPU at 2.4GHz with 4GB of RAM. The video
> is 85 seconds long.
> These two implementations decode the Y, Cb and Cr planes in parallel, which
> is why I am using OMP_NUM_THREADS=3. The upper bound for the gain is 33%.
> That is, let To be the time spent decoding a video with the current
> implementation and T1 the time spent decoding it with the parallel
> implementation. At best, T1 should be 0.66To.
> I will use the pthread implementation to try a pipelined version and see if
> we obtain more gains.
> This version will run the functions (oc_dec_dc_unpredict_mcu_plane +
> oc_dec_frags_recon_mcu_plane) and
> (oc_state_loop_filter_frag_rows + oc_state_borders_fill_rows) in parallel.
> The upper bound for the gain is 60%. That is, let T2 be the time spent
> decoding a video with the pipelined implementation. At best, T2 should be
> 0.4To.
>
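
A minimal sketch of how such a two-stage pipeline could be wired up with
pthreads. It is not the planned implementation: pipeline_t, stage_recon,
stage_filter, run_pipeline and NROWS are hypothetical, and each stage does
placeholder arithmetic instead of calling the two function groups named above.

```c
#include <assert.h>
#include <pthread.h>

#define NROWS 8  /* hypothetical number of rows per frame */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int rows_done;         /* rows completed by stage 1 so far */
    int recon[NROWS];      /* placeholder for reconstructed rows */
    int filtered[NROWS];   /* placeholder for filtered rows */
} pipeline_t;

/* Stage 1: placeholder for DC unprediction + fragment reconstruction. */
static void *stage_recon(void *arg)
{
    pipeline_t *p = arg;
    for (int row = 0; row < NROWS; row++) {
        p->recon[row] = row * 10;          /* produce the row */
        pthread_mutex_lock(&p->lock);
        p->rows_done = row + 1;            /* publish it to stage 2 */
        pthread_cond_signal(&p->ready);
        pthread_mutex_unlock(&p->lock);
    }
    return NULL;
}

/* Stage 2: placeholder for loop filter + border fill. */
static void *stage_filter(void *arg)
{
    pipeline_t *p = arg;
    for (int row = 0; row < NROWS; row++) {
        pthread_mutex_lock(&p->lock);
        while (p->rows_done <= row)        /* wait until stage 1 finished it */
            pthread_cond_wait(&p->ready, &p->lock);
        pthread_mutex_unlock(&p->lock);
        p->filtered[row] = p->recon[row] + 1;
    }
    return NULL;
}

/* Run both stages concurrently; returns 0 on success, -1 on error. */
int run_pipeline(pipeline_t *p)
{
    pthread_t t1, t2;
    assert(p != NULL);
    p->rows_done = 0;
    if (pthread_mutex_init(&p->lock, NULL) != 0) return -1;
    if (pthread_cond_init(&p->ready, NULL) != 0) return -1;
    if (pthread_create(&t1, NULL, stage_recon, p) != 0) return -1;
    if (pthread_create(&t2, NULL, stage_filter, p) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_cond_destroy(&p->ready);
    pthread_mutex_destroy(&p->lock);
    return 0;
}
```

The sketch lets stage 2 start on a row as soon as stage 1 has finished it. A
real loop filter probably also needs data from neighboring rows, so the second
stage would likely have to trail the first by some margin.
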
> Here is the branch for the OpenMP implementation:
> http://svn.xiph.org/branches/theora_multithread_decode_omp/
> Here is the branch for the PThread implementation:
> http://svn.xiph.org/branches/theora_multithread_decode_pthread/
>
> Again, sorry about the long time without any feedback.
>
> --
> Leonardo de Paula Rosa Piga
> Undergraduate Computer Engineering Student
> LSC - IC - UNICAMP
> http://lampiao.lsc.ic.unicamp.br/~piga

-- 
Leonardo de Paula Rosa Piga
Undergraduate Computer Engineering Student
LSC - IC - UNICAMP
http://lampiao.lsc.ic.unicamp.br/~piga