[theora-dev] Multi-Thread Theora Encoder

Wed Oct 10 08:54:51 PDT 2007

Another thing I forgot to note:

To validate the multithreaded version, I made a tool that takes the
original uncompressed YUV file and two Theora files, decodes each
Theora file, and computes its PSNR against the original uncompressed
YUV file.

So, my modified encoder version must achieve the same PSNR as the
unmodified encoder.

This way we can have two different Theora streams with the same
objective (measured) quality.
Note that this is not the same as the perceived or subjective quality.

On 10/10/07, Felipe Portavales Goldstein <portavales at gmail.com> wrote:
> On 10/7/07, Maik Merten <maikmerten at gmx.net> wrote:
> > Nice work. Too bad I'm still on a single-core system (but now I have a
> > nice excuse to mothball this system and go ahead assembling a new one).
> >
> > Two things I noticed:
> >
> >  - output bitrate seems to vary slightly depending on how many threads
> > are used (no visible difference, though). If your goal for the
> > optimization is to have it produce exactly the same output and you're
> > thinking right now "wait, this shouldn't happen" then there may perhaps
> > be a problem in the new code.
>
> When you use only one thread, the output generated is exactly the same
> as the original (non-multi-thread) theora encoder.
>
> When you use 2 threads, for example, the motion vector search runs on
> each half of the frame independently (in parallel), one half per
> thread.
>
> In the single-threaded encoder, the first fragment of the second half
> of the frame can reuse the last motion vector from the preceding
> fragment in the first half. This LAST_MV mode saves the bits that
> coding a new motion vector in the stream would cost.
>
> But when using threads, if we want to run the motion vector search in
> parallel, we must avoid data dependencies between threads, and
> therefore we lose a little compression.
> Since the number of threads is small compared to the number of rows in
> the entire frame (the height of the video), this loss is quite small.
>
> To summarize: each additional thread may require one extra motion
> vector per frame in the stream, compared to the non-threaded
> version.
>
> If we have 2 threads, we can have up to 1 MV per frame more than the
> non-threaded version.
> If we have 4 threads we can have up to 3 MV per frame more than the
> non-threaded version.
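
The bound above can be sketched in C, assuming the frame is split into
contiguous bands of rows, one band per thread. The helper names are
hypothetical and the actual split used in the branch may differ.

```c
#include <assert.h>

/* Hypothetical helper: index of the first row owned by thread t when
 * `rows` rows are split into `threads` contiguous bands of nearly equal
 * size. Passing t == threads gives the end of the last band. */
int band_start(int rows, int threads, int t)
{
    assert(rows > 0 && threads > 0 && threads <= rows);
    assert(t >= 0 && t <= threads);
    return (int)((long)rows * t / threads);
}

/* Each boundary between two bands is a place where the second band
 * cannot reuse the last motion vector of the first, so the worst case
 * is one extra motion vector per boundary, per frame. */
int max_extra_mvs(int threads)
{
    assert(threads > 0);
    return threads - 1;
}
```

With one thread there is no boundary, which matches the observation that
the single-thread output is identical to the original encoder.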
>
>
> >
> >  - the example encoder segfaults using an excessive number of threads
> > (for me 382 threads seems to be the maximum number of threads still
> > working on my CIF input file). For some reason my gdb doesn't work on
> > encoder_example ("not in executable format: File format not recognized"
> > - huh?) so I can't really say where things are going wrong (may e.g. be
> > a system lib doing strange stuff and not your code).
>
> Huh
> Yes,
> I forgot to check the number of threads against a limit based on the
> number of Super-Block rows we have to encode.
> I am fixing this right now.
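
A minimal sketch of such a limit, assuming at most one thread per
super-block row of the luma plane (a Theora super block covers 32x32
pixels). The function names are hypothetical; the actual fix in the
branch may use a different bound.

```c
#include <assert.h>

#define SB_SIZE 32 /* super block: 4x4 blocks of 8x8 pixels */

/* Number of super-block rows needed to cover a plane of this height. */
int superblock_rows(int frame_height)
{
    assert(frame_height > 0);
    return (frame_height + SB_SIZE - 1) / SB_SIZE;
}

/* Clamp a requested thread count to the range [1, super-block rows]. */
int clamp_thread_count(int requested, int frame_height)
{
    int rows = superblock_rows(frame_height);
    if (requested < 1)
        return 1;
    if (requested > rows)
        return rows;
    return requested;
}
```

Under this assumption a CIF frame (352x288) has 9 super-block rows, so a
request for 382 threads would be reduced to 9.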
>
>
> Thanks Maik.
>
> Cheers,
> Felipe
>
> >
> >
> > Maik
> >
> >
> > Felipe Portavales Goldstein schrieb:
> > > I uploaded the changes to the branch:
> > > http://svn.xiph.org/branches/theora-multithread/
> > >
> > > This version is a branch from the current SVN trunk.
> > > I used theora_control() to set the number of threads, as Giles suggested.
> > >
> > > To run the example, you can specify the number of threads with the flag
> > > --number-of-threads
> > >
> > > For example:
> > >
> > > ./encoder_example --number-of-threads 1 ~/video-tests/pf01.yuv -o /tmp/thread.ogg
> > >
> > >
> > > Please test on different machines and with different videos.
> > >
> > >
> > > Cheers,
> > > Felipe
> > >
> > >
> > > On 10/3/07, Ralph Giles <giles at xiph.org> wrote:
> > >> On Wed, Oct 03, 2007 at 12:57:59AM -0300, Felipe Portavales Goldstein wrote:
> > >>
> > >>> As far as I tested the slowdown is less than 1 percent or irrelevant.
> > >> Good, we should be able to include this in the mainline then.
> > >>
> > > >>> The only big difference in terms of performance is the extra overhead
> > > >>> in the PickModes function call and an extra final loop to re-order the
> > > >>> Motion Vectors produced by each thread.
> > >> So this overhead can be skipped when there's only one thread.
> > >>
> > > >>> The only thing I should do before submitting is add a way to control the
> > > >>> number of threads at run time (discovering the number of CPUs at run
> > > >>> time). For now I have to recompile the code to have a different number
> > > >>> of threads running.
> > >> The best way to do this is through the theora_control() interface, since
> > >> applications will want control, and determining the number of available
> > >> CPUs is very platform dependent.
> > >>
> > >>  -r
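
To illustrate why this is platform dependent: on Linux and many
Unix-like systems an application could query the online CPU count with
sysconf(), using a widely supported extension that is not available on
Windows. This is a sketch with hypothetical function names, where a
request of 0 is assumed to mean "choose automatically".

```c
#include <assert.h>
#include <unistd.h>

/* Number of online CPUs where sysconf() can report it, otherwise 1. */
int default_thread_count(void)
{
    long n = -1;
#ifdef _SC_NPROCESSORS_ONLN
    n = sysconf(_SC_NPROCESSORS_ONLN);
#endif
    return n < 1 ? 1 : (int)n;
}

/* Use the caller's explicit request, or fall back to the CPU count
 * when the request is 0. */
int pick_thread_count(int requested)
{
    assert(requested >= 0);
    return requested == 0 ? default_thread_count() : requested;
}
```

The application would then pass the result to the encoder through
theora_control(); the request code used by the branch is not shown in
this thread.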
> > >>
> > >
> > >
> >
> >
>
>

-- 
________________________________________
Felipe Portavales Goldstein <portavales at gmail>
Undergraduate Student - IC-UNICAMP
Computer Systems Laboratory
http://www.students.ic.unicamp.br/~ra023772/