[theora-dev] Multi-Thread Theora Encoder

Felipe Portavales Goldstein portavales at gmail.com
Thu Oct 11 09:00:26 PDT 2007

On 10/11/07, Unga <unga888 at yahoo.com> wrote:
> --- Felipe Portavales Goldstein <portavales at gmail.com>
> wrote:
> > On 10/7/07, Maik Merten <maikmerten at gmx.net> wrote:
> > > Nice work. Too bad I'm still on a single-core system (but now I
> > > have a nice excuse to mothball this system and go ahead assembling
> > > a new one).
> > >
> > > Two things I noticed:
> > >
> > >  - output bitrate seems to vary slightly depending on how many
> > >    threads are used (no visible difference, though). If your goal
> > >    for the optimization is to have it produce exactly the same
> > >    output and you're thinking right now "wait, this shouldn't
> > >    happen" then there may perhaps be a problem in the new code.
> >
> > When you use only one thread, the output generated is exactly the
> > same as the original (non-multi-threaded) Theora encoder.
> >
> > When you use 2 threads, for example, the motion vector search is
> > executed on each half of the screen independently (in parallel),
> > one half per thread.
> >
> > The first fragment of the second half of the screen could use the
> > last motion vector from the previous fragment in the first half of
> > the screen. This LAST_MV mode is used to save the bits of a new
> > motion vector added to the stream.
> >
> > But if we want to run the motion vector search in parallel, we
> > must avoid data dependencies between threads, and therefore we
> > lose a little compression. Since the number of threads is small
> > compared to the number of rows in the entire screen (the height of
> > the movie), this loss is quite small.
> >
> > Summing up, what I am trying to explain is this: each extra thread
> > may force us to introduce one more motion vector per frame into
> > the stream compared to the non-threaded version.
> >
> > If we have 2 threads, we can have up to 1 more MV per frame than
> > the non-threaded version.
> > If we have 4 threads, we can have up to 3 more MVs per frame than
> > the non-threaded version.
> >
> >
> "Since the number of threads is small compared to the number of
> rows in the entire screen (height of the movie), this loss is quite
> small."
> May not be the case next year :)
> Tilera (www.tilera.com) already ships a 64-core processor and plans
> to release a 128-core processor soon. If somebody uses a quad
> 128-core Tilera machine, that's 512 cores. Is the loss in
> compression still small then?

I am working on parallelizing the rest of the algorithm, so we will
have other threads running other parts of the encoder, increasing the
speed without affecting the motion vector search.

But anyway, we must consider the trade-off between encoding speed and
encoding quality. And with more processing power we can afford
better (slower) motion vector searches, and maybe gain compression
from the better MVs.
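To make the bound from my earlier mail concrete, here is a rough
Python sketch (the names are illustrative, not the actual encoder
code) of splitting the macroblock rows among threads. Each thread
boundary breaks at most one LAST_MV chain, so n threads cost at most
n-1 extra motion vectors per frame:

```python
def split_rows(n_rows, n_threads):
    """Assign contiguous macroblock-row ranges to threads."""
    base, extra = divmod(n_rows, n_threads)
    ranges, start = [], 0
    for t in range(n_threads):
        count = base + (1 if t < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges

def extra_mvs_per_frame(n_threads):
    """Worst case: each thread boundary breaks one LAST_MV chain."""
    return n_threads - 1

# e.g. a 576-pixel-high frame has 36 rows of 16x16 macroblocks
ranges = split_rows(36, 4)
```

With 4 threads on 36 rows each thread gets 9 rows, and at most 3
extra MVs per frame are emitted, which is why the loss stays small
while the thread count is much less than the row count.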

> Just out of curiosity, is it possible to compress frames in
> parallel? That is, you allocate one thread per frame and process as
> many frames in parallel as the number of cores you have. It may
> consume more RAM for sure, but RAM nowadays is not considered
> expensive.

It would be a good way to parallelize, but it is not possible. When
we encode a frame, the motion vector search must use the previous
frame.

That previous frame must already be encoded and decoded, because we
search against the decoded (lossy) version instead of the original.
Using the decoded version prevents errors from accumulating from one
frame to the next.

So there is a data dependency between consecutive frames, and
therefore no way to process two entire frames in parallel.
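The dependency can be sketched in a few lines of Python (the helper
names here are hypothetical, not libtheora API): motion search for
frame i needs the reconstructed frame i-1, so the loop is inherently
sequential:

```python
def encode_sequence(frames, encode_frame):
    """Encode frames in order; each step consumes the previous
    reconstruction, so iterations cannot run in parallel."""
    reconstructed = None  # no reference for the first (intra) frame
    packets = []
    for frame in frames:
        # encode_frame returns the compressed packet and the
        # encoded-then-decoded (lossy) frame used as the next reference
        packet, reconstructed = encode_frame(frame, reconstructed)
        packets.append(packet)
    return packets
```

Each iteration's input is the previous iteration's output, which is
exactly the data dependency that rules out one-thread-per-frame.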

But there is another way to do it.
We can process slices of frames in parallel, even slices of different
frames, with some constraints. We can think of a pipeline
architecture that encodes the slices of each frame sequentially but
achieves parallelism across the pipeline stages.
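As a toy illustration of that pipeline (a simplification, not the
actual encoder: it assumes slice s of frame f only needs slice s of
frame f-1 to be reconstructed), at time step t the workers handle
different slices of different frames, like stages of a classic
pipeline:

```python
def pipeline_schedule(n_frames, n_slices):
    """Schedule slice s of frame f at time step f + s.

    Returns, per time step, the list of (frame, slice) pairs that can
    run concurrently: slice s of frame f-1 finished one step earlier,
    so the dependency is respected.
    """
    steps = []
    for t in range(n_frames + n_slices - 1):
        active = [(f, t - f) for f in range(n_frames)
                  if 0 <= t - f < n_slices]
        steps.append(active)
    return steps
```

At steady state min(n_frames, n_slices) slices run in parallel, even
though each individual frame is still encoded slice by slice in order.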

> If this is really possible, we don't have to restrict ourselves to
> one machine anymore; we can distribute frames to other machines on
> the network for processing.

We can do it if we send chunks of frames, run the encoders
independently, and then merge all the parts. But even this way we
will lose inter-frame compression from one chunk of frames to the
next: an intra frame will be necessary at the beginning of each
chunk.
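A quick sketch of the cost of that split (again hypothetical helper
names): each chunk sent to a machine must start with an intra (key)
frame, so the number of extra intra frames versus a single-machine
encode is the number of chunks minus one:

```python
def chunk_frames(n_frames, chunk_size):
    """Chunk boundaries; the first frame of each chunk must be intra."""
    return [(start, min(start + chunk_size, n_frames))
            for start in range(0, n_frames, chunk_size)]

def extra_intra_frames(n_frames, chunk_size):
    """Intra frames forced by splitting, beyond the stream's first one."""
    return len(chunk_frames(n_frames, chunk_size)) - 1
```

Larger chunks mean fewer forced intra frames (less compression lost)
but less parallelism across machines, which is the same speed-versus-
quality trade-off as in the threaded MV search.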

> Regards
> Unga
> _______________________________________________
> theora-dev mailing list
> theora-dev at xiph.org
> http://lists.xiph.org/mailman/listinfo/theora-dev

Felipe Portavales Goldstein <portavales at gmail>
Undergraduate Student - IC-UNICAMP
Computer Systems Laboratory
