[theora-dev] Multi-Thread Theora Encoder

Thu Oct 11 14:21:15 PDT 2007

On 2007-10-11, Felipe Portavales Goldstein wrote:
> On 2007-10-11, Unga wrote:
>>
>> Just out of curiosity, is it possible to compress frames in parallel?
>> That is, you allocate one thread per frame and process as many frames
>> in parallel as you have cores. It may consume more RAM
>> for sure, but RAM nowadays is not considered expensive.
>
> That would be a good way to parallelize, but it is not possible.
> When we encode a frame, the motion vector search must use the
> previous frame as its reference.
>
> That previous frame must already be encoded and decoded, because the
> search uses the decoded (lossy) version instead of the original.
> Using the decoded version keeps errors from accumulating from one
> frame to the next.
>
> So we have a data dependency between two frames, and so, there is no
> way to process two entire frames in parallel.
>
> But there is another way to do it. We can process slices of frames in
> parallel, even slices of different frames, with some constraints. We
> could use a pipeline architecture: the slices of a frame are encoded
> sequentially, and the parallelism comes from the pipeline.

You're on the right track with the slices. They don't need to be
independent, just synchronized. When encoding a given block, you don't
need to have encoded the whole reference frame, just the part of the
reference frame that the current block will be predicted from.

Two ways to deal with this dependency:
a) Restrict the motion search range so that it doesn't exceed the area
of the reference frame that has been encoded.
b) Run the initial motion estimation on the input frames instead of the
reconstructed frames. Then you know which block of the reference frame
is needed, and wait for it if it's not already encoded. Then use the
reconstructed reference block to actually encode the current block.
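
A minimal sketch of option (a), assuming the reference frame is
reconstructed top to bottom and only the vertical range needs limiting.
The function name and parameters are made up for illustration and are
not taken from any encoder. It also ignores padding outside the frame
and any extra margin needed for filtering:

```python
def vertical_search_window(block_y, block_h, search_range,
                           ref_rows_done, frame_h):
    """Return (min_dy, max_dy) for the motion search, or None.

    block_y       -- top row of the current block
    block_h       -- block height in rows
    search_range  -- nominal +/- vertical search range in rows
    ref_rows_done -- rows of the reference frame reconstructed so far
    frame_h       -- frame height in rows
    """
    # Simplification: never point above the top edge of the frame.
    min_dy = max(-search_range, -block_y)
    # The predicted block must lie entirely inside the reconstructed rows.
    lowest_allowed_top = min(ref_rows_done, frame_h) - block_h
    max_dy = min(search_range, lowest_allowed_top - block_y)
    if max_dy < min_dy:
        return None  # not enough of the reference is ready yet
    return (min_dy, max_dy)
```

For a 16-row block at row 32 with a nominal range of 16, the window
shrinks to (-16, 0) when only 48 reference rows are done, and opens up
to (-16, 16) once the whole reference frame is available.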

See x264 for an implementation of (a).
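
The "wait for it" step in (b) could be sketched with a condition
variable, again assuming row-by-row progress through the reference
frame. RowProgress and its methods are hypothetical names, not part of
any existing encoder:

```python
import threading


class RowProgress:
    """Tracks how many rows of one reference frame are reconstructed."""

    def __init__(self):
        self._rows_done = 0
        self._cond = threading.Condition()

    def publish(self, rows_done):
        # Called by the thread encoding the reference frame after it
        # finishes reconstructing rows [0, rows_done).
        with self._cond:
            self._rows_done = max(self._rows_done, rows_done)
            self._cond.notify_all()

    def wait_for(self, rows_needed, timeout=None):
        # Called by the thread encoding the next frame. Blocks until
        # at least rows_needed rows are ready; returns False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._rows_done >= rows_needed, timeout)
```

The thread encoding the current frame would take the motion vector
found on the input frames, work out the last reference row that vector
touches, and call wait_for() with that row count before reading the
reconstructed block.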

>> If this is really possible, we no longer have to restrict ourselves
>> to one machine; we can distribute frames to other machines on the
>> network for processing.
>
> We can do it if we send chunks of frames, run the encoders
> independently and then merge all the parts, but even this way we will
> lose inter-frame compression from one chunk of frames to the next. It
> will be necessary to have an intra frame at the beginning of each
> chunk of frames.

So decide frame types first, then split the chunks at frames that were
intra anyway.
Or split the video into one large chunk per thread (e.g. with 4 CPUs,
the first one gets the first 1/4 of the movie). Overlap the chunks
until the overlap regions contain an intra frame. Discard the few
redundantly encoded frames and merge the chunks at the intra frames.
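
A rough sketch of the first approach, assuming the frame type decision
has already produced a sorted list of intra frame indices that includes
frame 0. split_at_keyframes is a hypothetical helper, not an existing
API. It picks, for each ideal even split point, the nearest frame that
was going to be intra anyway:

```python
import bisect


def split_at_keyframes(keyframes, n_frames, n_chunks):
    """Return [(start, end), ...] frame ranges, end exclusive.

    keyframes -- sorted indices of frames already chosen as intra,
                 assumed to include frame 0
    """
    starts = [0]
    for i in range(1, n_chunks):
        target = i * n_frames // n_chunks  # ideal even split point
        pos = bisect.bisect_left(keyframes, target)
        # Compare the intra frames just before and just after target.
        candidates = keyframes[max(pos - 1, 0):pos + 1]
        best = min(candidates, key=lambda k: abs(k - target))
        if best > starts[-1]:  # skip duplicates; may yield fewer chunks
            starts.append(best)
    return list(zip(starts, starts[1:] + [n_frames]))
```

With intra frames at 0, 90, 210, 260 and 400 in a 480-frame clip and 4
workers, this gives the ranges (0, 90), (90, 260), (260, 400) and
(400, 480). The chunks are uneven because intra frames rarely fall
exactly on the even split points.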

--Loren Merritt