[theora-dev] Multithread support

Tue Feb 3 20:17:42 PST 2015

M. Pabis wrote:
> If the multithreading encoding was dropped out, may I ask why?

IIRC, the commits from 2007 only threaded the motion search, and gave
speed-ups of only 10 to 20%. However, as part of the work to improve the
Theora encoder quality for the HTML5 video tag, the way this search was 
done was completely rewritten, and we got significantly more than 10 to 
20% speed-ups just by having a smarter single-threaded algorithm.

> I think I could dedicate some of my free time to bring multithreading to
> the Theora encoder but I would like to ensure not to be redundant ;-)

I don't believe anyone has been working on this for some years. There 
are two basic approaches.

One is threading within a single frame, which does not require any API 
behavior changes. In theory you can scale to a fairly decent number of 
threads everywhere except the final conversion from tokens to
variable-length codes in oc_enc_frame_pack(). However, the units of work are sufficiently
small and the task dependencies sufficiently involved that this needs 
some kind of lock-free work-stealing queues to have a hope of getting 
more benefit from the parallelism than you pay in synchronization 
overhead. I'd started designing one with the hope that all memory 
allocations could be done up-front at encoder initialization (to avoid 
locking contention there), but this turns out to be sufficiently 
different from how most lock-free data structures worked at the time 
that it was a fair amount of work. I've been meaning to look at what 
Mozilla's Servo project is doing for this these days (since they have 
similar challenges).
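
To make the "allocate everything up front" constraint concrete, here is a minimal sketch of a fixed-capacity work-stealing deque in the Chase-Lev style, written with C11 atomics. It is not the design mentioned above and nothing like it exists in libtheora; the wsq_* names are hypothetical, tasks are assumed to be plain int IDs, and every atomic operation uses the default sequentially consistent ordering for simplicity rather than speed. The only allocation happens in wsq_init(), and the deque never grows: wsq_push() simply fails when the deque is full.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical fixed-capacity work-stealing deque (Chase-Lev style).
   The owner pushes/pops at the bottom; other threads steal from the top. */
typedef struct {
  atomic_long  top;     /* index of the oldest task (thieves take here) */
  atomic_long  bottom;  /* index one past the newest task (owner end)  */
  long         mask;    /* capacity - 1; capacity is a power of two    */
  _Atomic int *slots;   /* task IDs, allocated once in wsq_init()      */
} wsq;

/* The only allocation: done once, e.g. at encoder initialization. */
int wsq_init(wsq *q, long cap_pow2) {
  if (cap_pow2 <= 0 || (cap_pow2 & (cap_pow2 - 1)) != 0) return 0;
  q->slots = malloc((size_t)cap_pow2 * sizeof *q->slots);
  if (q->slots == NULL) return 0;
  for (long i = 0; i < cap_pow2; i++) atomic_init(&q->slots[i], 0);
  atomic_init(&q->top, 0);
  atomic_init(&q->bottom, 0);
  q->mask = cap_pow2 - 1;
  return 1;
}

void wsq_free(wsq *q) { free(q->slots); }

/* Owner only. Returns 0 if the deque is full (it never reallocates). */
int wsq_push(wsq *q, int task) {
  long b = atomic_load(&q->bottom);
  long t = atomic_load(&q->top);
  if (b - t > q->mask) return 0;
  atomic_store(&q->slots[b & q->mask], task);
  atomic_store(&q->bottom, b + 1);
  return 1;
}

/* Owner only. Takes the newest task; returns 0 if none was obtained. */
int wsq_pop(wsq *q, int *task) {
  long b = atomic_load(&q->bottom) - 1;
  atomic_store(&q->bottom, b);
  long t = atomic_load(&q->top);
  if (t > b) {                       /* deque was empty: undo */
    atomic_store(&q->bottom, b + 1);
    return 0;
  }
  *task = atomic_load(&q->slots[b & q->mask]);
  if (t < b) return 1;               /* more than one task: no race */
  /* Last task: race any thief for it by advancing top. */
  int won = atomic_compare_exchange_strong(&q->top, &t, t + 1);
  atomic_store(&q->bottom, b + 1);
  return won;
}

/* Any thread. Returns 1 on success, 0 if empty, -1 if it lost a race
   and the caller may retry. */
int wsq_steal(wsq *q, int *task) {
  long t = atomic_load(&q->top);
  long b = atomic_load(&q->bottom);
  if (t >= b) return 0;
  int v = atomic_load(&q->slots[t & q->mask]);
  if (!atomic_compare_exchange_strong(&q->top, &t, t + 1)) return -1;
  *task = v;
  return 1;
}
```

A real encoder would additionally need to track the dependencies between tasks, which is where most of the difficulty lies; this sketch only covers the queue itself.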

The other is traditional FFmpeg-style frame threading, which gives each 
thread a separate frame to encode, and merely waits for enough rows of 
the previous frame to be finished so that it can start its motion 
search. This is generally much more effective than threading within a 
frame, but a) requires additional delay (the API supports this in 
theory, but software using that API might not expect it, so it would 
have to be enabled manually through some sort of th_encode_ctl call) and 
b) requires changes to the rate control to deal with the fact that 
statistics from the previous frame are not immediately available. b) was 
the real blocker here.
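
As a rough illustration of the "wait for enough rows of the previous frame" dependency, the sketch below tracks how many rows of the reference frame are finished and checks whether a given row of the next frame may start its motion search. All names here (frame_progress, rows_needed, and so on) are hypothetical and not part of libtheora, and the search range is assumed to be expressed in whole rows. It deliberately leaves out the actual blocking (a condition variable or similar) and says nothing about the rate control problem in b).

```c
#include <stdatomic.h>

/* Hypothetical per-frame progress record shared between two encoder
   threads: the one producing the reference frame and the one using it. */
typedef struct {
  atomic_int rows_done;  /* rows of this frame fully reconstructed */
  int        nrows;      /* total rows in the frame                */
} frame_progress;

void frame_progress_init(frame_progress *p, int nrows) {
  atomic_init(&p->rows_done, 0);
  p->nrows = nrows;
}

/* Number of reference rows that must be complete before row `row` of
   the next frame can be searched, assuming the motion search may reach
   `vrange_rows` rows below the current one. Clamped to the frame height. */
int rows_needed(int row, int vrange_rows, int nrows) {
  int need = row + vrange_rows + 1;
  return need < nrows ? need : nrows;
}

/* Called by the thread encoding the reference frame after each row.
   Release ordering publishes the reconstructed pixels of that row. */
void mark_row_done(frame_progress *ref) {
  atomic_fetch_add_explicit(&ref->rows_done, 1, memory_order_release);
}

/* Called by the thread encoding the next frame. Non-blocking: a real
   implementation would sleep or pick up other work while this is 0. */
int can_start_row(frame_progress *ref, int row, int vrange_rows) {
  int done = atomic_load_explicit(&ref->rows_done, memory_order_acquire);
  return done >= rows_needed(row, vrange_rows, ref->nrows);
}
```

With a 10-row frame and a search range of 2 rows, row 0 of the next frame can start once 3 reference rows are done, while the last rows have to wait for the whole reference frame.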

In every encoder I know of (for any format), the second approach is much 
more effective than the first.