[theora-dev] Multithread support
Timothy B. Terriberry
tterribe at vt.edu
Tue Feb 3 20:17:42 PST 2015
M. Pabis wrote:
> If the multithreading encoding was dropped out, may I ask why?
IIRC, the commits from 2007 only threaded the motion search, and gave
gains of only 10 to 20%. However, as part of the work to improve the
Theora encoder quality for the HTML5 video tag, the way this search was
done was completely rewritten, and we got significantly more than 10 to
20% speed-ups just by having a smarter single-threaded algorithm.
> I think I could dedicate some of my free time to bring multithreading to
> the Theora encoder but I would like to ensure not to be redundant ;-)
I don't believe anyone has been working on this for some years. There
are two basic approaches.
One is threading within a single frame, which does not require any API
behavior changes. In theory you can scale to a fairly decent number of
threads everywhere except the final conversion from tokens to VLC codes
in oc_enc_frame_pack(). However, the units of work are sufficiently
small and the task dependencies sufficiently involved that this needs
some kind of lock-free work-stealing queues to have a hope of getting
more benefit from the parallelism than you pay in synchronization
overhead. I'd started designing one with the hope that all memory
allocations could be done up-front at encoder initialization (to avoid
lock contention there), but this turned out to be sufficiently
different from how most lock-free data structures worked at the time
that it was a fair amount of work. I've been meaning to look at what
Mozilla's Servo project is doing for this these days (since they have
similar challenges).
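For readers unfamiliar with the idea, here is a minimal sketch of the kind of queue meant above. It is the classic Chase-Lev work-stealing deque, not the design described here and not anything that exists in libtheora, and every name in it (ws_deque, ws_push, ws_pop, ws_steal, WS_EMPTY, WS_ABORT) is hypothetical. Assumptions: tasks are non-negative integer ids, the ring buffer has a fixed power-of-two capacity allocated once at init (in the spirit of doing all allocation up-front), and all accesses to top/bottom are sequentially consistent for simplicity, where a tuned version would use weaker orderings. The owning thread pushes and pops at one end; idle threads steal from the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical sentinels; real task ids are assumed to be >= 0. */
#define WS_EMPTY (-1) /* nothing to take */
#define WS_ABORT (-2) /* lost a race with another thread; caller may retry */

typedef struct {
  atomic_long top;    /* index thieves steal from (oldest task) */
  atomic_long bottom; /* index the owner pushes to (one past newest task) */
  atomic_int *tasks;  /* ring buffer, allocated once at init, never resized */
  long mask;          /* capacity - 1; capacity must be a power of two */
} ws_deque;

static int ws_init(ws_deque *d, long capacity) {
  assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  d->tasks = malloc(sizeof(*d->tasks) * (size_t)capacity);
  if (d->tasks == NULL) return -1;
  for (long i = 0; i < capacity; i++) atomic_init(&d->tasks[i], 0);
  atomic_init(&d->top, 0);
  atomic_init(&d->bottom, 0);
  d->mask = capacity - 1;
  return 0;
}

static void ws_free(ws_deque *d) { free(d->tasks); }

/* Owner thread only. Returns 0 on success, -1 if the fixed buffer is full. */
static int ws_push(ws_deque *d, int task) {
  long b = atomic_load(&d->bottom);
  long t = atomic_load(&d->top);
  if (b - t > d->mask) return -1; /* no growth: memory was fixed at init */
  atomic_store_explicit(&d->tasks[b & d->mask], task, memory_order_relaxed);
  atomic_store(&d->bottom, b + 1); /* publishes the task to thieves */
  return 0;
}

/* Owner thread only. Takes the most recently pushed task (LIFO end). */
static int ws_pop(ws_deque *d) {
  long b = atomic_load(&d->bottom) - 1;
  atomic_store(&d->bottom, b); /* reserve the newest slot */
  long t = atomic_load(&d->top);
  if (t > b) { /* deque was already empty: undo the reservation */
    atomic_store(&d->bottom, b + 1);
    return WS_EMPTY;
  }
  int task = atomic_load_explicit(&d->tasks[b & d->mask], memory_order_relaxed);
  if (t == b) { /* only one task left: race the thieves for it via top */
    if (!atomic_compare_exchange_strong(&d->top, &t, t + 1)) task = WS_EMPTY;
    atomic_store(&d->bottom, b + 1);
  }
  return task;
}

/* Any thread. Takes the oldest task (FIFO end). */
static int ws_steal(ws_deque *d) {
  long t = atomic_load(&d->top);
  long b = atomic_load(&d->bottom);
  if (t >= b) return WS_EMPTY;
  int task = atomic_load_explicit(&d->tasks[t & d->mask], memory_order_relaxed);
  /* Claim the slot; fails if the owner or another thief got there first. */
  if (!atomic_compare_exchange_strong(&d->top, &t, t + 1)) return WS_ABORT;
  return task;
}
```

Because nothing is allocated after ws_init(), ws_push() can fail when the buffer is full; in that case the caller would have to run the task inline, which is the trade-off of keeping allocation out of the hot path.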
The other is traditional FFmpeg-style frame threading, which gives each
thread a separate frame to encode, and merely waits for enough rows of
the previous frame to be finished so that it can start its motion
search. This is generally much more effective than threading within a
frame, but a) requires additional delay (the API supports this in
theory, but software using that API might not expect it, so it would
have to be enabled manually through some sort of th_encode_ctl call) and
b) requires changes to the rate control to deal with the fact that
statistics from the previous frame are not immediately available. b) was
the real blocker here.
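The row-level wait in frame threading can be sketched with a per-frame progress counter. This is a minimal illustration using a POSIX mutex and condition variable, not code from libtheora or FFmpeg; the names frame_progress, progress_report, progress_wait, NUM_ROWS and LOOKAHEAD_ROWS are all hypothetical. LOOKAHEAD_ROWS is assumed to cover the maximum vertical motion vector range (plus any interpolation filter taps) below the row being searched, and NUM_ROWS is an arbitrary frame height in rows. It deliberately ignores issue b): the rate control would still need to cope with statistics from the previous frame arriving late.

```c
#include <assert.h>
#include <pthread.h>

enum { NUM_ROWS = 18, LOOKAHEAD_ROWS = 2 }; /* hypothetical values */

/* How many rows of a frame are fully encoded and reconstructed. */
typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  int rows_done;
} frame_progress;

static void progress_init(frame_progress *p) {
  pthread_mutex_init(&p->mutex, NULL);
  pthread_cond_init(&p->cond, NULL);
  p->rows_done = 0;
}

/* Encoder of frame N: called after finishing each row. */
static void progress_report(frame_progress *p, int rows_done) {
  pthread_mutex_lock(&p->mutex);
  assert(rows_done >= p->rows_done); /* progress only moves forward */
  p->rows_done = rows_done;
  pthread_cond_broadcast(&p->cond);
  pthread_mutex_unlock(&p->mutex);
}

/* Encoder of frame N+1: block until the reference has enough rows. */
static void progress_wait(frame_progress *p, int rows_needed) {
  pthread_mutex_lock(&p->mutex);
  while (p->rows_done < rows_needed) pthread_cond_wait(&p->cond, &p->mutex);
  pthread_mutex_unlock(&p->mutex);
}

typedef struct {
  frame_progress *ref; /* progress of the previous (reference) frame */
  int violations;      /* rows searched before their reference rows existed */
  int rows_searched;
} next_frame_job;

static void *encode_prev_frame(void *arg) {
  frame_progress *p = arg;
  for (int row = 0; row < NUM_ROWS; row++) {
    /* ... encode and reconstruct `row` of frame N here ... */
    progress_report(p, row + 1);
  }
  return NULL;
}

static void *encode_next_frame(void *arg) {
  next_frame_job *job = arg;
  for (int row = 0; row < NUM_ROWS; row++) {
    int needed = row + 1 + LOOKAHEAD_ROWS;
    if (needed > NUM_ROWS) needed = NUM_ROWS;
    progress_wait(job->ref, needed);
    /* Motion search for `row` may now read reference rows [0, needed). */
    pthread_mutex_lock(&job->ref->mutex);
    if (job->ref->rows_done < needed) job->violations++;
    pthread_mutex_unlock(&job->ref->mutex);
    job->rows_searched++;
  }
  return NULL;
}

/* Runs both "encoders" concurrently; error handling omitted for brevity. */
static int run_frame_threading_demo(int *violations) {
  frame_progress ref;
  next_frame_job job = {&ref, 0, 0};
  pthread_t prev, next;
  progress_init(&ref);
  pthread_create(&next, NULL, encode_next_frame, &job);
  pthread_create(&prev, NULL, encode_prev_frame, &ref);
  pthread_join(prev, NULL);
  pthread_join(next, NULL);
  pthread_cond_destroy(&ref.cond);
  pthread_mutex_destroy(&ref.mutex);
  *violations = job.violations;
  return job.rows_searched;
}
```

Calling run_frame_threading_demo() searches all NUM_ROWS rows of the second frame and reports zero violations, however the two threads happen to interleave. The extra frame of delay mentioned in a) comes from the fact that frame N+1 has to be submitted before frame N's packet is finished.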
In every encoder I know of (for any format), the second approach is much
more effective than the first.