[theora-dev] Parallel processing for Theora?
spam_receptacle_ at hotmail.com
Sun Mar 21 12:53:07 PDT 2010
> Date: Sun, 21 Mar 2010 10:04:08 -0400
> From: tterribe at email.unc.edu
> >> There is some scope for multithreading the per-frame pipeline, but
> >> that only scales to three or four threads.
> The work I had in the pipeline was for parallel _decoding_ first,
> because this is considerably easier. If it works out, parallel encoding
> can be done within the same framework. The design work for this is
> already done, and I had started on the implementation, but it got put
> down as priorities changed and will probably not get picked up again
> before the 1.2 release.
My concern with starting from a parallel decoder is precisely that decoding is the easier case. Correct me if I'm wrong, but decoding requires far less computation than encoding. Thread overhead will therefore look disproportionately large in a decoder, and the results won't reflect real-world use, since parallel processing will mostly matter for encoding.
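To put rough numbers on that, here is a toy model, assuming the per-frame work splits evenly across threads and synchronization adds one fixed cost per frame. The function name and every cycle count are hypothetical, chosen only for illustration; nothing here is measured from libtheora.

```c
#include <assert.h>

/* Toy model only: all inputs are hypothetical, not libtheora measurements.
 * Parallel time = evenly split work + a fixed synchronization cost. */
double modeled_speedup(double work_cycles, int threads, double overhead_cycles)
{
    assert(threads > 0);
    double parallel_cycles = work_cycles / threads + overhead_cycles;
    return work_cycles / parallel_cycles;
}
```

With 4 threads and 50,000 cycles of overhead, this model gives about 3.3x for 1,000,000 cycles of work per frame but only about 1.3x for 100,000 cycles. That is the effect I mean: the lighter the workload, the more the same overhead dominates.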
> It's unclear how well it will scale. Years ago we did a (very simple)
> parallel decoder that only partitioned things by color plane, and the
> speed-up was disappointing. There was also a GSoC project that attempted
> to improve on this, but it was not successfully completed. In theory you
> could have a separate thread for every MCU (64 pixels of height for
> 4:2:0, 32 pixels for 4:2:2 and 4:4:4). However, within-frame parallelism
> is fairly fine-grained, and the overhead of a standard mutex-based
> library like pthreads is pretty enormous for this kind of thing. People
> have reported getting speed-ups on FFT workloads as small as 10,000
> cycles with lock-free algorithms (by comparison, a single pthread mutex
> acquisition could take thousands of cycles by itself), but they made
> very specific assumptions about architecture, cache line size, etc.
> In short, this requires some non-trivial engineering to get it to work
> well, and for the moment my priorities are still on improving encoder
> quality before worrying about encoder speed. If there are qualified
> people out there willing to work on this, I'd be happy to explain the
> details of what needs to be done.
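As I understand the per-MCU idea, it could be sketched roughly as below: split the frame into MCU rows using the heights quoted above, and let each worker claim rows with a single atomic fetch-add instead of taking a mutex. This is only a sketch under heavy assumptions. `frame_job`, `mcu_worker` and `mcu_row_count` are hypothetical names rather than libtheora code, the "decode" step is a placeholder counter, and any dependencies between rows in a real decoder are ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* MCU heights quoted above: 64 rows for 4:2:0, 32 for 4:2:2 and 4:4:4. */
enum { MCU_HEIGHT_420 = 64, MCU_HEIGHT_OTHER = 32 };

/* Number of MCU rows needed to cover the frame, rounding up. */
int mcu_row_count(int frame_height, int mcu_height)
{
    assert(frame_height > 0 && mcu_height > 0);
    return (frame_height + mcu_height - 1) / mcu_height;
}

/* Hypothetical shared job state; not a libtheora structure. */
typedef struct {
    atomic_int next_row;    /* next unclaimed MCU row */
    int row_count;          /* total MCU rows in the frame */
    atomic_int *rows_done;  /* per-row counter standing in for decode work */
} frame_job;

/* Thread entry point with a pthread-compatible signature. Each row is
 * claimed with one atomic fetch-add rather than a mutex acquisition. */
void *mcu_worker(void *arg)
{
    frame_job *job = arg;
    for (;;) {
        int row = atomic_fetch_add(&job->next_row, 1);
        if (row >= job->row_count)
            break;
        /* Placeholder for decoding MCU row `row`. */
        atomic_fetch_add(&job->rows_done[row], 1);
    }
    return NULL;
}
```

Each thread would run `mcu_worker` on the same `frame_job`. Because the fetch-add hands out every row index exactly once, no row is processed twice regardless of how many workers there are. Whether this actually beats a mutex at this granularity is exactly the open question.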
It sounds like you're considering building threads into libtheora itself. Since libtheora targets an arguably small set of platforms, that is not an outrageous idea, but I think it would be better to leave threading to the client and support parallelism through the API, much like Vorbis. That way, people can choose their own threading implementation and port libtheora to future platforms more easily. Thank you.
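To make the client-side threading idea a bit more concrete, here is a minimal sketch of the shape such an API might take. Everything in it is an assumption for illustration: `hyp_codec_ctx`, `hyp_job_range` and `hyp_codec_run_job` do not exist in libtheora, and the byte inversion merely stands in for real codec work. The point is only that the library exposes independent jobs and never creates a thread itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical library-side context. None of these names exist in
 * libtheora. job_count must be at least 1. */
typedef struct {
    const unsigned char *input;  /* shared, read-only */
    unsigned char *output;       /* each job writes a disjoint range */
    size_t length;               /* bytes in input and output */
    size_t job_count;            /* number of independent jobs */
} hyp_codec_ctx;

/* Half-open byte range [begin, end) owned by job `index`. */
static void hyp_job_range(const hyp_codec_ctx *ctx, size_t index,
                          size_t *begin, size_t *end)
{
    *begin = ctx->length * index / ctx->job_count;
    *end = ctx->length * (index + 1) / ctx->job_count;
}

/* Safe to call concurrently for distinct indices, because jobs share no
 * mutable state. Returns 0 on success, -1 for an out-of-range index. */
int hyp_codec_run_job(hyp_codec_ctx *ctx, size_t index)
{
    size_t begin, end, i;
    if (index >= ctx->job_count)
        return -1;
    hyp_job_range(ctx, index, &begin, &end);
    for (i = begin; i < end; i++)
        ctx->output[i] = (unsigned char)(255 - ctx->input[i]); /* stand-in work */
    return 0;
}
```

A client could then call `hyp_codec_run_job` from pthreads, Win32 threads, a thread pool, or a plain serial loop, in any order, and get the same output.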