[Theora-dev] FPGA implementation/ players speed?

Timothy B. Terriberry tterribe at vt.edu
Wed Mar 9 12:07:49 PST 2005

Some more data points, on a 3.06 GHz P4.

Pure C experimental decoder:

real    0m5.513s
user    0m5.480s
sys     0m0.040s
23.2 fps

With Rudolf's recent MMX patches (http://ssh.cz/~ruik/patch_theora):

real    0m4.918s
user    0m4.860s
sys     0m0.070s
26 fps

That's commensurate with the 11% speedup he reported, and is getting
pretty close to real time. Unlike the VP3HoSwiYo patches, it does not
include an MMX iDCT, so there is still room for improvement. I'm
reasonably confident that we should be able to get to real-time decoding
at that resolution on this hardware.

The mainline decoder on the same machine:

real    0m6.027s
user    0m5.960s
sys     0m0.090s

So the pure C optimizations cut decode time by about 9%, and the current
MMX optimizations by another 11%. Note that all tests were built with

CFLAGS="-O2 -fforce-addr -fomit-frame-pointer -finline-functions"
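As a quick check on the percentages above, here is a small Python sketch (not part of the original post) that derives them from the reported real times. It computes both the fraction of run time removed and the gain in frame rate; the 9% and 11% figures appear to correspond to the former. The names TIMES, time_saved and throughput_gain are hypothetical.

```python
# Wall-clock ("real") times in seconds, copied from the three runs above.
TIMES = {
    "mainline": 6.027,
    "pure C": 5.513,
    "pure C + MMX": 4.918,
}


def time_saved(before: float, after: float) -> float:
    """Fraction of run time removed going from `before` to `after`."""
    return 1.0 - after / before


def throughput_gain(before: float, after: float) -> float:
    """Fractional increase in frames per second (before / after - 1)."""
    return before / after - 1.0


if __name__ == "__main__":
    # Each step compares a build against the one it improves on.
    steps = [("mainline", "pure C"), ("pure C", "pure C + MMX")]
    for old, new in steps:
        a, b = TIMES[old], TIMES[new]
        print(f"{old} -> {new}: {100 * time_saved(a, b):.1f}% less time, "
              f"{100 * throughput_gain(a, b):.1f}% more fps")
```

Run as a script, it reports roughly 8.5% and 10.8% less time for the two steps. Expressed as frame-rate gains, the same runs come out a little higher (about 9.3% and 12.1%), which matches 23.2 fps rising to 26 fps.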

With the default CFLAGS (just -O2), the mainline decoder takes 11.447
seconds for the decode alone. So the choice of compiler flags can make
almost a 50% difference.
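To make the "almost 50%" concrete, here is a minimal Python sketch (the helper compare is hypothetical, not from the post). It assumes the 11.447 s default-flags figure and the 6.027 s tuned mainline figure are comparable wall-clock decode times for the same clip.

```python
# Mainline decoder timings in seconds, taken from the post.
# Assumption: both are comparable wall-clock decode times of the same clip.
DEFAULT_CFLAGS_S = 11.447  # CFLAGS="-O2"
TUNED_CFLAGS_S = 6.027     # -O2 -fforce-addr -fomit-frame-pointer -finline-functions


def compare(slow: float, fast: float) -> tuple[float, float]:
    """Return (fraction of time removed, speed ratio) going from slow to fast."""
    return 1.0 - fast / slow, slow / fast


if __name__ == "__main__":
    saved, ratio = compare(DEFAULT_CFLAGS_S, TUNED_CFLAGS_S)
    print(f"tuned flags: {100 * saved:.1f}% less time, {ratio:.2f}x as fast")
```

This works out to roughly 47% less time, i.e. the tuned build decodes about 1.9 times as fast as the plain -O2 build.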
