[Theora-dev] FPGA implementation/ players speed?
Timothy B. Terriberry
tterribe at vt.edu
Wed Mar 9 12:07:49 PST 2005
Some more data points, on a 3.06 GHz P4.
Pure C experimental decoder
(http://svn.xiph.org/experimental/derf/theora-exp/):
real 0m5.513s
user 0m5.480s
sys 0m0.040s
23.2 fps
With Rudolf's recent MMX patches (http://ssh.cz/~ruik/patch_theora):
real 0m4.918s
user 0m4.860s
sys 0m0.070s
26 fps
That's commensurate with the 11% speedup numbers he reported, and is
getting pretty close to real-time. Unlike the VP3HoSwiYo patches, it
does not include an MMX iDCT, so there is room yet for improvement. I'm
reasonably confident that we should be able to get to real-time decoding
at that resolution on this hardware.
The mainline decoder on the same machine:
real 0m6.027s
user 0m5.960s
sys 0m0.090s
So pure C optimizations give you about 9%, and the current MMX
optimizations another 11%. Note that all tests were built with
CFLAGS="-O2 -fforce-addr -fomit-frame-pointer -finline-functions -funroll-loops"
With default CFLAGS (just -O2), the mainline takes 11.447 seconds for
decode alone. So the compiler flags by themselves can make almost a 50%
difference.
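For anyone who wants to check the percentages, here is a rough sketch of the arithmetic using the "real" times quoted above. The helper names (time_saved, throughput_gain) are made up for illustration, and the frame count is not stated anywhere in this post: it is only inferred from the reported fps figures, so treat it as an assumption.

```python
# "real" wall-clock times in seconds, copied from the measurements above.
MAINLINE_O2_ONLY = 11.447   # mainline decoder, default CFLAGS (just -O2)
MAINLINE_TUNED = 6.027      # mainline decoder, tuned CFLAGS
EXP_PURE_C = 5.513          # theora-exp, pure C
EXP_MMX = 4.918             # theora-exp with the MMX patches


def time_saved(before, after):
    """Fraction of run time removed when going from `before` to `after`."""
    return (before - after) / before


def throughput_gain(before, after):
    """Fractional increase in frames per second for the same clip."""
    return before / after - 1.0


# Assumption: the clip length is not given in the post. It is estimated
# here from the reported 23.2 fps over 5.513 s, rounded to a whole frame.
frames = round(23.2 * EXP_PURE_C)

steps = [
    ("compiler flags (mainline)", MAINLINE_O2_ONLY, MAINLINE_TUNED),
    ("pure C optimizations", MAINLINE_TUNED, EXP_PURE_C),
    ("MMX patches", EXP_PURE_C, EXP_MMX),
]

for label, before, after in steps:
    print(f"{label}: {time_saved(before, after):.1%} less time, "
          f"{throughput_gain(before, after):.1%} more fps")

# Estimated mainline frame rate, valid only if the inferred frame count holds.
print(f"mainline, tuned CFLAGS: about {frames / MAINLINE_TUNED:.1f} fps")
```

Both measures are printed because "X% faster" can mean either less time or more frames per second, and the two diverge as the gap grows: the flags change removes a bit under half the run time, which is the sense in which it is "almost 50%".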