[tremor] [PATCH] 12% global performance gain on a StrongARM
marc dukette
dukette at adelphia.net
Sun Sep 22 09:37:53 PDT 2002
I know most of you are mainly concerned with the GCC compiler. However, I
just pulled the latest source from CVS and compiled it with Microsoft's
Embedded Visual C, and the latest code is noticeably slower than the original
release of Tremor. I'm guessing this is an interaction between the eVC
compiler and some of the new optimizations. I can't give exact numbers, since
all my tests measure audio embedded in a video stream and count the number of
frame drops incurred. However inaccurate this measurement is, for a given
stream it is very consistent, and with the latest set of patches the number
of frame drops has more than doubled compared with the original Tremor release.
This is with 128 kbps Vorbis audio. I've run this on both an SA1100 and a
PXA250 with similar results.
If I get a chance I will try to narrow down where the performance hit is
coming from.
----- Original Message -----
From: "Nicolas Pitre" <nico at cam.org>
To: <tremor at xiph.org>
Sent: Thursday, September 19, 2002 12:52 AM
Subject: [tremor] [PATCH] 12% global performance gain on a StrongARM
>
> The attached patch provides a 12% performance gain on a StrongARM SA1110
> over the current code in CVS. This is mostly C code shuffling to help
> GCC produce nearly perfect assembly on ARM. A hand-optimized assembly
> version of mdct.c could probably do even better, but I'll leave this task
> to others (Dilb?). At least this will produce the best compiler-generated
> reference to start with, as well as improving performance for all
> architectures in general.
>
> So for the details, this patch:
>
> - Includes my previous patch with interpolation code for correct accuracy
>   with all block sizes.
> - Interleaves sin and cos values in the lookup table to reduce register
>   pressure, since only one pointer is required to walk the table instead
>   of two. This also gives better cache locality.
> - Splits the lookup table into two tables, since half of it (every other
>   value) is only used in a separate section of the code and only with
>   large block sizes. The table size used for the common case is therefore
>   halved, giving yet better cache usage.
> - Abstracts all cross products throughout the code so they can be easily
>   optimized. First, this prevents redundant register reloads on ARM due
>   to the implicit memory access ordering; next, it provides the
>   opportunity to hook in some inline assembly to perform the actual
>   operation.
> - Fixes the layout of the current assembly in asm_arm.h to match GCC's
>   output (more enjoyable to read when inspecting the final assembly), plus
>   some constraint correctness issues.
> - Adds a memory barrier macro to force the compiler not to cache values
>   in registers or on the stack in some cases.
> - Reorders some code for better ARM assembly generation by the compiler.
>
> Enjoy!
>
>
> Nicolas
>
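For readers who want to see three of the changes listed in the quoted
message in code form (the interleaved sin/cos table, the abstracted cross
product, and the compiler memory barrier), here is a minimal C sketch. It is
not taken from the patch: the names COMPILER_BARRIER, mult31, xprod31,
trig_interleaved and rotate_pairs are hypothetical, the Q31 fixed-point
format and the three-entry table are assumptions chosen for illustration,
and the barrier placement only shows the syntax.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compiler-only barrier (GCC/Clang syntax, no-op elsewhere).
 * It emits no instruction; it only stops the compiler from keeping memory
 * values cached in registers across this point. */
#if defined(__GNUC__)
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")
#else
#define COMPILER_BARRIER() ((void)0)
#endif

/* Q31 fixed-point multiply via a 64-bit intermediate. Assumes the compiler
 * implements >> on negative values as an arithmetic shift. */
static inline int32_t mult31(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 31);
}

/* Cross product kept in one helper so it can later be swapped for inline
 * assembly:  x = a*t + b*v,  y = b*t - a*v. */
static inline void xprod31(int32_t a, int32_t b, int32_t t, int32_t v,
                           int32_t *x, int32_t *y)
{
    *x = mult31(a, t) + mult31(b, v);
    *y = mult31(b, t) - mult31(a, v);
}

/* Interleaved table {cos0, sin0, cos1, sin1, ...} in Q31, so a single
 * pointer walks both values (assumed layout, illustrative angles). */
#define TRIG_PAIRS 3
static const int32_t trig_interleaved[2 * TRIG_PAIRS] = {
    0x7fffffff, 0x00000000, /* angle 0    */
    0x5a82799a, 0x5a82799a, /* angle pi/4 */
    0x00000000, 0x7fffffff, /* angle pi/2 */
};

/* Applies one cross product per input pair, advancing one table pointer. */
static void rotate_pairs(const int32_t *in, int32_t *out, int n)
{
    const int32_t *T = trig_interleaved;
    int i;

    assert(n <= TRIG_PAIRS);
    for (i = 0; i < n; i++) {
        xprod31(in[2 * i], in[2 * i + 1], T[0], T[1],
                &out[2 * i], &out[2 * i + 1]);
        COMPILER_BARRIER(); /* illustrative placement only */
        T += 2;             /* next (cos, sin) pair */
    }
}
```

Calling rotate_pairs(in, out, 3) on three input pairs walks the whole table
with one pointer. Whether the barrier helps or hurts at a given spot depends
on the compiler and the surrounding code, so it would need to be checked
against the generated assembly.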