[tremor] Tremor on strongarm CPU
Nicolas Pitre
nico at cam.org
Mon Sep 9 09:25:30 PDT 2002
On Sun, 8 Sep 2002, dilb wrote:
> Hi,
> as Nicolas Pitre did, I spent some time on a fixed-point version of the
> Ogg Vorbis decoder. Now that Tremor is free, I will help merge the
> optimizations I made for my version of the decoder. But first, I have some
> questions regarding the ARM optimisation made in the mdct_backward function.
>
> As far as I can see, this function uses MULT31 and MULT30, which are
> defined this way:
> static inline ogg_int32_t MULT32(ogg_int32_t x, ogg_int32_t y) {
>   int lo,hi;
>   asm volatile("smull %0,%1,%2,%3;\n"
>                : "=&r"(lo),"=&r"(hi)
>                : "%r"(x),"r"(y));
>   return(hi);
> }
>
> static inline ogg_int32_t MULT31(ogg_int32_t x, ogg_int32_t y) {
>   return MULT32(x,y)<<1;
> }
>
> static inline ogg_int32_t MULT30(ogg_int32_t x, ogg_int32_t y) {
>   return MULT32(x,y)<<2;
> }
> So, obviously, MULT31 (resp. MULT30) truncates 1 bit (resp. 2 bits).
> My question is: according to the specification, is this truncation noise
> amplified by the following stages of the decoder, i.e. can a human being
> perceive it? (Sorry, I didn't have enough time to read the Vorbis
> specification :))
That depends on how many bits you're using in the final output samples. This
is in fact a clever way to eliminate a second shift instruction when scaling
back the multiplication result. You're losing one (or two) bits of accuracy
each time a particular value is multiplied, but if your final samples are
only 16 bits wide, the remaining 16 bits leave some room for accuracy loss.
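
To make the lost bits concrete, here is a minimal portable C sketch. It is not
the Tremor source: the lowercase names are hypothetical stand-ins, the 64-bit
multiply replaces the smull asm, and it assumes that >> on a negative value is
an arithmetic shift (true for gcc on ARM, but implementation-defined in C).

```c
#include <stdint.h>

/* Stand-in for the smull-based MULT32: high 32 bits of the 64-bit product. */
static int32_t mult32(int32_t x, int32_t y) {
    return (int32_t)(((int64_t)x * y) >> 32);
}

/* Same shape as the quoted MULT31/MULT30: shifting the high word left
 * discards bit 31 (resp. bits 31..30) of the full product.
 * The unsigned cast avoids left-shifting a negative signed value. */
static int32_t mult31_fast(int32_t x, int32_t y) {
    return (int32_t)((uint32_t)mult32(x, y) << 1);
}

static int32_t mult30_fast(int32_t x, int32_t y) {
    return (int32_t)((uint32_t)mult32(x, y) << 2);
}

/* Reference versions that shift the full 64-bit product instead,
 * so the low bits of the result are kept. */
static int32_t mult31_exact(int32_t x, int32_t y) {
    return (int32_t)(((int64_t)x * y) >> 31);
}

static int32_t mult30_exact(int32_t x, int32_t y) {
    return (int32_t)(((int64_t)x * y) >> 30);
}
```

Under those assumptions, mult31_exact(x, y) - mult31_fast(x, y) is always 0 or
1, and the MULT30 pair differs by at most 3, as long as the exact result fits
in 32 bits. The error sits in the lowest bits of a 32-bit value each time.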
> Now, what I can propose is an asm version of the mdct_backward function from
> my fixed-point project. I have already merged it with Tremor with a quick
> adaptation hack, and I get a good 5% improvement when decoding. It currently
> uses 64-bit operations/accumulation (SMULL/SMLAL instructions), so it can be
> improved a little more, depending on the answer to my question.
If I apply the same set of compiler flags to Tremor that I used in my own
version, I also get that 5% improvement or more. Would you mind posting your
assembly version (or making it available somewhere if it's too big) so that I
can compare the performance gain as well?
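
For readers comparing the two approaches, the difference between 64-bit
accumulation (what an SMULL/SMLAL sequence computes) and truncating each
product to 32 bits can be sketched in plain C. This is only an illustration
under stated assumptions, not dilb's assembly and not Tremor's mdct_backward:
the function names are hypothetical, >> on negative values is assumed to be
arithmetic, and the sums are assumed not to overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Truncating Q31 multiply, same shape as the quoted MULT31. */
static int32_t mult31_fast(int32_t x, int32_t y) {
    int32_t hi = (int32_t)(((int64_t)x * y) >> 32);
    return (int32_t)((uint32_t)hi << 1);
}

/* Wide version: keep every product at full 64-bit precision and
 * scale back once at the end, so truncation happens a single time. */
static int32_t dot_q31_wide(const int32_t *a, const int32_t *b, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];
    return (int32_t)(acc >> 31);
}

/* Narrow version: truncate each product to 32 bits before adding,
 * so each term can contribute its own low-bit error. */
static int32_t dot_q31_narrow(const int32_t *a, const int32_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += mult31_fast(a[i], b[i]);
    return acc;
}
```

The narrow form needs fewer registers and instructions per term, which is
where any extra speed would come from; the wide form pays for that with one
64-bit accumulator but loses low bits only once.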
> I had another function written in pure asm, but it seems I will need more
> time to merge it with Tremor, and it will probably not be worth it.
So far, judging from the assembly generated by a recent gcc (I'm currently
testing with gcc 3.2), the compiler is doing an increasingly good job with
plain C code. Writing assembly by hand just looks less worthwhile than it
used to be.
> Fred <Dilb> Boulay.
>
> P.S.: For people interested in my hack of the ogg vorbis decoder:
> http://dilb.blablanux.org/ipaq/fr/software/oggvorbis_SA1110.html , click on
> [miroir] (the page is currently in French only, but the changelog and the
> .tgz are commented in English).
Oh perfect, no problem then.
Nicolas
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'tremor-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.