[tremor] [PATCH] significantly reduce Tremor ROM size requirement

Thu Sep 5 12:44:09 PDT 2002

> > Vorbis; the loop accounts for 50% of CPU usage for beta-1 and beta-2
> > files.  Going int->char affects GCC-ARM's memory addressing strategy
> > dramatically; in gcc < 3.0, it generally affects it negatively.  Is
> > there a reason to do this aside from saving 100 bytes?  Do you have
> > performance figures from a few processors/compilers to justify it?
> 
> Here's a snapshot of the generated assembly difference on gcc-2.95.3:
> (-) lines with int
> (+) lines with unsigned char
> 
>  .L208:
>         orr     r3, r4, lr
>         ldr     r0, .L239+16
> -       mov     r2, r3, lsr #25
> -       ldr     ip, [r0, r2, asl #2]
> +       ldrb    ip, [r0, r3, lsr #25]   @ zero_extendqisi2
>         cmp     ip, #0
>         bne     .L214
>         ldr     r1, .L239+20
> -       mov     r2, r3, lsr #19
> -       ldr     ip, [r1, r2, asl #2]
> +       ldrb    ip, [r1, r3, lsr #19]   @ zero_extendqisi2
>         cmp     ip, #0
> -       bne     .L214
> -       mov     r2, r3, lsr #16
> -       ldr     r3, .L239+20
> -       ldr     ip, [r3, r2, asl #2]
> +       ldreq   r2, .L239+12
> +       ldreqb  ip, [r2, r3, lsr #16]   @ zero_extendqisi2
>  .L214:
>         ldr     r3, [fp, #-76]
>         cmp     r3, #0
>         beq     .L216
>         [...]
> 
> Of course GCC-ARM's memory addressing strategy is affected, but rather
> positively in my opinion.

Given the above snippet, I agree.

>  Not only does it emit 4 fewer instructions in this
> particular case, but you'll also get much better cache usage for the byte
> array.  And if I remember correctly, a ldrb has the same cycle count as a
> ldr.  Just checked with gcc-3.2 and the same pattern exists there.  It's
> almost always easier to index a byte array than one with larger elements.
> In what case did you observe a negative impact?
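
For readers following along in C, here is a minimal sketch of the two
table layouts being compared.  The table names, the fill values and the
128-entry size (taken from the "lsr #25" index in the diff above) are
illustrative assumptions, not the actual Tremor tables.  With a byte
table the shifted value is the byte offset itself, so the shift can fold
into the load; with an int table the index must also be scaled by the
element size, which is the extra mov seen in the (-) lines.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BITS 7
#define TABLE_SIZE (1u << TABLE_BITS)  /* 128 entries: top 7 bits of a 32-bit word */

/* Hypothetical tables holding the same small values at two element widths. */
static int           table_int[TABLE_SIZE];
static unsigned char table_byte[TABLE_SIZE];

/* Fill both tables with identical placeholder values (all fit in a byte). */
static void fill_tables(void)
{
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        unsigned char v = (unsigned char)(i % 17);
        table_int[i]  = v;
        table_byte[i] = v;
    }
}

/* Word table: address = base + (x >> 25) * sizeof(int). */
static int lookup_int(uint32_t x)
{
    return table_int[x >> 25];
}

/* Byte table: address = base + (x >> 25), no scaling needed. */
static int lookup_byte(uint32_t x)
{
    return table_byte[x >> 25];
}

/* Sanity check: both layouts must return the same value for every index. */
static void verify_tables(void)
{
    for (uint32_t i = 0; i < TABLE_SIZE; i++)
        assert(lookup_int(i << 25) == lookup_byte(i << 25));
}
```

Calling fill_tables() and then verify_tables() confirms the two lookups
are interchangeable.  On a target where int is 4 bytes, the byte table
occupies a quarter of the memory, which is where the ROM saving and the
better cache usage come from.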

During original work with arm-elf-gcc, on code that predates Tremor.
Going to byte arrays there caused a significant performance hit on
the Netwinder.  However, it was not my compiler install and I did not
inspect the asm; I merely noted the hit and moved on.

Regardless, the above is indeed evidence that the char arrays are more
efficient, at the very least for ARM (which is currently the platform we
care about most).  I'll apply that patch, and make a note to check
performance on word-aligned-access platforms.

Monty