[tremor] [PATCH] significantly reduce Tremor ROM size requirement
Nicolas Pitre
nico at cam.org
Thu Sep 5 12:17:45 PDT 2002
On Thu, 5 Sep 2002, Monty wrote:
> If static data is really that tight, the largest tables can simply be
> left out, rendering Vorbis files that use those tables unable to
> decode. Is it best to know that ahead of time, or to run into it
> unexpectedly? Generally it's good to know ahead of time what you can,
> but be able to handle the unexpected event (while minimizing
> unexpected events).
Fair enough.
> > But at least for now you could consider this patchlet which shouldn't be
> > controversial:
> >
> > diff -urN orig/Tremor/floor0.c Tremor/floor0.c
> > --- orig/Tremor/floor0.c Mon Sep 2 23:15:19 2002
> > +++ Tremor/floor0.c Wed Sep 4 16:32:07 2002
> > @@ -117,21 +118,21 @@
> > }
> > }
> >
> > -static int MLOOP_1[64]={
> > +static unsigned char MLOOP_1[64]={
> > 0,10,11,11, 12,12,12,12, 13,13,13,13, 13,13,13,13,
> > 14,14,14,14, 14,14,14,14, 14,14,14,14, 14,14,14,14,
> > 15,15,15,15, 15,15,15,15, 15,15,15,15, 15,15,15,15,
> > 15,15,15,15, 15,15,15,15, 15,15,15,15, 15,15,15,15,
> > };
> >
> > -static int MLOOP_2[64]={
> > +static unsigned char MLOOP_2[64]={
> > 0,4,5,5, 6,6,6,6, 7,7,7,7, 7,7,7,7,
> > 8,8,8,8, 8,8,8,8, 8,8,8,8, 8,8,8,8,
> > 9,9,9,9, 9,9,9,9, 9,9,9,9, 9,9,9,9,
> > 9,9,9,9, 9,9,9,9, 9,9,9,9, 9,9,9,9,
> > };
> >
> > -static int MLOOP_3[8]={0,1,2,2,3,3,3,3};
> > +static unsigned char MLOOP_3[8]={0,1,2,2,3,3,3,3};
>
> This is static data in the heaviest-weight tight loop in all of
> Vorbis; the loop accounts for 50% of CPU usage for beta-1 and beta-2
> files. Going int->char affects GCC-ARM's memory addressing strategy
> dramatically; in gcc < 3.0, it generally affects it negatively. Is
> there a reason to do this aside from saving 100 bytes? Do you have
> performance figures from a few processors/compilers to justify it?
Here's a snapshot of the generated assembly difference on gcc-2.95.3:
(-) lines with int
(+) lines with unsigned char
.L208:
orr r3, r4, lr
ldr r0, .L239+16
- mov r2, r3, lsr #25
- ldr ip, [r0, r2, asl #2]
+ ldrb ip, [r0, r3, lsr #25] @ zero_extendqisi2
cmp ip, #0
bne .L214
ldr r1, .L239+20
- mov r2, r3, lsr #19
- ldr ip, [r1, r2, asl #2]
+ ldrb ip, [r1, r3, lsr #19] @ zero_extendqisi2
cmp ip, #0
- bne .L214
- mov r2, r3, lsr #16
- ldr r3, .L239+20
- ldr ip, [r3, r2, asl #2]
+ ldreq r2, .L239+12
+ ldreqb ip, [r2, r3, lsr #16] @ zero_extendqisi2
.L214:
ldr r3, [fp, #-76]
cmp r3, #0
beq .L216
[...]
Of course GCC-ARM's memory addressing strategy is affected, but rather
positively in my opinion. Not only does it emit 4 fewer instructions in this
particular case, but you'll also get much better cache usage for the byte
array. And if I remember correctly, an ldrb has the same cycle count as an
ldr. I just checked with gcc-3.2 and the same pattern exists there. It's
almost always easier to index a byte array than an array of any larger
element type, because the index needs no scaling by the element size.
In what case did you observe a negative impact?
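
For reference, here is a minimal C sketch of the kind of three-level lookup
that the assembly above appears to implement. Only the table values and the
shift amounts (25, 19, 16) are taken from the patch and the listing. The
function name mloop_shift, the const qualifier, and the range check are
assumptions made for illustration and are not code from floor0.c.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wide tables; values copied from the patch quoted above.
   The const qualifier is an addition for this sketch. */
static const unsigned char MLOOP_1[64] = {
     0,10,11,11, 12,12,12,12, 13,13,13,13, 13,13,13,13,
    14,14,14,14, 14,14,14,14, 14,14,14,14, 14,14,14,14,
    15,15,15,15, 15,15,15,15, 15,15,15,15, 15,15,15,15,
    15,15,15,15, 15,15,15,15, 15,15,15,15, 15,15,15,15,
};

static const unsigned char MLOOP_2[64] = {
    0,4,5,5, 6,6,6,6, 7,7,7,7, 7,7,7,7,
    8,8,8,8, 8,8,8,8, 8,8,8,8, 8,8,8,8,
    9,9,9,9, 9,9,9,9, 9,9,9,9, 9,9,9,9,
    9,9,9,9, 9,9,9,9, 9,9,9,9, 9,9,9,9,
};

static const unsigned char MLOOP_3[8] = {0,1,2,2,3,3,3,3};

/* Hypothetical helper: try the coarsest table first and fall through
   to the finer ones only when the previous lookup returned 0. */
static unsigned mloop_shift(uint32_t x)
{
    unsigned shift;

    /* Assumed precondition: x >> 25 must stay inside the 64-entry table. */
    assert(x < 0x80000000u);

    shift = MLOOP_1[x >> 25];          /* byte load, index used unscaled */
    if (shift == 0) {
        shift = MLOOP_2[x >> 19];      /* here x < 2^25, so index < 64 */
        if (shift == 0)
            shift = MLOOP_3[x >> 16];  /* here x < 2^19, so index < 8 */
    }
    return shift;
}
```

Assuming a 4-byte int, the three tables occupy (64 + 64 + 8) * 4 = 544 bytes
as int and 136 bytes as unsigned char.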
Nicolas