[tremor] [PATCH] 12% global performance gain on a StrongARM

Chris Gilbert chris at dokein.co.uk
Thu Sep 19 10:15:09 PDT 2002



Nicolas Pitre said:
> On Thu, 19 Sep 2002, Chris Gilbert wrote:
>
>> One thing I did notice, and it's not your bug, is that the CLIP_TO_15
>> is missing the fact that x is also an input arg, which means the
>> compiler has no reason to assign x into the correct register, the fact
>> that it works is probably luck (note I've not checked that it does
>> work as is).
>
> It's not buggy at all.  From GCC's manual:
>
> * C Extensions::    GNU extensions to the C language family.
> --> * Extended Asm::        Assembler instructions with C expressions as
>                             operands.
>     --> *Note Modifiers::.
>
> [...]
> `='
>      Means that this operand is write-only for this instruction: the
>      previous value is discarded and replaced by output data.
>
> `+'
>      Means that this operand is both read and written by the
>      instruction.
>
>      When the compiler fixes up the operands to satisfy the constraints,
>      it needs to know which operands are inputs to the instruction and
>      which are outputs from it.  `=' identifies an output; `+'
>      identifies an operand that is both input and output; all other
>      operands are assumed to be input only.

Gah, all these options. I knew sending email pre-coffee would be bad 8)
Perhaps it would be better to let the compiler pick what it does with the
value of x, i.e. which registers it uses for the input and the output, where
possible.  (Not that I imagine it would pick different registers on ARM.)
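
To make the two constraint styles concrete, here is a minimal sketch, not
Tremor's CLIP_TO_15. The function names are hypothetical and the asm templates
are deliberately empty so that it builds with GCC or Clang on any target; only
the operand lists matter here.

```c
/* Sketch only: empty asm templates, so the value passes through unchanged.
 * Function names are hypothetical and are not taken from Tremor. */

/* "+r": x is both read and written by the asm, so it needs no separate
 * input entry; a single register serves both roles. */
static inline int passthrough_rw(int x)
{
    __asm__ ("" : "+r" (x));
    return x;
}

/* "=r" plus a matching "0" input: separate C variables for input and
 * output, with the input tied to the register chosen for operand 0. */
static inline int passthrough_matched(int x)
{
    int out;
    __asm__ ("" : "=r" (out) : "0" (x));
    return out;
}
```

With a plain "r" input instead of "0", the compiler would be free to choose a
different register for the input, and the template would then have to move or
compute the value into the output itself.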

>> One thing that did puzzle me is why the memory barrier is needed in
>> XPROD32, XPROD31 and XNPROD31, I guess I need to look at the asm, but
>> doing *x = xl; the compiler should know that the memory has been
>> updated.  AFAIK the memory barrier means that memory has been updated
>> in some way that the compiler can't see; the need for your change
>> hints that the compiler is doing something wrong.
>
> In some cases, the compiler would have inverted the two assignments.
> However, *x really needs to be written to memory first, since the value of
> y1 is still being processed in the pipeline by the smlal instruction.
> The memory barrier ensures that ordering.

Wouldn't it be better to push the result storing into the asm, to be
certain that the ordering is always enforced correctly, and the correct
number of cycles is waited?  Although I'm surprised that the processor
doesn't actually do a stall to wait for the result.
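
For reference, the barrier approach described above can be sketched portably as
follows. This is an illustration under assumptions, not the patch itself: the
names MB_SKETCH and xprod31_sketch are hypothetical, and plain 64-bit C
arithmetic stands in for the ARM smull/smlal sequence.

```c
#include <stdint.h>

/* Compiler-only barrier: the "memory" clobber tells GCC that memory may have
 * changed, so it will not move loads or stores across this point.  It emits
 * no instruction and does not order anything inside the CPU. */
#define MB_SKETCH() __asm__ volatile ("" : : : "memory")

/* Q31 multiply: keep bits 31..62 of the 64-bit product. */
static inline int32_t mult31_sketch(int32_t p, int32_t q)
{
    return (int32_t)(((int64_t)p * q) >> 31);
}

/* Portable stand-in for the cross product (assumes results fit in 32 bits). */
static inline void xprod31_sketch(int32_t a, int32_t b, int32_t t, int32_t v,
                                  int32_t *x, int32_t *y)
{
    int32_t x1 = mult31_sketch(a, t) + mult31_sketch(b, v);
    int32_t y1 = mult31_sketch(b, t) - mult31_sketch(a, v);

    *x = x1;       /* first store */
    MB_SKETCH();   /* keeps the compiler from swapping the two stores */
    *y = y1;       /* second store */
}
```

Moving the stores into the asm statement itself, as suggested above, would fix
the order in the instruction stream directly instead of relying on the barrier.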

Cheers,
Chris

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'tremor-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.


