[vorbis-dev] vorbis on playstation 2

Thu Feb 6 18:56:14 PST 2003

On Thu, 6 Feb 2003, David Etherton wrote:

> How about this optimization I tried for MULT31:
> 
> ogg_int32_t MULT31(ogg_int32_t x, ogg_int32_t y) {
>   return ((ogg_int32_t) (((ogg_int64_t)x * y) >> 31));
> }
> 
> ogg_int32_t MULT31x(ogg_int32_t x, ogg_int32_t y) {
>   return ((ogg_int32_t) (((ogg_int64_t)x * y) >> 32) << 1);
> }
> 
> doing a shift right by 32 and a shift left by one allows the compiler to
> avoid double-wide shift insns:
> 
> 0000000000000020 <MULT31>:
>   20:   00850018        mult    $a0,$a1
>   24:   00001812        mflo    $v1
>   28:   00001010        mfhi    $v0
>   2c:   0003183c        dsll32  $v1,$v1,0x0
>   30:   0002103c        dsll32  $v0,$v0,0x0
>   34:   0003183e        dsrl32  $v1,$v1,0x0
>   38:   00431025        or      $v0,$v0,$v1
>   3c:   00021078        dsll    $v0,$v0,0x1
>   40:   0002103f        dsra32  $v0,$v0,0x0
>   44:   03e00008        jr      $ra
>   48:   00000000        nop
> 
> 0000000000000010 <MULT31x>:
>   10:   00850018        mult    $a0,$a1
>   14:   00001010        mfhi    $v0
>   18:   03e00008        jr      $ra
>   1c:   00021040        sll     $v0,$v0,0x1

Hmmm... I know nothing about MIPS assembly, but I think you should be able 
to do better than that.

What the standard code does is:

static inline ogg_int32_t MULT32(ogg_int32_t x, ogg_int32_t y) {
  union magic magic;
  magic.whole = (ogg_int64_t)x * y;
  return magic.halves.hi;
}

static inline ogg_int32_t MULT31(ogg_int32_t x, ogg_int32_t y) {
  return MULT32(x,y)<<1;
}

In the MULT32 case you do a 32x32->64 multiply and keep only the high 32
bits, discarding the low 32 bits entirely and thereby avoiding shifts
altogether.  MULT31 is mostly the same, except that the high 32 bits (the
only bits you care about) are shifted left by 1.  There is no need to shift
the whole 64 bits.  The assembly for MULT31 above seems like way too much.
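
For reference, here is a minimal self-contained sketch of the two functions
above.  The layout of union magic is not shown in this message, so the
definition below is an assumption (little-endian unless GCC/Clang's
__BYTE_ORDER__ macro says otherwise), and the typedefs merely stand in for the
real Ogg types.  The cast to unsigned before the left shift is an addition
here, to avoid left-shifting a negative value in strict C.

```c
#include <stdint.h>

/* Stand-ins for the Ogg integer typedefs (assumption for this sketch). */
typedef int32_t ogg_int32_t;
typedef int64_t ogg_int64_t;

/* Assumed layout of "union magic": the two 32-bit halves overlay the
   64-bit product, ordered to match the target's byte order. */
union magic {
  struct {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    ogg_int32_t hi;
    ogg_int32_t lo;
#else
    ogg_int32_t lo;   /* little-endian assumed otherwise */
    ogg_int32_t hi;
#endif
  } halves;
  ogg_int64_t whole;
};

/* 32x32->64 multiply, keep only the high 32 bits: no shift needed. */
static inline ogg_int32_t MULT32(ogg_int32_t x, ogg_int32_t y) {
  union magic magic;
  magic.whole = (ogg_int64_t)x * y;
  return magic.halves.hi;
}

/* Same, with the high word shifted left by one (a 32-bit shift only). */
static inline ogg_int32_t MULT31(ogg_int32_t x, ogg_int32_t y) {
  return (ogg_int32_t)((uint32_t)MULT32(x, y) << 1);
}
```

With x = y = 0x40000000 (0.5 in Q31), MULT32 yields 0x10000000 and MULT31
yields 0x20000000 (0.25 in Q31).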

> Would the single bit make a big difference?

No, it's lost in the integer truncation noise, which doesn't show up with
16-bit audio samples.
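
To make the size of that bit concrete, here is a small sketch comparing the
two variants.  The function names are hypothetical, and it assumes the
compiler implements >> on negative values as an arithmetic shift (true of the
usual compilers, but implementation-defined in C).  The only difference is
bit 0 of the 32-bit result, which the high-word variant always leaves clear.

```c
#include <stdint.h>

/* Full-width variant: shift the whole 64-bit product right by 31. */
static int32_t mult31_full(int32_t x, int32_t y) {
  return (int32_t)(((int64_t)x * y) >> 31);
}

/* High-word variant: take the top 32 bits, then shift left by 1.
   The unsigned cast avoids left-shifting a negative value. */
static int32_t mult31_hi_only(int32_t x, int32_t y) {
  int32_t hi = (int32_t)(((int64_t)x * y) >> 32);
  return (int32_t)((uint32_t)hi << 1);
}

/* The bit dropped by the high-word variant: bit 31 of the 64-bit
   product, which would have become bit 0 of the result.  Always 0 or 1. */
static int32_t mult31_lost_bit(int32_t x, int32_t y) {
  return mult31_full(x, y) - mult31_hi_only(x, y);
}
```

For example, mult31_lost_bit(3, 0x40000000) is 1, while
mult31_lost_bit(0x40000000, 0x40000000) is 0.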

If your CPU has multiply-and-accumulate insns, you might consider optimizing
the XPROD31 and XPROD32 functions.  See the ARM assembly version in
asm_arm.h for example.
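
As a rough C illustration of the multiply-and-accumulate idea, and not the
actual code from asm_arm.h: assuming XPROD32 computes x = a*t + b*v and
y = b*t - a*v and keeps the high 32 bits (this message does not spell that
out), the two products can be accumulated in 64 bits before the high word is
taken.  The name xprod32_mac is hypothetical.

```c
#include <stdint.h>

/* Hypothetical MAC-style cross product: accumulate both 64-bit products,
   then keep only the high 32 bits of each sum.  Because the sum is formed
   before truncation, the result can differ in the last bit from adding two
   separately truncated high words.
   Note: the 64-bit sum overflows if all four inputs are INT32_MIN. */
static void xprod32_mac(int32_t a, int32_t b, int32_t t, int32_t v,
                        int32_t *x, int32_t *y) {
  int64_t accx = (int64_t)a * t;   /* multiply            */
  accx += (int64_t)b * v;          /* multiply-accumulate */

  int64_t accy = (int64_t)b * t;   /* multiply            */
  accy -= (int64_t)a * v;          /* multiply-subtract   */

  *x = (int32_t)(accx >> 32);
  *y = (int32_t)(accy >> 32);
}
```

Whether the compiler actually maps this onto MAC instructions depends on the
target, so it is worth checking the generated assembly.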

Oh, and make sure all those functions are actually inlined by the compiler;
otherwise you'll waste a tremendous number of cycles on function call
overhead.
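
One way to insist on inlining, sketched under the assumption of a GCC-style
compiler: the always_inline function attribute.  FORCE_INLINE, mult32_hi and
scale_block are hypothetical names for this example only.

```c
#include <stdint.h>

/* With GCC/Clang, always_inline asks for inlining even when the optimizer
   would otherwise decline; elsewhere fall back to plain static inline. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* High 32 bits of the 64-bit product. */
FORCE_INLINE int32_t mult32_hi(int32_t x, int32_t y) {
  return (int32_t)(((int64_t)x * y) >> 32);
}

/* Example hot loop: with inlining, the multiply sits directly in the loop
   body and no call is made per sample. */
void scale_block(int32_t *data, int n, int32_t gain) {
  for (int i = 0; i < n; i++)
    data[i] = mult32_hi(data[i], gain);
}
```

Disassembling the object file and looking for call instructions inside the
loop is a quick way to confirm the helper really was inlined.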

Nicolas
