[Speex-dev] Speex inner_prod(), normalize, C64 MIPS

Jerry Trantow jtrantow at ieee.org
Sat Feb 4 09:38:06 PST 2006

Ok, I hadn't verified inner product was called with values scaled to <=
+-16384.  That would make it safe to do a 32 bit add of the intermediate
terms. I have implemented the 40-bit accumulator.

> by the shift.  I also see a FIXED_POINT danger with the summation of four
> mults overflowing the 32 bit before the shift.  
> I can fix this by accumulating each term into a long, but if the code
> the x[],y[] vectors to avoid this problem I could use parallel 16x16
> multiply/adds.  

What do you mean here?

The C64x has a _dotp2() instruction that does two 16x16 multiplies and adds
the products together.  Since the values are scaled to 16384, I can add the
results of the two _dotp2()s together before the long add without worrying
about overflow.  I didn't understand that inner_prod() was always passed
scaled vectors.  That's the danger of optimizing routines without knowing
how they are called.
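
A minimal portable sketch of the overflow argument above, not the actual C64x code. Assumptions: dotp2_sketch() is a hypothetical stand-in for the _dotp2() intrinsic (the real one works on packed 16-bit pairs), int64_t stands in for the 40-bit long accumulator, len is a multiple of 4, and the shift mentioned in the thread is left out. With |x[i]|, |y[i]| <= 16384 = 2^14, each product is at most 2^28, so the sum of four products is at most 2^30 and fits in a signed 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for _dotp2(): two 16x16 multiplies whose
 * products are added into one 32-bit result. */
static int32_t dotp2_sketch(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Illustrative inner product. Assumes inputs scaled to +-16384 and
 * len a multiple of 4 (both are assumptions of this sketch). */
static int64_t inner_prod_sketch(const int16_t *x, const int16_t *y, int len)
{
    int64_t acc = 0;  /* stands in for the 40-bit accumulator */
    int i;

    assert(len % 4 == 0);
    for (i = 0; i < len; i += 4) {
        /* Four products, each <= 2^28, sum to <= 2^30: safe in 32 bits. */
        int32_t part = dotp2_sketch(x[i],     x[i + 1], y[i],     y[i + 1])
                     + dotp2_sketch(x[i + 2], x[i + 3], y[i + 2], y[i + 3]);
        acc += part;  /* the wide add happens once per four terms */
    }
    return acc;
}
```

With eight terms all at the 16384 limit the total is 2^31, which no longer fits in a signed 32-bit word; that is why the running sum still needs the wider accumulator.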

I split a norm_shift() out of your normalize16().  This function can also be
used twice in pitch_gain_search_3tap().  Are there any other places that
would benefit from this optimized routine?

//	Returns number of shifts to normalize a 32 bit vector to 
static inline int norm_shift(const spx_sig_t *x, spx_sig_t max_scale, int
    int sig_shift_ti;
	int i;

	#warn Using the optimized normalize16() function.
    //	Directly find the min(_norm(x[i])) rather than searching for
    //	max(abs(x[i])) and taking _norm.
    #pragma MUST_ITERATE(24,184,4)
    for (i=0;i<len;i++)
    //	Return the shift value.
}	//	norm_shift().	
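
The idea in the comments above (take the minimum of the per-sample norm instead of finding max(abs(x[i])) first) can be sketched in portable C roughly as follows. Assumptions: norm32_sketch() is a hypothetical stand-in for the TI _norm() intrinsic, taken here to count redundant sign bits of a 32-bit value, and the samples are plain int32_t. How max_scale enters the final shift is not shown in the listing, so this sketch stops at the raw count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for _norm(): number of redundant sign bits
 * in a 32-bit value, in the range 0..31. */
static int norm32_sketch(int32_t v)
{
    /* Fold negative values so the redundant sign bits become zeros. */
    uint32_t u = (v < 0) ? ~(uint32_t)v : (uint32_t)v;
    int n = 0;

    /* Count zero bits from bit 30 downward. */
    while (n < 31 && ((u >> (30 - n)) & 1u) == 0)
        n++;
    return n;
}

/* Smallest norm over the vector: the largest left shift that keeps
 * every sample representable in 32 bits. */
static int min_norm_sketch(const int32_t *x, int len)
{
    int shift = 31;
    int i;

    assert(len > 0);
    for (i = 0; i < len; i++) {
        int n = norm32_sketch(x[i]);
        if (n < shift)
            shift = n;
    }
    return shift;
}
```

One thing to watch in this sketch: a negative power of two such as -16384 has one more redundant sign bit than +16384, so the min-of-norm result can differ by one from taking the norm of max(abs(x[i])).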

PS.  Here are the C64x MIPS vs Complexity results for the original code.  I
have been able to reduce the complexity 1 encoder to 15.7 MIPS.

Complexity   Original 32 (MIPS)   Original 16 (MIPS)
1            31.2                 29.6
2            41.7                 39.8
3            51.4                 49.0
4            61.6
7                                 93.1
9                                 120.8

Jerry J. Trantow
Applied Signal Processing, Inc.
jtrantow at ieee.org
