[Speex-dev] TI 6xxx platform performance

Jerry Trantow jtrantow at ieee.org
Thu Jan 19 13:49:09 PST 2006

The majority of a Speex encoder app does fit in a 6713.  The 6713 has 8K of
L1 and another 256K of memory, 64K of which can be configured as L2 cache
(16, 32, 48, or 64K).  One level of TI's website seems to incorrectly indicate
only 64K of L2.

I turned off MANUAL_ALLOC and have it allocating internal memory using
calloc().  I did change the L2 cache to 2-way (32K) and adjusted the heap
size to 12K to get it to fit.  I put a wavefile and the .cinit and .const
sections up in the SDRAM.

                  name            origin    length      used    attr    fill
         ----------------------  --------  ---------  --------  ----
         IRAMB                   00000000   00000400  00000000  RWIX
         IRAMP                   00000400   00028c00  00028218  RWIX
         IRAM                    00029000   0000f000  00007fdc  RWIX
         CACHE_L2                00038000   00008000  00000000  RWIX
         SDRAM                   80000000   00800000  000181b7  RWIX
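As a quick sanity check on the map above, the following Python sketch adds up
the internal regions and reports the free space in each.  The origin, length,
and used values are copied from the listing; the dictionary layout and the
helper names (free_bytes, internal_total) are illustrative assumptions and not
part of any TI tool.

```python
# Memory regions copied from the linker map: name -> (origin, length, used).
MEMORY_MAP = {
    "IRAMB":    (0x00000000, 0x00000400, 0x00000000),
    "IRAMP":    (0x00000400, 0x00028C00, 0x00028218),
    "IRAM":     (0x00029000, 0x0000F000, 0x00007FDC),
    "CACHE_L2": (0x00038000, 0x00008000, 0x00000000),
    "SDRAM":    (0x80000000, 0x00800000, 0x000181B7),
}


def free_bytes(name):
    """Bytes still unused in the named region (length minus used)."""
    _origin, length, used = MEMORY_MAP[name]
    return length - used


def internal_total():
    """Total size of every region except the external SDRAM."""
    return sum(length for name, (_o, length, _u) in MEMORY_MAP.items()
               if name != "SDRAM")


if __name__ == "__main__":
    print("internal bytes:", internal_total())
    for region in MEMORY_MAP:
        print(region, "free:", free_bytes(region))
```

The four internal regions sum to 0x40000 bytes, which matches the 256K of
internal memory with 32K of it set aside as L2 cache.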

I'm currently using the simulator, and the SDRAM doesn't seem to be a factor.
I put some test vectors up into SDRAM, and when I call DSPF_sp_dotprod() I
get the cycle count I expect: N/2 + 25.  The 32K L2 cache will help any
SDRAM access.

I initially had a compiler option wrong, which was minimizing size instead
of maximizing speed, but I'm still at 44MIPS for a single channel.

I saw the 10MFLOPS number in the documentation.  At first glance, a 300MHz
67 looks underpowered, but a quick profile shows 50% of the cycles are
concentrated in just a few functions.  The 67xx is designed to execute these
functions, and I figured I could get a factor of two out of these DSP
functions.  That would bring me under the 9.3MFLOPS requirement.

The DSP functions are performing as I expect:
SP AutoCorrelation: (nx/2) * nr + (nr/2) * 5 + 10 - (nr * nr)/4 + nr
SP FIR Filter:      (4*floor((nh-1)/2)+14)*(ceil(nr/4)) + 8
SP Inner product:   nx/2 + 25
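
The three cycle estimates above are easy to tabulate for a given block size.
This is only a sketch of the arithmetic in Python: the function names are made
up for illustration, the divisions are treated as real-valued (an assumption),
and the example sizes (nx = 160, nr = 12, nh = 10) are hypothetical, not
values taken from the Speex build.

```python
import math


def autocor_cycles(nx, nr):
    """SP autocorrelation: (nx/2)*nr + (nr/2)*5 + 10 - (nr*nr)/4 + nr."""
    return (nx / 2) * nr + (nr / 2) * 5 + 10 - (nr * nr) / 4 + nr


def fir_cycles(nh, nr):
    """SP FIR filter: (4*floor((nh-1)/2) + 14) * ceil(nr/4) + 8."""
    return (4 * math.floor((nh - 1) / 2) + 14) * math.ceil(nr / 4) + 8


def dotprod_cycles(nx):
    """SP inner product: nx/2 + 25."""
    return nx / 2 + 25


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to exercise the formulas.
    print("autocorrelation:", autocor_cycles(nx=160, nr=12))
    print("FIR filter:     ", fir_cycles(nh=10, nr=160))
    print("inner product:  ", dotprod_cycles(nx=160))
```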

But unless the whole algorithm gets down near 10MIPS, I'm going to have to
go to the 64xx fixed point.
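
One rough way to see why the whole algorithm has to come down, and not just
the hot functions, is an Amdahl-style estimate.  The sketch below takes the
44MIPS figure as the baseline and reads "50% of the cycles" as exactly half;
both are assumptions, and the function names are hypothetical.

```python
def optimized_load(total_mips, hot_fraction, hot_speedup):
    """Load after speeding up only the hot fraction of the work."""
    return total_mips * ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


def required_hot_speedup(total_mips, hot_fraction, target_mips):
    """Speedup the hot functions would need to reach target_mips.

    Returns None when the untouched part alone already exceeds the target,
    i.e. no speedup of the hot functions can be enough.
    """
    cold = total_mips * (1.0 - hot_fraction)
    remaining = target_mips - cold
    if remaining <= 0:
        return None
    return total_mips * hot_fraction / remaining


if __name__ == "__main__":
    print("load after 2x on hot functions:", optimized_load(44.0, 0.5, 2.0))
    print("speedup needed for 9.3:", required_hot_speedup(44.0, 0.5, 9.3))
```

Under these assumptions, the half of the cycles outside the hot functions
costs 22MIPS on its own, so optimizing a few DSP kernels cannot reach the
target without also reducing the rest.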

Jerry J. Trantow
Applied Signal Processing, Inc.
jtrantow at ieee.org

-----Original Message-----
From: Jim Crichton [mailto:jim.crichton at comcast.net] 
Sent: Thursday, January 19, 2006 10:33 AM
To: Jerry Trantow
Cc: speex-dev at xiph.org
Subject: Re: [Speex-dev] TI 6xxx platform performance


I think that just removing the FIXED_POINT define should be sufficient, 
though you might want to turn off MANUAL_ALLOC, because I am not sure if the 
memory usage is identical for the fixed point build, and the constants in 
config.h are set for the fixed point build.  Are you testing on the 
simulator, or on an eval board?  It does not look like the 6713 has enough 
memory to hold Speex (64K vs. 1024K for the 6416), and your performance 
could suffer badly running from external memory.

I would be very surprised if you can get below 9.3MIPs/channel for floating 
point, since Jean-Marc has posted raw MIPs numbers somewhere around 10 MIPs 
for the algorithm itself.  And that is discounting the memory issues.  With 
the C6416 you can fit the code and data for 32 channels in internal memory.

If you want to post (or send me) your .pjt and .cmd files for the 6713 
build, I can take a look at it in the simulator (I am using the 6415 now, 
though I do not need as many channels).  But if you are experienced enough 
in this area to be doing 6416 optimizations, I am probably not telling you 
anything that you don't already know.  That is not a task for the faint of 
heart.

Jim Crichton

----- Original Message ----- 
From: "Jerry Trantow" <jtrantow at ieee.org>
To: "'Jean-Marc Valin'" <jean-marc.valin at usherbrooke.ca>
Cc: <speex-dev at xiph.org>
Sent: Thursday, January 19, 2006 10:40 AM
Subject: RE: [Speex-dev] TI 6xxx platform performance

I started my project using the CodeComposerStudio speex_C64_test.pjt in
speex  To build using floating point, I created a new project with
the same files and modified ti\config.h to #undef FIXED_POINT.  Is there a
better way to configure a floating point processor?

I have a few TI specific optimizations that could go into the next release.
What's the procedure for submitting code?

I've been working with this code for about a week now. I'm still trying to
understand it all, but I'm particularly impressed by the float vs fixed
flexibility of the code.

Jerry J. Trantow
Applied Signal Processing, Inc.
jtrantow at ieee.org

-----Original Message-----
From: Jean-Marc Valin [mailto:jean-marc.valin at usherbrooke.ca]
Sent: Thursday, January 19, 2006 1:00 AM
To: Jerry Trantow
Cc: speex-dev at xiph.org
Subject: Re: [Speex-dev] TI 6xxx platform performance

> To get a feel for the computational load, I am running 1 second (50
> frames) of voice through the encoder.

You might want to use a bit more just so you don't see the
initialization complexity at all.

> My profile of the 6416 indicates I'm at 27.4M cycles/channel.  I need to get
> below 720MHz/32 channels = 22.5M cycles per channel.  I did a little work on
> inner_prod() and normalize16() and I'm confident I can get 32 channels by
> optimizing 5 or 6 functions.  I expect these numbers to translate over to the
> DM642.

Have you tried defining PRECISION16? That should reduce the computation.

> A lower cost option would be to use a floating point 6713.  I thought that
> 300Mhz floating point would come out even or ahead in an encoding
> comparison.  Instead of the 300M/32=9.3M cycles per channel that I need, I
> see 71.5M cycles per channel!!!

That's definitely strange. Normally, if your chip takes the same time to
do a float op as it takes to do an int op, then the float version
should be faster. That's because some of the float ops get replaced by
several int ops.
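
The per-channel budgets quoted in this exchange are simple divisions of clock
rate by channel count, and the gap to the measured figures can be checked the
same way.  A minimal Python sketch, with hypothetical helper names; the inputs
are the numbers quoted in the thread.

```python
def cycle_budget(clock_hz, channels):
    """Cycles per second available to each channel."""
    return clock_hz / channels


def overshoot(measured_cycles, budget_cycles):
    """How many times over budget the measured load is."""
    return measured_cycles / budget_cycles


if __name__ == "__main__":
    c64_budget = cycle_budget(720e6, 32)   # 6416 at 720MHz, 32 channels
    c67_budget = cycle_budget(300e6, 32)   # 6713 at 300MHz, 32 channels
    print("6416 budget:", c64_budget, "overshoot:", overshoot(27.4e6, c64_budget))
    print("6713 budget:", c67_budget, "overshoot:", overshoot(71.5e6, c67_budget))
```

The 300MHz case works out to 9.375M cycles per channel, which the thread
rounds to 9.3M.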

> Does this make sense?
> I'm generating floating point code, using the optimizer, etc...

Are you sure the compiler isn't using float emulation or something like
that?

> Has anyone posted DM642, C64xx or C67xx benchmarks?

I'm not aware of any.


Speex-dev mailing list
Speex-dev at xiph.org
