[Speex-dev] TI 6xxx platform performance
Jerry Trantow
jtrantow at ieee.org
Thu Jan 19 13:49:09 PST 2006
The majority of a Speex encoder app does fit in a 6713. The 6713 has 8K of
L1 and another 256K of internal memory, of which 16, 32, 48, or 64K can be
configured as L2 cache. One level of TI's website seems to incorrectly
indicate only 64K of L2.
I turned off MANUAL_ALLOC and have it allocating internal memory using
calloc(). I did change the L2 cache to 2-way (32K) and adjusted the heap
size to 12K to get it to fit. I put a wave file and the .cinit and .const
sections up in the SDRAM.
name      origin    length    used      attr  fill
--------  --------  --------  --------  ----  --------
IRAMB     00000000  00000400  00000000  RWIX
IRAMP     00000400  00028c00  00028218  RWIX
IRAM      00029000  0000f000  00007fdc  RWIX
CACHE_L2  00038000  00008000  00000000  RWIX
SDRAM     80000000  00800000  000181b7  RWIX
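As a sanity check on the internal-memory rows of the map above, here is a minimal sketch that confirms the four sections sit back to back and reports how much room is left in each. The struct and function names are hypothetical and only for illustration; the numbers are copied from the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the linker map summary (hypothetical struct, not a TI type). */
typedef struct {
    const char *name;
    uint32_t origin;
    uint32_t length;
    uint32_t used;
} mem_section;

/* Internal-memory rows copied from the map listing in this message. */
static const mem_section c6713_map[] = {
    { "IRAMB",    0x00000000u, 0x00000400u, 0x00000000u },
    { "IRAMP",    0x00000400u, 0x00028c00u, 0x00028218u },
    { "IRAM",     0x00029000u, 0x0000f000u, 0x00007fdcu },
    { "CACHE_L2", 0x00038000u, 0x00008000u, 0x00000000u },
};
enum { C6713_MAP_COUNT = sizeof c6713_map / sizeof c6713_map[0] };

/* Bytes still unallocated in one section. */
static uint32_t section_free(const mem_section *s)
{
    assert(s->used <= s->length);
    return s->length - s->used;
}

/* Returns the end address if every section starts where the previous
 * one ends, or 0 if there is a gap or an overlap. */
static uint32_t map_end_if_contiguous(const mem_section *m, size_t n)
{
    uint32_t next = m[0].origin;
    for (size_t i = 0; i < n; i++) {
        if (m[i].origin != next)
            return 0;
        next = m[i].origin + m[i].length;
    }
    return next;
}
```

With these rows, map_end_if_contiguous() returns 0x40000, so the four sections account for exactly 256K. section_free() gives 2536 bytes left in IRAMP and 28708 bytes left in IRAM, which shows how tight the program section is.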
I'm currently using the simulator and the SDRAM doesn't seem to be a factor.
I put some test vectors up into SDRAM and when I call DSPF_sp_dotprod() I
get what I expect for cycles: N/2 + 25. The 32K L2 cache will help any
SDRAM access.
I initially had a compiler option wrong which was minimizing size instead
of maximizing speed, but I'm still at 44 MIPS for a single channel.
I saw the 10MFLOPS number in the documentation. At first glance, a 300MHz
67 looks underpowered, but a quick profile shows 50% of the cycles are
concentrated in just a few functions. The 67xx is designed to execute these
functions and I figured I could get a factor of two out of these DSP
functions. That would bring me under the 9.3MFLOPS requirement.
The DSP functions are performing as I expect:
SP AutoCorrelation: (nx/2) * nr + (nr/2) * 5 + 10 - (nr * nr)/4 + nr
SP FIR Filter: (4*floor((nh-1)/2)+14)*(ceil(nr/4)) + 8
SP Inner product: nx/2 + 25
But unless the whole algorithm gets down near 10MIPS, I'm going to have to
go to the 64xx fixed point.
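The cycle-count formulas above can be evaluated for concrete sizes with a sketch like the following. The function names are hypothetical, and treating every division as integer division is my assumption (it is exact when nx and nr are even). The sizes in the note are illustrative only, not measurements.

```c
#include <assert.h>

/* SP autocorrelation: (nx/2)*nr + (nr/2)*5 + 10 - (nr*nr)/4 + nr */
long autocor_cycles(long nx, long nr)
{
    assert(nx > 0 && nr > 0);
    return (nx / 2) * nr + (nr / 2) * 5 + 10 - (nr * nr) / 4 + nr;
}

/* SP FIR filter: (4*floor((nh-1)/2) + 14) * ceil(nr/4) + 8 */
long fir_cycles(long nh, long nr)
{
    assert(nh > 0 && nr > 0);
    long half_taps = (nh - 1) / 2;   /* floor((nh-1)/2) for nh >= 1 */
    long blocks = (nr + 3) / 4;      /* ceil(nr/4) for nr >= 1 */
    return (4 * half_taps + 14) * blocks + 8;
}

/* SP inner product: nx/2 + 25 */
long dotprod_cycles(long nx)
{
    assert(nx > 0);
    return nx / 2 + 25;
}
```

For example, with a 160-sample vector, dotprod_cycles(160) gives 105, fir_cycles(10, 160) gives 1208, and autocor_cycles(160, 10) gives 820. The fixed overhead terms matter most when these functions are called on short vectors.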
Jerry J. Trantow
Applied Signal Processing, Inc.
jtrantow at ieee.org
-----Original Message-----
From: Jim Crichton [mailto:jim.crichton at comcast.net]
Sent: Thursday, January 19, 2006 10:33 AM
To: Jerry Trantow
Cc: speex-dev at xiph.org
Subject: Re: [Speex-dev] TI 6xxx platform performance
Jerry,
I think that just removing the FIXED_POINT define should be sufficient,
though you might want to turn off MANUAL_ALLOC, because I am not sure if the
memory usage is identical for the fixed point build, and the constants in
config.h are set for the fixed point build. Are you testing on the
simulator, or on an eval board? It does not look like the 6713 has enough
memory to hold Speex (64K vs. 1024K for the 6416), and your performance
could suffer badly running from external memory.
I would be very surprised if you can get below 9.3 MIPS/channel for floating
point, since Jean-Marc has posted raw MIPS numbers somewhere around 10 MIPS
for the algorithm itself. And that is discounting the memory issues. With
the C6416 you can fit the code and data for 32 channels in internal memory.
If you want to post (or send me) your .pjt and .cmd files for the 6713
build, I can take a look at it in the simulator (I am using the 6415 now,
though I do not need as many channels). But if you are experienced enough
in this area to be doing 6416 optimizations, I am probably not telling you
anything that you don't already know. That is not a task for the faint of
heart.
Jim Crichton
----- Original Message -----
From: "Jerry Trantow" <jtrantow at ieee.org>
To: "'Jean-Marc Valin'" <jean-marc.valin at usherbrooke.ca>
Cc: <speex-dev at xiph.org>
Sent: Thursday, January 19, 2006 10:40 AM
Subject: RE: [Speex-dev] TI 6xxx platform performance
I started my project using the CodeComposerStudio speex_C64_test.pjt in
speex 1.1.11.1. To build using floating point, I created a new project with
the same files and modified ti\config.h to #undef FIXED_POINT. Is there a
better way to configure a floating point processor?
I have a few TI specific optimizations that could go into the next release.
What's the procedure for submitting code?
I've been working with this code for about a week now. I'm still trying to
understand it all, but I'm particularly impressed by the float vs fixed
flexibility of the code.
Jerry J. Trantow
Applied Signal Processing, Inc.
jtrantow at ieee.org
-----Original Message-----
From: Jean-Marc Valin [mailto:jean-marc.valin at usherbrooke.ca]
Sent: Thursday, January 19, 2006 1:00 AM
To: Jerry Trantow
Cc: speex-dev at xiph.org
Subject: Re: [Speex-dev] TI 6xxx platform performance
> To get a feel for the computational load, I am running 1 second (50
> frames) of voice through the encoder.
You might want to use a bit more just so you don't see the
initialization complexity at all.
> My profile of the 6416 indicates I'm at 27.4M cycles/channel. I need to
> get below 720MHz/32 channels = 22.5M cycles per channel. I did a little
> work on inner_prod() and normalize16() and I'm confident I can get 32
> channels by optimizing 5 or 6 functions. I expect these numbers to
> translate over to the DM642.
Have you tried defining PRECISION16? That should reduce the computation
cost.
> A lower cost option would be to use a floating point 6713. I thought that
> a 300MHz floating point would come out even or ahead in an encoding
> comparison. Instead of the 300M/32=9.3M cycles per channel that I need, I
> see 71.5M cycles per channel!!!
That's definitely strange. Normally, if your chip takes the same time to
do a float op as it takes to do an int op, then the float version
should be faster. That's because some of the float ops get replaced by
several int ops.
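The per-channel budgets quoted in this thread come from dividing the clock rate by the channel count. A minimal sketch of that arithmetic, with hypothetical function names, assuming the profiled figures are cycles per channel per second of audio:

```c
#include <assert.h>

/* Cycles available to each channel per second of audio. */
double cycles_per_channel(double clock_hz, int channels)
{
    assert(channels > 0);
    return clock_hz / channels;
}

/* Whole channels that fit, given the measured cycles that one channel
 * consumes per second of audio. */
int max_channels(double clock_hz, double measured_cycles)
{
    assert(measured_cycles > 0.0);
    return (int)(clock_hz / measured_cycles);
}
```

cycles_per_channel(720e6, 32) gives 22.5M and cycles_per_channel(300e6, 32) gives 9.375M, which is the 9.3M figure used in the thread. At the profiled 27.4M cycles, max_channels(720e6, 27.4e6) is 26 on the 6416, and at 71.5M cycles max_channels(300e6, 71.5e6) is only 4 on the 6713.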
> Does this make sense?
> I'm generating floating point code, using the optimizer, etc...
Are you sure the compiler isn't using float emulation or something like
that?
> Has anyone posted DM642, C64xx or C67xx benchmarks?
I'm not aware of any.
Jean-Marc
_______________________________________________
Speex-dev mailing list
Speex-dev at xiph.org
http://lists.xiph.org/mailman/listinfo/speex-dev