[Speex-dev] TI 6xxx platform performance

Thu Jan 19 15:03:51 PST 2006

Oh, and have you played with SPEEX_SET_COMPLEXITY? The default is 2, but
setting it to 1 will reduce the complexity at a small cost in quality.

	Jean-Marc

On Thursday, 19 January 2006 at 15:49 -0600, Jerry Trantow wrote:
> The majority of a Speex encoder app does fit in a 6713.  The 6713 has 8K of
> L1 and another 256K of memory, 64K of which can be configured as L2 cache
> (16, 32, 48, or 64K).  One level of TI's website seems to incorrectly indicate
> only 64K of L2.
> 
> I turned off MANUAL_ALLOC and have it allocating internal memory using
> calloc(). I did change the L2 cache to 2 way (32K) and adjusted the heap
> size to 12K to get it to fit.  I put a wavefile and the .cinit, .const up in
> the SDRAM.
> 
>                   name            origin    length      used    attr    fill
>          ----------------------  --------  ---------  --------  ----  --------
>          IRAMB                   00000000   00000400  00000000  RWIX
>          IRAMP                   00000400   00028c00  00028218  RWIX
>          IRAM                    00029000   0000f000  00007fdc  RWIX
>          CACHE_L2                00038000   00008000  00000000  RWIX
>          SDRAM                   80000000   00800000  000181b7  RWIX
> 
> I'm currently using the simulator and the SDRAM doesn't seem to be a factor.
> I put some test vectors up into SDRAM and when I call DSPF_sp_dotprod() I
> get what I expect for cycles: N/2 + 25.  The 32K L2 cache will help any
> SDRAM access.
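
When checking an optimized inner-product routine on the simulator, a plain C reference is useful for comparing results. The following is a minimal sketch, not TI's implementation: the name `ref_dotprod` is hypothetical, and the code makes no attempt to reach the N/2 + 25 cycle figure mentioned above.

```c
#include <stddef.h>

/* Hypothetical reference: straightforward single-precision dot product.
 * Intended only for comparing results against an optimized routine. */
float ref_dotprod(const float *x, const float *y, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * y[i];   /* accumulate one product per iteration */
    }
    return acc;
}
```

An optimized routine may sum the products in a different order, so floating-point results are best compared with a small tolerance rather than for exact equality.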
> 
> I initially had a compiler option wrong which was minimizing size instead
> of maximizing speed, but I'm still at 44MIPS for a single channel.
> 
> I saw the 10MFLOPS number in the documentation.  At first glance, a 300MHz
> 67 looks underpowered but a quick profile shows 50% of the cycles are
> concentrated in just a few functions.  The 67xx is designed to execute these
> functions and I figured I could get a factor of two out of these DSP
> functions.  That would bring me under the 9.3MFLOPS requirement. 
> 
> The DSP functions are performing as I expect:
> SP AutoCorrelation: (nx/2) * nr + (nr/2) * 5 + 10 - (nr * nr)/4 + nr
> SP FIR Filter:	  (4*floor((nh-1)/2)+14)*(ceil(nr/4)) + 8
> SP Inner product:   nx/2 + 25
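
The three cycle formulas above can be evaluated for a given block size with a few lines of C. This is only a sketch: the function names are hypothetical, the FIR formula is read as (4*floor((nh-1)/2) + 14) * ceil(nr/4) + 8, and the autocorrelation estimate assumes nx is even and nr is a multiple of 4 so that the integer divisions are exact.

```c
/* Cycle estimates transcribed from the formulas quoted in the post.
 * All function names are hypothetical. */

/* SP autocorrelation: (nx/2)*nr + (nr/2)*5 + 10 - (nr*nr)/4 + nr
 * Assumes nx even and nr a multiple of 4 (exact integer division). */
long autocor_cycles(long nx, long nr)
{
    return (nx / 2) * nr + (nr / 2) * 5 + 10 - (nr * nr) / 4 + nr;
}

/* SP FIR filter: (4*floor((nh-1)/2) + 14) * ceil(nr/4) + 8 */
long fir_cycles(long nh, long nr)
{
    long half_taps = (nh - 1) / 2;   /* floor((nh-1)/2) for nh >= 1 */
    long blocks = (nr + 3) / 4;      /* ceil(nr/4) for nr >= 0 */
    return (4 * half_taps + 14) * blocks + 8;
}

/* SP inner product: nx/2 + 25 (assumes nx even) */
long dotprod_cycles(long nx)
{
    return nx / 2 + 25;
}
```

For example, `dotprod_cycles(160)` evaluates to 80 + 25 = 105 cycles. Multiplying such per-call estimates by the number of calls per frame gives a rough floor on what the optimized kernels alone would cost.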
> 
> But unless the whole algorithm gets down near 10MIPS, I'm going to have to
> go to the 64xx fixed point.
> 
> Jerry J. Trantow
> Applied Signal Processing, Inc.
> jtrantow at ieee.org
> 
> 
> -----Original Message-----
> From: Jim Crichton [mailto:jim.crichton at comcast.net] 
> Sent: Thursday, January 19, 2006 10:33 AM
> To: Jerry Trantow
> Cc: speex-dev at xiph.org
> Subject: Re: [Speex-dev] TI 6xxx platform performance
> 
> Jerry,
> 
> I think that just removing the FIXED_POINT define should be sufficient, 
> though you might want to turn off MANUAL_ALLOC, because I am not sure if the 
> memory usage is identical for the fixed point build, and the constants in 
> config.h are set for the fixed point build.  Are you testing on the 
> simulator, or on an eval board?  It does not look like the 6713 has enough 
> memory to hold Speex (64K vs. 1024K for the 6416), and your performance 
> could suffer badly running from external memory.
> 
> I would be very surprised if you can get below 9.3MIPs/channel for floating 
> point, since Jean-Marc has posted raw MIPs numbers somewhere around 10 MIPs 
> for the algorithm itself.  And that is discounting the memory issues.  With 
> the C6416 you can fit the code and data for 32 channels in internal memory.
> 
> If you want to post (or send me) your .pjt and .cmd files for the 6713 
> build, I can take a look at it in the simulator (I am using the 6415 now, 
> though I do not need as many channels).  But if you are experienced enough 
> in this area to be doing 6416 optimizations, I am probably not telling you 
> anything that you don't already know.  That is not a task for the faint of 
> heart.
> 
> Jim Crichton
> 
> 
> ----- Original Message ----- 
> From: "Jerry Trantow" <jtrantow at ieee.org>
> To: "'Jean-Marc Valin'" <jean-marc.valin at usherbrooke.ca>
> Cc: <speex-dev at xiph.org>
> Sent: Thursday, January 19, 2006 10:40 AM
> Subject: RE: [Speex-dev] TI 6xxx platform performance
> 
> 
> I started my project using the CodeComposerStudio speex_C64_test.pjt in
> speex 1.1.11.1.  To build using floating point, I created a new project with
> the same files and modified ti\config.h to #undef FIXED_POINT.  Is there a
> better way to configure a floating point processor?
> 
> I have a few TI specific optimizations that could go into the next release.
> What's the procedure for submitting code?
> 
> I've been working with this code for about a week now. I'm still trying to
> understand it all, but I'm particularly impressed by the float vs fixed
> flexibility of the code.
> 
> Jerry J. Trantow
> Applied Signal Processing, Inc.
> jtrantow at ieee.org
> 
> 
> -----Original Message-----
> From: Jean-Marc Valin [mailto:jean-marc.valin at usherbrooke.ca]
> Sent: Thursday, January 19, 2006 1:00 AM
> To: Jerry Trantow
> Cc: speex-dev at xiph.org
> Subject: Re: [Speex-dev] TI 6xxx platform performance
> 
> > To get a feel for the computational load, I am running 1 second (50 frames)
> > of voice through the encoder.
> 
> You might want to use a bit more just so you don't see the
> initialization complexity at all.
> 
> > My profile of the 6416 indicates I'm at 27.4M cycles/channel.  I need to get
> > below 720MHz/32 channels = 22.5M cycles per channel.  I did a little work on
> > inner_prod() and normalize16() and I'm confident I can get 32 channels by
> > optimizing 5 or 6 functions.  I expect these numbers to translate over to the
> > DM642.
> 
> Have you tried defining PRECISION16? That should reduce the computation
> cost.
> 
> > A lower cost option would be to use a floating point 6713.  I thought that a
> > 300MHz floating point would come out even or ahead in an encoding
> > comparison.  Instead of the 300M/32=9.3M cycles per channel that I need, I
> > see 71.5M cycles per channel!!!
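
The per-channel budgets quoted in this thread are simply the clock rate divided by the channel count, and the measured cycle counts can be compared against them directly. A minimal sketch of that arithmetic follows; both function names are hypothetical.

```c
/* Real-time cycle budget per channel: clock rate divided by channel count.
 * Function names are hypothetical. */
double cycles_per_channel(double clock_hz, int channels)
{
    return clock_hz / channels;
}

/* Factor by which measured cycles exceed the budget (> 1 means too slow). */
double overrun_factor(double measured_cycles, double budget_cycles)
{
    return measured_cycles / budget_cycles;
}
```

With the figures from the thread, `cycles_per_channel(720e6, 32)` is 22.5M and the measured 27.4M is about 1.22 times that budget. For the 6713, `cycles_per_channel(300e6, 32)` is 9.375M (quoted as 9.3M), and the measured 71.5M is roughly 7.6 times over.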
> 
> That's definitely strange. Normally, if your chip takes the same time to
> do a float op as it takes to do an int op, then the float version
> should be faster. That's because some of the float ops get replaced by
> several int ops.
> 
> > Does this make sense?
> > I'm generating floating point code, using the optimizer, etc...
> 
> Are you sure the compiler isn't using float emulation or something like
> that?
> 
> > Has anyone posted DM642, C64xx or C67xx benchmarks?
> 
> I'm not aware of any.
> 
> Jean-Marc
> 
> _______________________________________________
> Speex-dev mailing list
> Speex-dev at xiph.org
> http://lists.xiph.org/mailman/listinfo/speex-dev