[Speex-dev] TI 6xxx platform performance
Jean-Marc Valin
Jean-Marc.Valin at USherbrooke.ca
Thu Jan 19 15:03:51 PST 2006
Oh, and have you played with SPEEX_SET_COMPLEXITY? The default is 2, but
setting it to 1 will reduce the complexity at a small cost in quality.
Jean-Marc
On Thursday, 19 January 2006 at 15:49 -0600, Jerry Trantow wrote:
> The majority of a Speex encoder app does fit in a 6713. The 6713 has 8K of
> L1 and another 256K of memory, 64K of which can be configured as L2 cache
> (16, 32, 48, or 64K). One level of TI's website seems to incorrectly
> indicate only 64K of L2.
>
> I turned off MANUAL_ALLOC and have it allocating internal memory using
> calloc(). I did change the L2 cache to 2 way (32K) and adjusted the heap
> size to 12K to get it to fit. I put a wavefile and the .cinit, .const up in
> the SDRAM.
>
> name       origin    length    used      attr  fill
> ---------  --------  --------  --------  ----  --------
> IRAMB      00000000  00000400  00000000  RWIX
> IRAMP      00000400  00028c00  00028218  RWIX
> IRAM       00029000  0000f000  00007fdc  RWIX
> CACHE_L2   00038000  00008000  00000000  RWIX
> SDRAM      80000000  00800000  000181b7  RWIX
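
As a rough cross-check of the memory map above, the following sketch computes how full each section is. The rows are copied from the quoted linker output; the function name `usage` and the table layout are illustrative choices, not part of any TI tool.

```python
# Memory map rows copied from the quoted linker output:
# (name, origin, length, used), all values are byte counts in hexadecimal.
MEMORY_MAP = [
    ("IRAMB",    0x00000000, 0x00000400, 0x00000000),
    ("IRAMP",    0x00000400, 0x00028c00, 0x00028218),
    ("IRAM",     0x00029000, 0x0000f000, 0x00007fdc),
    ("CACHE_L2", 0x00038000, 0x00008000, 0x00000000),
    ("SDRAM",    0x80000000, 0x00800000, 0x000181b7),
]


def usage(memory_map):
    """Return {name: (used_bytes, length_bytes, percent_used)}."""
    return {
        name: (used, length, 100.0 * used / length)
        for name, _origin, length, used in memory_map
    }


if __name__ == "__main__":
    for name, (used, length, pct) in usage(MEMORY_MAP).items():
        print(f"{name:<9} {used:>8} / {length:>8} bytes  ({pct:5.1f}% used)")
```

The four internal sections (IRAMB, IRAMP, IRAM, CACHE_L2) add up to 0x40000 bytes, which matches the 256K of internal memory mentioned earlier, and the program section IRAMP is nearly full.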
>
> I'm currently using the simulator and the SDRAM doesn't seem to be a factor.
> I put some test vectors up into SDRAM and when I call DSPF_sp_dotprod() I
> get what I expect for cycles: N/2 + 25. The 32K L2 cache will help any
> SDRAM access.
>
> I initially had a compiler option wrong which was minimizing size instead
> of maximizing speed, but I'm still at 44 MIPS for a single channel.
>
> I saw the 10 MFLOPS number in the documentation. At first glance, a 300 MHz
> 67 looks underpowered, but a quick profile shows 50% of the cycles are
> concentrated in just a few functions. The 67xx is designed to execute these
> functions and I figured I could get a factor of two out of these DSP
> functions. That would bring me under the 9.3 MFLOPS requirement.
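
The effect of speeding up only the hot functions can be estimated with a short Amdahl-style calculation. This is a sketch under one possible reading of the paragraph above: the 44 MIPS total, the 50% hot fraction, and a 2x speedup applied to that fraction only. The function name `amdahl_total` is hypothetical.

```python
def amdahl_total(total_mips, hot_fraction, hot_speedup):
    """Estimated load after speeding up only the hot fraction of the cycles."""
    cold = total_mips * (1.0 - hot_fraction)        # part left unchanged
    hot = total_mips * hot_fraction / hot_speedup   # part that gets faster
    return cold + hot


if __name__ == "__main__":
    # Figures from the message: 44 MIPS total, 50% in a few functions, 2x gain.
    print(amdahl_total(44.0, 0.5, 2.0))
```

Under these assumptions the estimate is driven mostly by the half of the cycles that is not optimized, so the result is sensitive to how much of the profile the optimized functions really cover.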
>
> The DSP functions are performing as I expect:
> SP AutoCorrelation: (nx/2) * nr + (nr/2) * 5 + 10 - (nr * nr)/4 + nr
> SP FIR Filter: (4*floor((nh-1)/2)+14)*(ceil(nr/4)) + 8
> SP Inner product: nx/2 + 25
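
The three cycle formulas above can be evaluated directly. The sketch below transcribes them as written, reading the FIR expression as (4*floor((nh-1)/2)+14)*ceil(nr/4) + 8. The function names and the example sizes (160 samples, 10 lags, 10 taps) are assumptions chosen for illustration and do not come from the thread.

```python
import math


def autocor_cycles(nx, nr):
    """SP autocorrelation: (nx/2)*nr + (nr/2)*5 + 10 - (nr*nr)/4 + nr."""
    return (nx / 2) * nr + (nr / 2) * 5 + 10 - (nr * nr) / 4 + nr


def fir_cycles(nh, nr):
    """SP FIR filter: (4*floor((nh-1)/2) + 14) * ceil(nr/4) + 8."""
    return (4 * math.floor((nh - 1) / 2) + 14) * math.ceil(nr / 4) + 8


def dotprod_cycles(nx):
    """SP inner product: nx/2 + 25."""
    return nx / 2 + 25


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the message).
    print("autocorrelation:", autocor_cycles(nx=160, nr=10))
    print("FIR filter:     ", fir_cycles(nh=10, nr=160))
    print("inner product:  ", dotprod_cycles(nx=160))
```

Multiplying such per-call counts by the number of calls per frame and by 50 frames per second gives a rough per-channel load for each kernel.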
>
> But unless the whole algorithm gets down near 10MIPS, I'm going to have to
> go to the 64xx fixed point.
>
> Jerry J. Trantow
> Applied Signal Processing, Inc.
> jtrantow at ieee.org
>
>
> -----Original Message-----
> From: Jim Crichton [mailto:jim.crichton at comcast.net]
> Sent: Thursday, January 19, 2006 10:33 AM
> To: Jerry Trantow
> Cc: speex-dev at xiph.org
> Subject: Re: [Speex-dev] TI 6xxx platform performance
>
> Jerry,
>
> I think that just removing the FIXED_POINT define should be sufficient,
> though you might want to turn off MANUAL_ALLOC, because I am not sure if the
> memory usage is identical for the fixed point build, and the constants in
> config.h are set for the fixed point build. Are you testing on the
> simulator, or on an eval board? It does not look like the 6713 has enough
> memory to hold Speex (64K vs. 1024K for the 6416), and your performance
> could suffer badly running from external memory.
>
> I would be very surprised if you can get below 9.3 MIPS/channel for floating
> point, since Jean-Marc has posted raw MIPS numbers somewhere around 10 MIPS
> for the algorithm itself. And that is discounting the memory issues. With
> the C6416 you can fit the code and data for 32 channels in internal memory.
>
> If you want to post (or send me) your .pjt and .cmd files for the 6713
> build, I can take a look at it in the simulator (I am using the 6415 now,
> though I do not need as many channels). But if you are experienced enough
> in this area to be doing 6416 optimizations, I am probably not telling you
> anything that you don't already know. That is not a task for the faint of
> heart.
>
> Jim Crichton
>
>
> ----- Original Message -----
> From: "Jerry Trantow" <jtrantow at ieee.org>
> To: "'Jean-Marc Valin'" <jean-marc.valin at usherbrooke.ca>
> Cc: <speex-dev at xiph.org>
> Sent: Thursday, January 19, 2006 10:40 AM
> Subject: RE: [Speex-dev] TI 6xxx platform performance
>
>
> I started my project using the CodeComposerStudio speex_C64_test.pjt in
> speex 1.1.11.1. To build using floating point, I created a new project with
> the same files and modified ti\config.h to #undef FIXED_POINT. Is there a
> better way to configure a floating point processor?
>
> I have a few TI specific optimizations that could go into the next release.
> What's the procedure for submitting code?
>
> I've been working with this code for about a week now. I'm still trying to
> understand it all, but I'm particularly impressed by the float vs fixed
> flexibility of the code.
>
> Jerry J. Trantow
> Applied Signal Processing, Inc.
> jtrantow at ieee.org
>
>
> -----Original Message-----
> From: Jean-Marc Valin [mailto:jean-marc.valin at usherbrooke.ca]
> Sent: Thursday, January 19, 2006 1:00 AM
> To: Jerry Trantow
> Cc: speex-dev at xiph.org
> Subject: Re: [Speex-dev] TI 6xxx platform performance
>
> > To get a feel for the computational load, I am running 1 second
> > (50 frames) of voice through the encoder.
>
> You might want to use a bit more just so you don't see the
> initialization complexity at all.
>
> > My profile of the 6416 indicates I'm at 27.4M cycles/channel. I need to
> > get below 720 MHz/32 channels = 22.5M cycles per channel. I did a little
> > work on inner_prod() and normalize16() and I'm confident I can get 32
> > channels by optimizing 5 or 6 functions. I expect these numbers to
> > translate over to the DM642.
>
> Have you tried defining PRECISION16? That should reduce the computation
> cost.
>
> > A lower cost option would be to use a floating point 6713. I thought
> > that a 300 MHz floating point would come out even or ahead in an encoding
> > comparison. Instead of the 300M/32 = 9.3M cycles per channel that I need,
> > I see 71.5M cycles per channel!!!
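
The per-channel budgets quoted above follow from dividing the clock rate by the channel count. The sketch below redoes that arithmetic and shows how far each measurement is from its budget. It assumes the measured figures are cycles per second of audio, since the test clip is 1 second long; the function names are hypothetical.

```python
def cycles_per_channel(clock_hz, channels):
    """Cycle budget per channel per second when channels share one core."""
    return clock_hz / channels


def required_speedup(measured_cycles, budget_cycles):
    """Factor by which the measured load must shrink to fit the budget."""
    return measured_cycles / budget_cycles


if __name__ == "__main__":
    # (chip, clock in Hz, measured cycles per channel) as given in the message.
    for chip, clock, measured in [("6416", 720e6, 27.4e6), ("6713", 300e6, 71.5e6)]:
        budget = cycles_per_channel(clock, 32)
        factor = required_speedup(measured, budget)
        print(f"{chip}: budget {budget / 1e6:.3f}M cycles, needs {factor:.2f}x speedup")
```

With the figures from the message, the fixed-point 6416 build is within a modest factor of its budget, while the floating-point 6713 build is several times over.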
>
> That's definitely strange. Normally, if your chip takes the same time to
> do a float op as it takes to do an int op, then the float version
> should be faster. That's because, in the fixed-point version, some of the
> float ops get replaced by several int ops.
>
> > Does this make sense?
> > I'm generating floating point code, using the optimizer, etc...
>
> Are you sure the compiler isn't using float emulation or something like
> that?
>
> > Has anyone posted DM642, C64xx or C67xx benchmarks?
>
> I'm not aware of any.
>
> Jean-Marc
>
> _______________________________________________
> Speex-dev mailing list
> Speex-dev at xiph.org
> http://lists.xiph.org/mailman/listinfo/speex-dev