[Speex-dev] new assembler port

Jim Crichton jim.crichton at comcast.net
Tue May 16 13:32:37 PDT 2006


>> I suggest that you start by looking at 8kbps, complexity 0.
>
> Actually, I strongly recommend *against* using complexity 0, unless you're
> really desperate for a few MIPS. The complexity reduction compared to 1 is
> small, but the loss in quality can be significant.

Oops.  My complexity setting is 1, not 0.  In the codebook search, at least,
the value is forced into the range 1 to 10.
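
A minimal sketch of that kind of range forcing, for illustration only. The
helper name and the two macros are hypothetical and are not taken from the
Speex source; only the 1 to 10 range comes from the message above.

```c
#include <stdint.h>

/* Hypothetical bounds: the range the message says the search enforces. */
#define COMPLEXITY_MIN 1
#define COMPLEXITY_MAX 10

/* Hypothetical helper, not a Speex function: force a requested
 * complexity into the range COMPLEXITY_MIN..COMPLEXITY_MAX. */
int32_t clamp_complexity(int32_t requested)
{
    if (requested < COMPLEXITY_MIN)
        return COMPLEXITY_MIN;
    if (requested > COMPLEXITY_MAX)
        return COMPLEXITY_MAX;
    return requested;
}
```

With a clamp like this, a request for complexity 0 behaves the same as a
request for complexity 1, and any oversized value is treated as 10.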

Also, I just fixed a bug where, due to a type mismatch, my complexity was
set to 260 (limited to 10 by Speex internally).  That ran 5 to 6 times 
slower than complexity 1 on the TI C64x.
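
The message does not say which types were mismatched, so the following is
only a sketch of the general failure mode, under the assumption that the
setting is handed over through an untyped pointer and read back as a 32-bit
integer. read_setting() and both caller functions are hypothetical, and the
0x5A filler bytes simply stand in for whatever sits next to a narrower
variable in memory. It does not try to reproduce the value 260.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a ctl-style setter: it gets the value through
 * an untyped pointer and always reads 32 bits from it. */
int32_t read_setting(const void *ptr)
{
    int32_t value;
    memcpy(&value, ptr, sizeof value);
    return value;
}

/* Matching type: the caller's variable is exactly what the callee reads. */
int32_t complexity_with_matching_type(void)
{
    int32_t complexity = 1;
    return read_setting(&complexity);
}

/* Mismatched type, simulated without undefined behaviour: only the first
 * byte holds the caller's 8-bit value of 1; the remaining bytes are
 * unrelated neighbouring data (assumed filler). */
int32_t complexity_with_mismatched_type(void)
{
    unsigned char memory[4] = { 1, 0x5A, 0x5A, 0x5A };
    return read_setting(memory);
}
```

In the mismatched case the callee sees a large number instead of 1 on
either byte order, which an internal range limit would then turn into the
maximum complexity.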

>> I (and others) are running that on a TI C55xx DSP, and it runs a little
>> under 30 MIPs when I last checked, with no assembly optimizations.  I have
>> not tried to profile other rates, but I did run a test at 15kbps,
>> complexity 3, and that was 66 MIPs.
>
> That's from 1.1.12 or svn?

That was build 11234, and this was the run I sent to you back on 24 April,
when you were wondering if the DSP output matched the PC.  Did you ever have
a chance to try that comparison?

>> People ask about guidelines for assembly optimization, and if you do some
>> searching, you will find some tips from Jean-Marc on where to start.  Also,
>> if you look in the source tree for references to Blackfin or bfin, you will
>> find an example port done by Jean-Marc.
>
> Yes, have a look at the _bfin.h, _sse.h and _arm.h files to get an idea of
> what's useful. Also, note that most calls to *_mem2() functions have been
> (and are being) converted to the _mem16() version.
>
>> You should make sure that you run the latest code from Subversion, which
>> has some speed improvements over 1.1.12.
>
> Yes. I'm making big improvements for embedded systems, the last one being 5
> minutes ago.

I just tried the new build, and here are some numbers for comparison (not 
scientific, but generated from the first few frames of the same test file, 
8kbps, complexity 1).

TI C64x DSP (32-bit machine)
  Build 11408 (SVN head):      34.0 MIPs peak
  Build 11234 (22 April 06):   34.5 MIPs
  Release 1.1.12:              36.5 MIPs

TI C55x DSP (16-bit machine)
  Build 11408 (SVN head):      28.6 MIPs peak
  Build 11398 (SVN 10 May):    29.0 MIPs
  Release 1.1.12:              38.3 MIPs
  Release 1.1.8:               41.5 MIPs

As you can see, the performance on at least this 16-bit machine has improved
dramatically since 1.1.12 (from 38.3 down to 28.6 MIPs), while the 32-bit
C64x changed much less (36.5 down to 34.0 MIPs).  The initialization
routines are much faster as well, on both platforms.
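
To put those figures in relative terms, here is a small sketch that turns a
before/after MIPs pair into a percentage reduction. The function names are
hypothetical; the inputs are the numbers listed above.

```c
#include <stdio.h>

/* Percentage by which the MIPs figure dropped going from before to after. */
double percent_reduction(double before_mips, double after_mips)
{
    return 100.0 * (before_mips - after_mips) / before_mips;
}

/* Print the release 1.1.12 to build 11408 reduction for both DSPs. */
void print_reductions(void)
{
    printf("C55x, 1.1.12 -> build 11408: %.1f%%\n",
           percent_reduction(38.3, 28.6));
    printf("C64x, 1.1.12 -> build 11408: %.1f%%\n",
           percent_reduction(36.5, 34.0));
}
```

On these numbers the C55x figure drops by roughly a quarter, against about
7% on the C64x.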

- Jim



