[Speex-dev] Blackfin inline assembler and VisualDSP++ toolchain

Tue Jun 19 09:21:19 PDT 2007

-----Original Message-----
From: Jean-Marc Valin [mailto:jean-marc.valin at usherbrooke.ca]
Sent: Thursday, June 14, 2007 11:17 PM
To: Michael Shatz
Cc: speex-dev at xiph.org
Subject: Re: [Speex-dev] Blackfin inline assembler and VisualDSP++
toolchain

Michael Shatz wrote:
>>> Actually, you're the first I know using the VisualDSP++ toolchain
>>> :-)
>> 
>> I guess that's because speex has pretty big memory footprint. 
>
>Actually, you'll find that the data footprint in the latest versions is
>pretty small. There's a bit more code/tables, but you'll find that many
>can go away if you're not actually using them.

Yes, the data footprint in the new version is quite manageable. Still, I would
like better documentation for speex_alloc_scratch(). It took me a while to
figure out that in a single-threaded environment I could give the same scratch
area to multiple encoders and decoders. It would also be very useful to document
the size of the scratch area as a function of the mode. By trial and error I
found that in my mode the scratch never exceeds 2700 bytes, but finding this
figure in the documentation would be much simpler and more reliable.
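
A minimal sketch of the scratch sharing described above, under stated
assumptions: shared_scratch_alloc(), shared_scratch_high_water() and
SCRATCH_BYTES are hypothetical names, not part of Speex, and the buffer size
is only the 2700-byte figure observed above, rounded up, not a documented
bound. The allocator hands every caller the same static buffer and records
the largest request, which is one way to measure the real requirement of a
given mode instead of guessing.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 2700 bytes was observed for one mode only; rounded up here. */
#define SCRATCH_BYTES 2752

/* One scratch area shared by every codec instance. This is safe only if the
   instances never run concurrently and none of them expects the scratch
   contents to survive from one codec call to the next. */
static union { long align; char bytes[SCRATCH_BYTES]; } shared_scratch;

/* Largest request seen so far, kept so the buffer can be sized from data. */
static size_t scratch_high_water = 0;

/* Hypothetical user-supplied scratch allocator: always returns the same
   buffer, or NULL when the request does not fit. */
void *shared_scratch_alloc(size_t size)
{
    if (size > scratch_high_water)
        scratch_high_water = size;
    if (size > sizeof shared_scratch.bytes)
        return NULL;
    return shared_scratch.bytes;
}

size_t shared_scratch_high_water(void)
{
    return scratch_high_water;
}

/* Demonstration: an "encoder" and a "decoder" receive the same memory. */
void shared_scratch_demo(void)
{
    void *enc = shared_scratch_alloc(2000);
    void *dec = shared_scratch_alloc(2700);
    assert(enc != NULL && enc == dec);
    assert(shared_scratch_high_water() == 2700);
}
```

The sharing breaks as soon as two instances can be active at the same time
(for example, one codec call made from an interrupt handler while another is
in progress), so it fits the no-OS, single-threaded case only.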

On the code size side, things are less rosy.
Wideband does indeed go away with DISABLE_WIDEBAND, but that's about all.
Because of the extensive use of function pointers, very little unused code beyond
wideband goes away.
For a start, I would like a DISABLE_VBR analogous to DISABLE_WIDEBAND.
After that, it is probably possible to put the code that is used only in the
vocoder modes under conditional compilation. Modes 3 to 7 seem too similar to
each other to save a significant amount of code by eliminating some of them,
but I have a feeling that a generic mechanism for picking only the modes needed
(either through conditional compilation or maybe even with a configuration Perl
script) would be simpler than a specific DISABLE_VOCODER.
Another potential saving could come from replacing speex_warning, speex_notification
and speex_error with user-modifiable defines. The existing DISABLE_WARNING/
OVERRIDE_SPEEX_WARNING method is not efficient at reducing the code footprint because
most of the overhead is at the points where speex_warning is invoked rather than
in the function itself.
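
A minimal sketch of the "user-modifiable define" idea, under stated
assumptions: CODEC_WARNING, CODEC_DISABLE_MESSAGES and clamp_quality() are
hypothetical names used only for illustration and are not Speex identifiers.
When the macro expands to nothing, each call site loses the call, the argument
setup and the message string. An empty function that is not inlined removes
none of these.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reporting macro. The user may define CODEC_WARNING before
   this point to route messages elsewhere, or define CODEC_DISABLE_MESSAGES
   to remove every call site at compile time. */
#ifdef CODEC_DISABLE_MESSAGES
#undef CODEC_WARNING
#define CODEC_WARNING(msg) ((void)0)   /* no call, no string literal */
#endif

#ifndef CODEC_WARNING
#define CODEC_WARNING(msg) ((void)fprintf(stderr, "warning: %s\n", (msg)))
#endif

/* Example call site: clamps a hypothetical quality setting to 0..10. */
int clamp_quality(int q)
{
    if (q < 0) {
        CODEC_WARNING("quality below range, using 0");
        return 0;
    }
    if (q > 10) {
        CODEC_WARNING("quality above range, using 10");
        return 10;
    }
    return q;
}
```

Building the same file with and without -DCODEC_DISABLE_MESSAGES and comparing
the code and constant-data section sizes would show how much the call sites
cost on a given toolchain.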

With all my suggestions applied, there is a chance that a minimized speex would fit in the
on-chip code memory of the BF532 (48KB). However, the original goal of fitting in the BF531
(32KB of on-chip code memory) seems impossible even then.

>> So
>> developers that integrate speex tend to have plenty of RAM and once
>> one has plenty of RAM he could install biggish OS. And between
>> biggish OSes for Blackfin the most popular choice is uClinux. And
>> uClinux works best with gnu tools. Something like that. On the other
>> hand, developers that use Blackfin in a manner similar to traditional
>> 16-bit DSP usage model, i.e. without external RAM or with relatively
>> small internal SRAM normally use no OS at all (like me) or ADI's VDK.
>> These people naturally prefer ADI toolchain because it gives you good
>> visibility of what's going on within a small "bare metal" target. But
>> such developers are less likely to integrate speex because it simply
>> doesn't fit.
>
>What do they use? I don't think Speex is really much more expensive than
>other codecs when you compare apples to apples (e.g. if you compare with
>g.729, then first disable anything that isn't used by the 8 kbps mode).

Mostly GSM and proprietary codecs. Or G.726. I am starting to feel that I, too,
will end up with G.726.
Many years ago I worked on a project in which a proprietary codec compressed speech to
4400 bps with decent quality, all within a code footprint of 16K 24-bit words and
about 8-9 ADSP-2111 MIPS. I wasn't involved in the speech processing, so by now I don't
remember which algorithm they used. IIRC, not CELP.

<snip>

>IIRC, gcc alone (no asm) was using something in the order of 100 MIPS
>(back when it couldn't do hardware loops, MACs, cond. moves, ...), so as
>you can see, there's a fair bit of difference. So yes, with assembly
>working, VDSP++ should be able to achieve better than 20 MIPS.
>
>	Jean-Marc

Not sure we are talking about the same mode.
If I find the time, I'll try to run the gcc-compiled version on my board.
But the chance that I will find the time for that is pretty slim.

Regards,
Michael Shatz