[Speex-dev] Blackfin inline assembler and VisualDSP++ toolchain

Tue Jun 19 09:38:17 PDT 2007

> Yes, the data footprint in the new version is quite manageable. Still, I would
> like better documentation for speex_alloc_scratch().

I'll be waiting for your patch :-)

> It took me some time to figure out that in a single-threaded environment I
> could give the same scratch area to multiple encoders and decoders. It would
> also be very useful to document the size of the scratch area as a function of
> the mode. By trial and error I found out that in my mode scratch usage never
> exceeds 2700 bytes, but finding this figure in the documentation would be so
> much simpler and more reliable.

Unfortunately not possible. The amount of stack (scratch) space required
depends on the bit-rate you select, the complexity value, whether you
compile as float or fixed-point, ... But if your compiler is sane (read
C99-compliant), you don't even need that. All you need is to define
VAR_ARRAYS and all the temp arrays will be allocated as C99
variable-size arrays (no memory will be allocated for explicit scratch
space). The configure script actually detects this by default. Even
without a C99 compiler, you can still use alloca (by defining
USE_ALLOCA), which is still better than the scratch space.
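
To make the three options concrete, here is a minimal sketch of how one
temp-array macro could map onto a C99 variable-size array, alloca, or an
explicit scratch area. The macro and function names (TMP_ALLOC, TMP_RELEASE,
scratch_take, sum_of_squares) and the 4096-byte scratch size are made up for
illustration and are not the names Speex uses; only VAR_ARRAYS and USE_ALLOCA
come from the text above.

```c
#include <stddef.h>

#if defined(VAR_ARRAYS)
/* C99 variable-size array: stack storage, released at scope exit. */
#define TMP_ALLOC(type, name, n) type name[(n)]
#define TMP_RELEASE(name) ((void)0)
#elif defined(USE_ALLOCA)
/* alloca: stack storage, released when the function returns.
   <alloca.h> is where glibc declares it; other platforms may differ. */
#include <alloca.h>
#define TMP_ALLOC(type, name, n) type *name = (type *)alloca((n) * sizeof(type))
#define TMP_RELEASE(name) ((void)0)
#else
/* Explicit scratch area (hypothetical size): a bump allocator whose
   worst-case usage the integrator has to know in advance. No overflow
   check is done here, which is exactly the sizing problem discussed. */
static union { long align; char bytes[4096]; } scratch;
static size_t scratch_used = 0;

static void *scratch_take(size_t bytes)
{
    void *p = scratch.bytes + scratch_used;
    /* Round up so the next allocation stays aligned for long. */
    scratch_used += (bytes + sizeof(long) - 1) / sizeof(long) * sizeof(long);
    return p;
}
#define TMP_ALLOC(type, name, n) \
    size_t name##_mark = scratch_used; \
    type *name = (type *)scratch_take((n) * sizeof(type))
#define TMP_RELEASE(name) (scratch_used = name##_mark)
#endif

/* Example user: needs n temporary ints, however they are obtained. */
int sum_of_squares(const short *x, int n)
{
    TMP_ALLOC(int, sq, n);
    int i, acc = 0;
    for (i = 0; i < n; i++)
        sq[i] = x[i] * x[i];
    for (i = 0; i < n; i++)
        acc += sq[i];
    TMP_RELEASE(sq);
    return acc;
}
```

Built with -DVAR_ARRAYS or -DUSE_ALLOCA, the scratch buffer disappears from
the image entirely; only the last branch needs a size chosen up front.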

> On the code size side, things are less rosy.
> The wideband code indeed goes away with DISABLE_WIDEBAND, but that's about all.
> Due to the extensive use of function pointers, very little beyond wideband
> gets dropped by the linker when it is unused.

Unless you NULL those pointers you don't need. Also, if you only use one
rate, there are tables you can get rid of as well. All the tables
represent about 10kB of ROM size, but you can probably reduce that to
2-3 kB if you only use a single narrowband mode.
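
A rough sketch of what NULLing a pointer buys you, using a made-up mode
table. SubMode, quant_low, quant_high, encode_with and the DROP_HIGH_RATE
switch are all hypothetical names, not the actual Speex structures. The point
is only that once nothing references a function, the compiler or linker is
free to leave it out.

```c
#include <stddef.h>

typedef int (*quant_fn)(const short *in, int n);

/* Stand-in quantizers; they only return a fake "bits used" count. */
static int quant_low(const short *in, int n)
{
    (void)in;
    return n;
}

#ifndef DROP_HIGH_RATE
static int quant_high(const short *in, int n)
{
    (void)in;
    return 2 * n;
}
#endif

/* Hypothetical per-mode table entry holding a function pointer. */
typedef struct {
    int      bits_per_frame;
    quant_fn quant;
} SubMode;

static const SubMode submodes[] = {
    { 160, quant_low },
#ifdef DROP_HIGH_RATE
    { 300, NULL },        /* entry kept, code behind it no longer referenced */
#else
    { 300, quant_high },
#endif
};

/* Returns the quantizer result, or -1 if the mode is absent or disabled. */
int encode_with(int mode, const short *in, int n)
{
    int count = (int)(sizeof(submodes) / sizeof(submodes[0]));
    if (mode < 0 || mode >= count || submodes[mode].quant == NULL)
        return -1;
    return submodes[mode].quant(in, n);
}
```

The same applies to tables: if the only reference to a table is through an
entry you have removed, it no longer has to be linked in.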

> For starters, I would like a DISABLE_VBR analogous to DISABLE_WIDEBAND.
> After that, it's probably possible to put the stuff that is used only in
> vocoder modes under conditional compilation. It seems that modes 3 to 7 are too
> similar to each other to save a significant amount of code by eliminating some
> of them, but I have a feeling that a generic mechanism for picking only the
> modes needed (either through conditional compilation or maybe even with a
> configuration Perl script) would be simpler than a specific DISABLE_VOCODER.

The problem is that there are *lots* of things like that and having an
option for everything would make the code a bit ugly. But they aren't
that hard to debug. If you don't know if a function is useful, remove it
and see what happens. If it succeeds in encoding one file, it will work
all the time.

> Another potential saving could be achieved by replacing speex_warning, speex_notification
> and speex_error with user-modifiable defines. The existing DISABLE_WARNING/
> OVERRIDE_SPEEX_WARNING method is not efficient in reducing the code footprint because the
> majority of the overhead is at the points where speex_warning is invoked rather than
> in the function itself.

How about:
#define OVERRIDE_SPEEX_WARNING
#define speex_warning(x) {}
in user_misc.h? That should do the trick.
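
For illustration, a self-contained sketch of the same idea with stand-in
names (log_warning, OVERRIDE_LOG_WARNING and clamp_quality are hypothetical,
not Speex identifiers). It uses do { } while (0) rather than {} only so that
the macro stays safe inside an unbraced if/else; either form makes the call,
and its string argument, vanish at every call site.

```c
#include <stdio.h>

/* What a user override header might contain. */
#define OVERRIDE_LOG_WARNING
#define log_warning(msg) do { } while (0)

/* Library side: the real function is only compiled when not overridden. */
#ifndef OVERRIDE_LOG_WARNING
static void log_warning(const char *msg)
{
    fprintf(stderr, "warning: %s\n", msg);
}
#endif

/* A call site: with the override active, the preprocessor removes the
   call and its string literal, so neither ends up in the object code. */
int clamp_quality(int q)
{
    if (q > 10) {
        log_warning("quality too high, clamping to 10");
        q = 10;
    }
    return q;
}
```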

> With all my suggestions applied, there is a chance that a minimized Speex would fit in the
> on-chip code memory of the BF532 (48KB). However, the original goal of fitting in the BF531
> (32KB of on-chip code memory) seems impossible even then.

32 kB for Speex appears quite possible to me. Especially considering
you're only interested in the decoder, right (or was it the encoder)?

> Mostly GSM and proprietary codecs. Or G.726. I am starting to feel that I, too,
> will end up with G.726.

I heard there are very small and very fast G.711 encoders too :-)
Seriously, you need to compare apples to apples.

> Many years ago I worked on a project in which a proprietary codec was compressing to
> 4400 bps with decent speech quality, all at a code footprint of 16K 24-bit words and
> about 8-9 ADSP-2111 MIPS. I wasn't involved in speech processing, so by now I don't
> remember which algorithm they used. IIRC, not CELP.

4.4 kbps is almost certainly some variant of CELP. Plus 16k 24-bit words
is already 48 kB and I'm sure Speex can fit into less than that.

> <snip>
> 
>> IIRC, gcc alone (no asm) was using something in the order of 100 MIPS
>> (back when it couldn't do hardware loops, MACs, cond. moves, ...), so as
>> you can see, there's a fair bit of difference. So yes, with assembly
>> working, VDSP++ should be able to achieve better than 20 MIPS.
>>
>> 	Jean-Marc
> 
> Not sure we are talking about the same mode.

This was with the 15 kbps mode used at complexity 1.

	Jean-Marc