[Speex-dev] Blackfin inline assembler and VisualDSP++ toolchain

Thu Jun 21 04:48:53 PDT 2007

From: Jim Crichton [mailto:jim.crichton at comcast.net]
Sent: Tuesday, June 19, 2007 10:47 PM
>
>For TI DSPs, I used a private memory array rather than the C stack, and a 
>debug patch in stack_alloc.h to measure the scratch usage:
>
>#if 1
>extern char *spxGlobalScratchFree;
>#define ALLOC(var, size, type) (var = PUSH(stack, size, type), \
>    (spxGlobalScratchFree) = ((stack) > (spxGlobalScratchFree)) ? (stack) : (spxGlobalScratchFree))
>#else
>#define ALLOC(var, size, type) var = PUSH(stack, size, type)
>#endif
>
>I initialized the global scratch pointer to the beginning of the scratch 
>area in the encoder init, and the debug macro keeps track of the max usage. 
>It may be too late in your work for this to be of any help.
>

I measured one mode at one complexity setting (15kbps, complexity=1). If you could publish
data for other modes/complexities it would be appreciated.
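
For anyone who wants to repeat the measurement on another target, here is a minimal standalone sketch of the same high-water-mark idea. It is not the Speex stack_alloc.h code: scratch_area, scratch_push and the other names are made up for illustration, the 1024-byte size is arbitrary, and alignment is done on the offset only, assuming the area itself is suitably aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical private scratch area; the size is an arbitrary example value. */
#define SCRATCH_BYTES 1024
static char scratch_area[SCRATCH_BYTES];

static char *scratch_top  = scratch_area;  /* next free byte */
static char *scratch_peak = scratch_area;  /* highest value scratch_top has reached */

/* Reserve `size` bytes at an offset aligned to `align` (a power of two)
   and remember the highest point ever reached. */
static void *scratch_push(size_t size, size_t align)
{
    size_t offset = (size_t)(scratch_top - scratch_area);
    offset = (offset + align - 1) & ~(align - 1);
    assert(offset + size <= SCRATCH_BYTES);   /* scratch area overflow */
    scratch_top = scratch_area + offset + size;
    if (scratch_top > scratch_peak)
        scratch_peak = scratch_top;
    return scratch_area + offset;
}

/* Save the current top on function entry and restore it on exit,
   which is what a by-value copy of the stack pointer achieves. */
static char *scratch_mark(void)          { return scratch_top; }
static void  scratch_release(char *mark) { scratch_top = mark; }

/* Peak scratch usage in bytes since start-up. */
static size_t scratch_peak_bytes(void)
{
    return (size_t)(scratch_peak - scratch_area);
}
```

Read scratch_peak_bytes() after encoding a representative file; the peak survives scratch_release(), so it reports the worst case rather than the current depth.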

>>>> On the code size things are less rosy.
>>>> The wideband indeed goes away with DISABLE_WIDEBAND but that's about 
>>>> all.
>>>> Due to extensive use of function pointers very little unused stuff 
>>>> beyond wideband
>>>> goes away when unused.
>>>
>>>Unless you NULL those pointers you don't need. Also, if you only use one
>>>rate, there are tables you can get rid of as well. All the tables
>>>represent about 10kB of ROM size, but you can probably reduce that to
>>>2-3 kB if you only use a single narrowband mode.
>>
>> Nullifying the pointers means that I don't treat the code as a black box. 
>> Which means
>> that if I upgrade to the next version of the library I'd have to reapply 
>> the patches.
>
>For those of us working on very memory constrained platforms, I don't think 
>that it will ever be a black box, because that would require having ENABLE 
>defines for every rate and feature, so one could build up just what is 
>needed.  That would be really messy.
>

I'd guess Jean-Marc would never agree to ENABLE defines because they would complicate life
for the non-memory-constrained majority. But we could convince him to add DISABLE defines.
I don't agree that it has to be messy. There are many heavily configurable open-source projects.
Look at eCos as just one example. By comparison, disabling individual modes in Speex would be an
order of magnitude simpler.
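
To make the DISABLE idea concrete, here is a rough sketch of how a mode table could be guarded. Everything in it is hypothetical (example_submode, the DISABLE_MODE_* names, the table layout); it is not how modes.c is organized, only an illustration that the default build stays unchanged and each define removes one entry together with the data it references.

```c
#include <stddef.h>

/* Normally passed on the compiler command line (e.g. -DDISABLE_MODE_11K);
   defined here only so the example shows a stripped build. */
#define DISABLE_MODE_11K

/* Hypothetical descriptor; the real Speex submode structure is different. */
typedef struct {
    int id;        /* example mode number */
    int bit_rate;  /* bits per second */
} example_submode;

#ifndef DISABLE_MODE_8K
static const example_submode mode_8k = { 3, 8000 };
#define MODE_8K_ENTRY (&mode_8k)
#else
#define MODE_8K_ENTRY NULL
#endif

#ifndef DISABLE_MODE_11K
static const example_submode mode_11k = { 4, 11000 };
#define MODE_11K_ENTRY (&mode_11k)
#else
#define MODE_11K_ENTRY NULL
#endif

#ifndef DISABLE_MODE_15K
static const example_submode mode_15k = { 5, 15000 };
#define MODE_15K_ENTRY (&mode_15k)
#else
#define MODE_15K_ENTRY NULL
#endif

/* Disabled modes leave a NULL slot, so the positions of the others do not move. */
static const example_submode *const mode_table[] = {
    MODE_8K_ENTRY, MODE_11K_ENTRY, MODE_15K_ENTRY
};

/* Return the descriptor for `id`, or NULL if the mode is unknown or compiled out. */
static const example_submode *find_mode(int id)
{
    size_t i;
    for (i = 0; i < sizeof mode_table / sizeof mode_table[0]; i++)
        if (mode_table[i] != NULL && mode_table[i]->id == id)
            return mode_table[i];
    return NULL;
}
```

Because a disabled mode is never referenced from the table, the linker has no reason to keep its codebooks or quantization functions, and the caller sees a compiled-out mode the same way it sees an invalid one.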

>You did not respond to the point about single data rate. If you are doing 
>this, then you can get rid of most of the tables if you fix up the 
>references in modes.c.  It would be nice to have a README.code-reduction 
>file that collected some of the advice that hits the list from time to time.

Yes, the final application is very likely to use a single data rate. And yes, 
README.code-reduction is an excellent idea.

>>>> For starters, I would like DISABLE_VBR analogous to DISABLE_WIDEBAND.
>>>> After that, it's probably possible to put under conditional compilation
>>>> the stuff that is used only in vocoder modes. It seems that modes 3 to 7
>>>> are too similar to each other to save a significant amount of code by
>>>> eliminating some of them, but I have a feeling that a generic mechanism
>>>> for picking only those modes needed (either through conditional
>>>> compilation or maybe even with a configuration Perl script) would be
>>>> simpler than a specific DISABLE_VOCODER.
>>>
>>>The problem is that there are *lots* of things like that and having an
>>>option for everything would make the code a bit ugly. But they aren't
>>>that hard to debug. If you don't know if a function is useful, remove it
>>>and see what happens. If it succeeds in encoding one file, it will work
>>>all the time.
>>
>> VBR is by far the biggest thing after WIDEBAND that the users are likely 
>> to never need or never want. And taking it out efficiently requires the
>> widest knowledge of the internal functioning of the library. I think
>> DISABLE_VBR is a good candidate for an official release.
>
>I removed vbr.c and ifdefed the references in nb_celp.c (in 8 or so places). 
>This is not too messy, and I could send a patch for this if Jean-Marc is 
>agreeable.

I suggest sending the patch first. Jean-Marc always has the opportunity to reject it.
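
For readers who have not seen this kind of change, a guarded call site might look roughly like the sketch below. It is not Jim's patch and not the nb_celp.c code: example_encoder, example_vbr_analysis and choose_quality are invented names, and the "analysis" is a trivial stand-in for what vbr.c really does.

```c
#include <stddef.h>

/* Normally set with -DDISABLE_VBR on the compiler command line;
   defined here only so the example shows the stripped build. */
#define DISABLE_VBR

/* Hypothetical encoder state, not the real Speex structure. */
typedef struct {
    int   vbr_enabled;
    float quality;      /* fixed quality used when VBR is off */
} example_encoder;

#ifndef DISABLE_VBR
/* Stand-in for the analysis in vbr.c: maps mean frame energy to a quality value. */
static float example_vbr_analysis(const float *frame, size_t n)
{
    float energy = 0.0f;
    size_t i;
    for (i = 0; i < n; i++)
        energy += frame[i] * frame[i];
    if (n > 0)
        energy /= (float)n;
    return energy > 1.0f ? 8.0f : 4.0f;
}
#endif

/* One guarded call site: with DISABLE_VBR the VBR branch and the only
   reference to the analysis function both disappear from the build. */
static float choose_quality(const example_encoder *st, const float *frame, size_t n)
{
#ifndef DISABLE_VBR
    if (st->vbr_enabled)
        return example_vbr_analysis(frame, n);
#else
    (void)frame;   /* unused when VBR is compiled out */
    (void)n;
#endif
    return st->quality;
}
```

With the define set, a request for VBR is silently ignored and the fixed quality is used; a real patch would probably want to report an error from the control call instead.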

>>>Plus 16k 24-bit words is already 48 kB and I'm sure Speex can fit into 
>>>smaller than that.
>>
>> First, I am not sure that board had full 16K words. I said 16K because 
>> that's the maximal size
>> allowed by ADSP-2111 architecture.
>> Second, code density of Blackfin family is far superior over ADI 21xx.
>> Third, I believe you that 48 KB speex on Blackfin is possible, but right 
>> now my code is bigger.
>
>With VBR and all modes but one stripped, my text+const size for the TI C55 
>is about 48 KB for a standalone build.  It was about 58 KB before.  The 
>remaining source files are:
>
>libspeex\bits.c
>libspeex\cb_search.c
>libspeex\exc_10_32_table.c
>libspeex\filters.c
>libspeex\gain_table_lbr.c
>libspeex\lpc.c
>libspeex\lsp.c
>libspeex\lsp_tables_nb.c
>libspeex\ltp.c
>libspeex\math_approx.c
>libspeex\misc.c
>libspeex\modes.c
>libspeex\nb_celp.c
>libspeex\quant_lsp.c
>libspeex\speex.c
>libspeex\speex_callbacks.c
>libspeex\vq.c
>libspeex\window.c
>ti\testenc-TI-C5x.c
>
>My platform has 256KB of internal RAM, so this was fine for me.  It does 
>suggest that it might be very hard for you to squeeze this in.  Maybe some 
>Blackfin users can chime in with their memory/MIPs results.
>

Yes, in theory the C55 and Blackfin have comparable code density, which suggests that 32 KB of
code is out of reach.

>>>>> IIRC, gcc alone (no asm) was using something in the order of 100 MIPS
>>>>> (back when it couldn't do hardware loops, MACs, cond. moves, ...), so 
>>>>> as
>>>>> you can see, there's a fair bit of difference. So yes, with assembly
>>>>> working, VDSP++ should be able to achieve better than 20 MIPS.
>>>>>
>>>>> Jean-Marc
>>>>
>>>> Not sure we are talking about the same mode.
>>>
>>>This was with the 15 kbps mode used at complexity 1.
>>>
>>> Jean-Marc
>>
>> Yes, that's the mode that I measured, with no VBR. Does 100 MIPS figure 
>> reflect the situation before
>> or after David Rowe's improvements?
>
>I see around 26 MIPs for a TI C55x DSP for Quality 3 (8kbps), complexity 1, 
>and about 33 MIPs on a TI C64xx, with no assembly optimizations, using TI's 
>build tools.  That is consistent with your 15kbps result.
>
>- Jim 

Pretty embarrassing result for a VLIW ): Yet another case of the difference between practice and theory.
For 8kbps, complexity=1 I measured 25 MIPs for the encoder + 4 MIPs for the decoder.