[Speex-dev] Speex on TI C6x, Problem with TI C5x Patch
Jim Crichton
jim.crichton at comcast.net
Wed May 25 12:56:02 PDT 2005
>> There is a bit of work remaining to get the memory usage down for a
>> multichannel application. There have been some good posts over the
>> last couple of months about reducing memory usage.
>
> I think 1.1.8 incorporates all memory reductions proposed. Let me know
> otherwise.
For the persistent storage, the only change that I have made is to
MAX_CHARS_PER_FRAME, which is set to 2000 in bits.c. I changed bits.c to
set this value only if it was not already defined, and then put my own, much
smaller value in config.h.
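A minimal sketch of that override, shown as one listing for brevity. The value 200 is only a placeholder, not the number I actually used; it has to cover the largest frame your operating mode can produce.

```c
/* config.h (project-specific): hypothetical smaller buffer size. */
#define MAX_CHARS_PER_FRAME 200

/* bits.c: fall back to the stock default only if nothing was configured. */
#ifndef MAX_CHARS_PER_FRAME
#define MAX_CHARS_PER_FRAME 2000
#endif
```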
For the scratch stack, I replaced the fixed values in nb_encoder_init and
nb_decoder_init with constants that I defined in config.h. Jamey Hicks'
original C5x patch had some test code in stack_alloc.h to detect working
stack overflow. Maybe something similar could be done to measure the peak
stack usage, enabled by a debug switch. Then, for a space-critical
application, it would be easy to measure the stack requirement for a given
operating mode, and set the size (manually) accordingly.
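One way to measure the peak usage is to paint the scratch stack with a sentinel before running the codec and then look for the highest byte that was overwritten. This is only a sketch: the function names and the sentinel value are made up for illustration, and it assumes the scratch stack grows upward from its base address.

```c
#include <stddef.h>
#include <string.h>

#define SCRATCH_FILL 0xA5  /* hypothetical sentinel value */

/* Fill the scratch stack with the sentinel before the first codec call. */
void scratch_paint(char *stack, size_t size)
{
    memset(stack, SCRATCH_FILL, size);
}

/* Return the offset just past the highest element that no longer holds
   the sentinel. Assumes usage grows upward from the base of the block. */
size_t scratch_peak(const char *stack, size_t size)
{
    size_t used = size;
    while (used > 0 && (unsigned char)stack[used - 1] == SCRATCH_FILL)
        used--;
    return used;
}
```

The result can be slightly low if the codec happens to write the sentinel value at the top of its working area, so some margin should be added before fixing the size in config.h.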
>> Also, to nominally comply with the TI XDAIS algorithm standard, it is
>> necessary to extract all of the memory allocation from the code,
>> organize it into blocks, and provide a table to the application host
>> with the size and scratch/persistent nature of each block. The host
>> then does the memory allocating, and provides the pointers back to the
>> application.
>
> I'm not familiar with XDAIS, but I would think you could just overload
> the speex_alloc() and speex_free() functions, right?
According to this standard, an allocate call is made to an algorithm, and
the algorithm fills in a table of required blocks (size, alignment, and
scratch/persistent type). The system allocates these blocks, and calls the
algorithm init function, with the same memory table, now including the base
addresses. Now, these addresses have to get into Speex somehow. Since I
did not want to change the API, I have resorted to the kludge of declaring
global variables, which I initialize based on the allocated memory blocks.
My alloc routines then look at the global variables, similar to the way
calloc works.
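The handshake can be sketched roughly as below. The MemRec type and the codec_mem_* names are hypothetical stand-ins, not the actual XDAIS definitions, and the allocator assumes the host honours the requested alignment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an XDAIS-style memory record. */
typedef enum { MEM_PERSIST, MEM_SCRATCH } MemKind;

typedef struct {
    size_t  size;       /* units requested by the algorithm */
    size_t  alignment;  /* required alignment */
    MemKind kind;       /* persistent or scratch */
    void   *base;       /* filled in by the host after allocation */
} MemRec;

/* Globals that carry the host-provided block into the allocator. */
static char  *persist_base = NULL;
static size_t persist_size = 0;
static size_t persist_used = 0;

/* Phase 1: the algorithm describes its needs; returns the record count. */
int codec_mem_request(MemRec *tab, size_t persist_bytes)
{
    tab[0].size = persist_bytes;
    tab[0].alignment = sizeof(long);
    tab[0].kind = MEM_PERSIST;
    tab[0].base = NULL;
    return 1;
}

/* Phase 2: the host passes the same table back with base filled in. */
void codec_mem_init(const MemRec *tab)
{
    persist_base = (char *)tab[0].base;
    persist_size = tab[0].size;
    persist_used = 0;
}

/* Replacement allocator: carve zeroed chunks out of the host block. */
void *speex_alloc(int size)
{
    size_t n;
    void *p;
    if (size < 0 || persist_base == NULL)
        return NULL;
    /* Round up so that every chunk stays long-aligned. */
    n = ((size_t)size + sizeof(long) - 1) / sizeof(long) * sizeof(long);
    if (n > persist_size - persist_used)
        return NULL;
    p = persist_base + persist_used;
    persist_used += n;
    memset(p, 0, n);
    return p;
}
```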
This does not solve the problem of distinguishing persistent and scratch
storage. To do this, I added a speex_alloc_scratch routine, which uses a
different memory block than speex_alloc. This does force a change to
nb_encoder_init, etc. At the moment, the code looks like this:
#if defined(VAR_ARRAYS) || defined (USE_ALLOCA)
   st = (EncState*)speex_alloc(sizeof(EncState));
   if (!st)
      return NULL;
   st->stack = NULL;
#elif defined(SCRATCH_ALLOC)
   st = (EncState*)speex_alloc(sizeof(EncState));
   if (!st)
      return NULL;
   st->stack = (char*)speex_alloc_scratch(SPEEXENC_SCRATCH_STACK_SIZE);
#else
   st = (EncState*)speex_alloc(sizeof(EncState)+8000*sizeof(spx_sig_t));
   if (!st)
      return NULL;
   st->stack = ((char*)st) + sizeof(EncState);
#endif
Note that I also moved the "if (!st)" check to before st->stack is set, since
a write to a bad location would occur otherwise.
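For illustration, a scratch allocator along these lines could look like the sketch below. Only the name speex_alloc_scratch comes from my code; speex_scratch_bind and the globals are hypothetical, standing in for whatever the init routine does with the scratch block it receives from the host.

```c
#include <stddef.h>

/* Scratch pool handed over by the host; kept apart from persistent data. */
static char  *scratch_base = NULL;
static size_t scratch_size = 0;
static size_t scratch_used = 0;

/* Hypothetical hook: called at init time with the host's scratch block. */
void speex_scratch_bind(void *base, size_t size)
{
    scratch_base = (char *)base;
    scratch_size = size;
    scratch_used = 0;
}

/* Hand out chunks of the scratch pool. The contents are not cleared,
   since scratch memory carries no state between calls. */
void *speex_alloc_scratch(int size)
{
    size_t n;
    void *p;
    if (size < 0 || scratch_base == NULL)
        return NULL;
    /* Round up so that every chunk stays long-aligned. */
    n = ((size_t)size + sizeof(long) - 1) / sizeof(long) * sizeof(long);
    if (n > scratch_size - scratch_used)
        return NULL;
    p = scratch_base + scratch_used;
    scratch_used += n;
    return p;
}
```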
>> Question 1: Is there anything wrong with using a 32-bit float for
>> spx_word64_t (other than MIPs)? This type is used only in two places
>> in ltp.c.
>
> No problem replacing with a float. The reason for the 64 bits is not the
> precision but only the range. A 40-bit accumulator would work too.
> Eventually, this could probably be made to fit in a 32-bit int, but I
> haven't done that yet.
The C55x uses a 40-bit long long (as Stuart Cording pointed out), so this
should be fine here.
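To illustrate the range-versus-precision point, here is a small host-side sketch (not code from ltp.c; the function names are made up) that accumulates products of 16-bit samples exactly in a 64-bit integer and approximately in a 32-bit float. With 160 full-scale samples the sum is about 1.7e11, which overflows a 32-bit int but still fits in a 40-bit accumulator, and the float only gives up some low-order bits.

```c
#include <stdint.h>

/* Reference: exact sum of products in a 64-bit integer. */
int64_t inner_prod_i64(const int16_t *x, const int16_t *y, int len)
{
    int64_t sum = 0;
    int i;
    for (i = 0; i < len; i++)
        sum += (int32_t)x[i] * y[i];
    return sum;
}

/* Same sum in a 32-bit float: plenty of range, but only about
   24 bits of precision, so the result is approximate. */
float inner_prod_f32(const int16_t *x, const int16_t *y, int len)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < len; i++)
        sum += (float)x[i] * (float)y[i];
    return sum;
}
```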
>> 3. And, of course, the internal stack memory allocations in
>> nb_encoder_init and nb_decoder_init had to be cut down to fit within
>> the available data memory space. It would be useful to parameterize
>> the working stack allocation size for those folks who cannot use the
>> new VAR_ARRAYS and USE_ALLOCA stuff.
>
> Would a compile-time option be OK (so I don't need to change the API)?
> If so, I'll put that on the TODO list.
I am using a compile option, as shown above.
>> With this change, the codec ran, but the encoded data is garbage.
>> Eventually I realized that because the char size on the C5x is 16
>> bits, the fread and fwrite routines are using only the least
>> significant 8 bits of each word. A little packing and unpacking
>> later, the encoder/decoder loop was producing intelligible sound.
>> However, there are some anomalies. Using the sample file
>> male.wav, the output has a positive step at 0.1 sec (rapid ramp from 0
>> to ~20000 sample value, with decay back to zero by time 0.112 sec),
>> another positive step at 2.940 sec (amplitude about 3000, decaying in
>> 12 ms again), and a rail-to-rail impulse at 4.600 sec (also decaying
>> within a few msec). This is a simulator, so there are no "real world"
>> effects at play. The C6x simulation does not show the artifacts. The
>> encoded bits are the same for the first frame, but then they diverge.
>
> That's odd, definitely worth investigating.
Stuart Cording's change to replace the math macros with inline functions
cures the problem. I will continue to look at this.
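For reference, the packing and unpacking mentioned in the quoted text can be sketched as below. This is a host-side model, not the code I ran: unsigned short stands in for the 16-bit char, the function names are made up, and the high-byte-first order is an assumption.

```c
#include <stddef.h>

/* Pack 8-bit values (one per 16-bit element, as file I/O delivers them on
   a 16-bit-char target) into 16-bit words, high byte first. An odd
   trailing byte is padded with zero. Returns the number of words. */
size_t pack_bytes(const unsigned short *bytes, size_t nbytes,
                  unsigned short *words)
{
    size_t i, nwords = (nbytes + 1) / 2;
    for (i = 0; i < nwords; i++) {
        unsigned short hi = bytes[2 * i] & 0xFF;
        unsigned short lo = (2 * i + 1 < nbytes) ? (bytes[2 * i + 1] & 0xFF) : 0;
        words[i] = (unsigned short)((hi << 8) | lo);
    }
    return nwords;
}

/* Inverse: split each word into two 8-bit values, high byte first.
   The output array must hold 2 * nwords elements. */
void unpack_words(const unsigned short *words, size_t nwords,
                  unsigned short *bytes)
{
    size_t i;
    for (i = 0; i < nwords; i++) {
        bytes[2 * i]     = (unsigned short)((words[i] >> 8) & 0xFF);
        bytes[2 * i + 1] = (unsigned short)(words[i] & 0xFF);
    }
}
```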
- Jim Crichton