[CELT-dev] CELT grabbing 100KB of memory right off the top

Mon Apr 18 17:35:13 PDT 2011

On 11-04-18 07:23 PM, Andrew Lentvorski wrote:
> Is there a good reason why I'm not getting a factor of 2 improvement
> from using 24KHz instead of 48KHz?

Most likely because CELT actually up-samples to 48 kHz internally. Even
if that weren't the case, you wouldn't save much, since most of the
complexity is spent on frequencies below 12 kHz.

> How about the drop from 128kbps to
> 64kbps?  Framesize?

Lowering the bit-rate would reduce complexity. Increasing the frame size
*may* help as well. Lowering the complexity setting
(CELT_SET_COMPLEXITY) should definitely help, at the expense of
quality. I suggest trying complexity 4, which disables the pitch search
for the prefilter.
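
For reference, a minimal sketch of that setting, assuming the
celt_encoder_ctl() / CELT_SET_COMPLEXITY(x) interface from celt.h and an
encoder that has already been created. The helper name and the header
path are illustrative assumptions and may differ between CELT versions.

```c
#include <celt/celt.h>   /* assumed install path; in-tree builds use "celt.h" */

/* Hypothetical helper: drop an existing encoder to complexity 4.
 * Returns CELT_OK on success, or the error code from the ctl call. */
static int use_low_complexity(CELTEncoder *enc)
{
    return celt_encoder_ctl(enc, CELT_SET_COMPLEXITY(4));
}
```

Calling this once after creating the encoder should be enough; the
setting stays in effect for subsequent frames.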

> Like what?  When I asked for suggestions, I wasn't joking.  I'd be happy
> to provide a contribution.

Well, the pitch search for the prefilter (if you want to keep the 
quality it provides) is an obvious one, that can take close to 40% off 
the encoder CPU.

> I already did a quick profile of the fixed point encoder and it wasn't
> doing anything obviously stupid.  There were 4 hotspots at 15%, 8%, 7%,
> and 5% respectively.  The 15% involved ilog.

ilog is an easy one. Most platforms have a hardware instruction to 
compute that, so just using it should make that 15% go away.
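
As a rough sketch of what that could look like, assuming GCC or Clang
(which expose a count-leading-zeros builtin) and an unsigned int of at
least 32 bits. The function names are hypothetical, and the convention
ilog(0) == 0, ilog(v) == floor(log2(v)) + 1 is assumed here; check it
against the routine you are replacing.

```c
#include <limits.h>
#include <stdint.h>

/* Portable fallback: shift until the value is exhausted.
 * Returns 0 for v == 0, otherwise floor(log2(v)) + 1. */
static int ilog_portable(uint32_t v)
{
    int bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Same result via count-leading-zeros where the compiler provides it. */
static int ilog_fast(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    /* __builtin_clz is undefined for 0, so handle that case explicitly. */
    if (v == 0)
        return 0;
    return (int)(sizeof(unsigned int) * CHAR_BIT)
           - __builtin_clz((unsigned int)v);
#else
    return ilog_portable(v);
#endif
}
```

On targets with a CLZ-style instruction the builtin normally compiles
down to that single instruction plus a subtraction; on other compilers
the equivalent intrinsic would have to be substituted.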

> A couple of things already
> had estimations and Newton iterations (that would have been one of my
> first choices to try).  There's certainly no obvious nail sticking up
> that I can see.  Even if I managed to reduce the ilog stuff to 0, I
> don't get enough improvement to make a difference (Amdahl's Law and all
> that).
>
> While I probably don't need a full 50%, I would need to get things under
> about 40MIPS (so that the processor has about 50% idle time to do other
> things).  That's probably 25-30%.  Given that I don't see an obvious
> thing to optimize, it's probably not worth continuing to pour time at
> this unless somebody has some good suggestions (like locking the FFT to
> a specific size or something like that).

Well, if you're willing to spend time on that, an optimized MDCT should
certainly be faster than the current implementation, which is based on
an FFT. Or if you have less time, you could simply replace kiss-fft with
an FFT that's been optimized for your architecture.

There are probably other places that can be made faster, but the ones I
listed above are the most obvious.

Cheers,

	Jean-Marc