[opus] Opus performance on Cortex-M4
jmvalin at jmvalin.ca
Mon Nov 3 17:32:30 PST 2014
On 03/11/14 07:36 PM, Andy Isaacson wrote:
> In some quick testing on Cortex-A8 (a very different core, but at least
> ISA compatible and hopefully fairly similar to M4 for things like cycle
> counts and code size) I saw promising results -- about 30 MHz of A8 CPU
> was sufficient to encode an audio stream using the 1.1.1-beta fixed
> point codec at 48 kHz mono, complexity=5, bitrate=20kbit/sec.
First, I think the big difference between the M4 and the A8 is that the
A8 has NEON, which Opus is able to use, while the M4 does not.
> However now that we're doing a first implementation on M4, we're seeing
> significantly higher cycle counts -- more in the range of 100 MHz of CPU
> needed to encode with the same parameters. Additionally, compared to
> 1.0.3, the code size and data size of the Opus codec in 1.1 has
> increased significantly (which makes it a challenge to fit in the on-SoC
> SRAM of the M4).
I suspect most of the size increase you're seeing is from the new code
in src/analysis.c which you do not need. In fact, if you're operating at
20 kb/s for speech, then you can entirely remove the CELT encoder from
your build. You still need the decoder because there's no guarantee what
the remote end will send you.
> Obviously we need to use the ARM ASM that landed in -beta, and we can
> decrease the complexity to somewhat reduce the CPU utilization, but I'm
> wondering if I'm missing any other low-hanging fruit in optimizing Opus
> for this target CPU. I haven't even started to do code profiling or CPU
> performance counter analysis.
There are a few things to check. First, make sure that
OPUS_ARM_INLINE_EDSP (enabling DSP extensions) is defined in your
config.h. Also, check for OPUS_ARM_ASM and OPUS_HAVE_RTCD. With those
defined, all the asm is enabled. At that point, the best thing to do is
to run a profiler to see where the CPU time is spent.
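
As a rough illustration of the config.h check described above, here is a
minimal sketch that reports which of the three macros are visible at
compile time. The helper name count_missing_arm_flags is hypothetical
and not part of Opus; the sketch also assumes you uncomment the config.h
include when building inside the Opus tree (without it, all three flags
will show up as missing).

```c
#include <stdio.h>

/* Assumption: in a real Opus build these macros come from the generated
 * config.h. Uncomment the include when compiling inside the Opus tree. */
/* #include "config.h" */

/* Hypothetical helper (not part of Opus): prints the status of each
 * macro and returns how many of the three are undefined. */
static int count_missing_arm_flags(void)
{
    int missing = 0;

#ifdef OPUS_ARM_INLINE_EDSP
    printf("OPUS_ARM_INLINE_EDSP: defined\n");
#else
    printf("OPUS_ARM_INLINE_EDSP: NOT defined\n");
    missing++;
#endif

#ifdef OPUS_ARM_ASM
    printf("OPUS_ARM_ASM: defined\n");
#else
    printf("OPUS_ARM_ASM: NOT defined\n");
    missing++;
#endif

#ifdef OPUS_HAVE_RTCD
    printf("OPUS_HAVE_RTCD: defined\n");
#else
    printf("OPUS_HAVE_RTCD: NOT defined\n");
    missing++;
#endif

    return missing;
}
```

Calling count_missing_arm_flags() once from your startup code (or from a
small host-side test program built with the same config.h) should return
0 when everything is enabled. On a target without stdio, the printf
calls could be replaced with whatever logging the firmware already has.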