[opus] Opus performance on Cortex-M4
adi at hexapodia.org
Mon Nov 3 16:36:29 PST 2014
I'm considering implementing Opus as the codec for an embedded ARM-based
battery-powered audio system. In the interest of battery life and board
footprint, I'd like to specify the smallest CPU that can do the job.
In some quick testing on Cortex-A8 (a very different core, but at least
ISA compatible and hopefully fairly similar to M4 for things like cycle
counts and code size) I saw promising results -- about 30 MHz of A8 CPU
was sufficient to encode an audio stream using the 1.1.1-beta fixed-point
codec at 48 kHz mono, complexity=5, bitrate=20 kbit/s.
Since the target SoCs tend to have an M3 or M4 running up to 100-150
MHz, and power consumption runs nearly linearly with clock speed, this
seemed to give us some headroom to run the rest of our application stack
and tune for battery life.
However, now that we're doing a first implementation on M4, we're seeing
significantly higher cycle counts -- more in the range of 100 MHz of CPU
needed to encode with the same parameters. Additionally, compared to
1.0.3, the code size and data size of the Opus codec in 1.1 have
increased significantly (which makes it a challenge to fit in the on-SoC
SRAM of the M4).
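
To put the two figures side by side, here is a rough sketch of the budget arithmetic. The 30 MHz and 100 MHz numbers and the 100-150 MHz clock range are the ones quoted above. The 20 ms frame size is an assumption, since I haven't stated which frame size was used, and the function names are only illustrative.

```python
# Figures quoted in this post: MHz of CPU needed to encode in real time.
A8_ESTIMATE_MHZ = 30.0
M4_MEASURED_MHZ = 100.0
# Clock range quoted for the target SoCs.
CLOCKS_MHZ = (100.0, 150.0)
# Assumption: 20 ms frames. The post does not state the frame size.
FRAME_MS = 20.0


def cpu_load(required_mhz, clock_mhz):
    """Fraction of the core consumed by the encoder at a given clock."""
    return required_mhz / clock_mhz


def cycles_per_frame(required_mhz, frame_ms=FRAME_MS):
    """Cycles spent per frame: MHz * 1e6 cycles/s * frame duration in s."""
    return required_mhz * 1e6 * (frame_ms / 1000.0)


if __name__ == "__main__":
    cases = (("A8 estimate", A8_ESTIMATE_MHZ), ("M4 measured", M4_MEASURED_MHZ))
    for label, mhz in cases:
        for clock in CLOCKS_MHZ:
            print(
                f"{label}: {cpu_load(mhz, clock):.0%} of a {clock:.0f} MHz core, "
                f"{cycles_per_frame(mhz):,.0f} cycles per {FRAME_MS:.0f} ms frame"
            )
```

Under these assumptions the A8-based estimate leaves most of the core free at either clock, while the M4 measurement consumes the whole core at 100 MHz and about two thirds of it at 150 MHz.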
Obviously we need to use the ARM ASM that landed in -beta, and we can
decrease the complexity to somewhat reduce the CPU utilization, but I'm
wondering if I'm missing any other low-hanging fruit in optimizing Opus
for this target CPU. I haven't even started to do code profiling or CPU
performance counter analysis.
Does anyone have examples of similar applications? What kinds of CPU
occupancy have other people seen on similar CPUs? Do we need to get
some NEON asm? Does anybody have spare cycles to take paid work in this