[opus] Asking for suggestions on optimizing Opus on the STM32F407
Forrest Zhang
forrest at 263.net
Mon Jan 15 04:31:27 UTC 2018
Hello Thomas and Amit,
Thanks for your replies and the detailed decode performance report.
Here are the details of my encode/decode test on the STM32F407ZG.
A. Opus version: 1.2.1, the latest release (TI: Opus 1.1.2)
B. Toolchain: Keil 5.23 (TI: ARM compiler toolchain 5.2.7)
C. Encoder setup (fs is the sampling frequency):
#include "opus.h"
#include "opus_private.h" /* internal header: OPUS_SET_FORCE_MODE, MODE_CELT_ONLY */

int opus_err;
opus_int32 lookahead;
OpusEncoder *enc = opus_encoder_create(fs, chans, OPUS_APPLICATION_AUDIO, &opus_err);
/* opus_err should be checked against OPUS_OK before enc is used */
opus_encoder_ctl(enc, OPUS_SET_BITRATE(fs * 2)); /* e.g. 96 kbps at 48 kHz */
opus_encoder_ctl(enc, OPUS_SET_BANDWIDTH(OPUS_AUTO));
opus_encoder_ctl(enc, OPUS_SET_VBR(1));
opus_encoder_ctl(enc, OPUS_SET_VBR_CONSTRAINT(0));
opus_encoder_ctl(enc, OPUS_SET_COMPLEXITY(0)); /* lowest complexity */
opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(0));
opus_encoder_ctl(enc, OPUS_SET_FORCE_CHANNELS(OPUS_AUTO));
opus_encoder_ctl(enc, OPUS_SET_DTX(0));
opus_encoder_ctl(enc, OPUS_SET_PACKET_LOSS_PERC(0));
opus_encoder_ctl(enc, OPUS_GET_LOOKAHEAD(&lookahead));
opus_encoder_ctl(enc, OPUS_SET_LSB_DEPTH(16));
opus_encoder_ctl(enc, OPUS_SET_EXPERT_FRAME_DURATION(OPUS_FRAMESIZE_20_MS));
/* CELT is faster than SILK? */
opus_encoder_ctl(enc, OPUS_SET_FORCE_MODE(MODE_CELT_ONLY));
D. Generate 20 ms of PCM sample data (a cosine wave with amplitude 0x6000
and a frequency of about 1150 Hz).
E. Encode the PCM data, decode it immediately, and measure the CPU time used by each.
F. Repeat until the total test duration (1000 ms or 10000 ms) is reached.
G. Summary of the STM32F407 test results (Sample = sampling rate, Chan = channels,
Freq. = test tone frequency in Hz, Duration = length of audio processed;
Encode, Decode and Total are CPU times):

Mode   Sample  Chan  Freq.  Duration   Encode   Decode     Total
FLOAT  48kHz   2     1150   1000ms     2735ms + 3367ms =  6102ms
FIXED  48kHz   2     1150   1000ms     2112ms + 1543ms =  3698ms
FIXED  48kHz   1     1150   1000ms     1312ms +  911ms =  2249ms
FIXED  24kHz   1     1150   1000ms     1067ms +  783ms =  1872ms
FIXED  16kHz   1     1150   1000ms      922ms +  711ms =  1651ms
FIXED  12kHz   1     1150   1000ms     1296ms +  193ms =  1507ms
FIXED  8kHz    2     1150   1000ms     1014ms +  147ms =  1181ms
FIXED  8kHz    1     1150   1000ms     1086ms +  135ms =  1241ms
FIXED  8kHz    1     1150   10000ms   11206ms + 1318ms = 12544ms
H. Build options (preprocessor defines):
FLOAT: OPUS_BUILD, USE_ALLOCA, CUSTOM_SUPPORT
FIXED: OPUS_BUILD, USE_ALLOCA, CUSTOM_SUPPORT, FIXED_POINT, DISABLE_FLOAT_API
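Steps D to F can be reproduced on a host or on the target with a small test-signal generator. The sketch below is not the original test code: the function names (`frame_samples`, `cos_taylor`, `fill_cosine_frame`) are hypothetical, and a Taylor-series cosine is used only so the sketch does not depend on libm. It fills one interleaved 16-bit frame with a cosine of amplitude 0x6000 and carries the phase over to the next frame, so a 1000 ms run is simply 50 consecutive 20 ms frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PI 3.14159265358979323846

/* Samples per channel in one frame, e.g. 960 for 20 ms at 48 kHz. */
static size_t frame_samples(int fs, int frame_ms)
{
    return (size_t)fs * (size_t)frame_ms / 1000u;
}

/* Taylor-series cosine, accurate for |x| <= pi; avoids needing libm. */
static double cos_taylor(double x)
{
    double term = 1.0, sum = 1.0, x2 = x * x;
    for (int k = 1; k <= 12; k++) {
        term *= -x2 / (double)((2 * k - 1) * (2 * k));
        sum += term;
    }
    return sum;
}

/* Fill one interleaved 16-bit frame with a cosine tone (same value on every
 * channel). Returns the phase to pass to the next call so that consecutive
 * frames join without a discontinuity. */
static double fill_cosine_frame(int16_t *pcm, int fs, int channels, int frame_ms,
                                double freq_hz, double amplitude, double phase)
{
    assert(fs > 0 && channels > 0 && frame_ms > 0);
    size_t n = frame_samples(fs, frame_ms);
    double step = 2.0 * PI * freq_hz / (double)fs;

    for (size_t i = 0; i < n; i++) {
        double x = (phase > PI) ? phase - 2.0 * PI : phase; /* map to [-pi, pi] */
        double v = amplitude * cos_taylor(x);
        int16_t s = (int16_t)(v >= 0.0 ? v + 0.5 : v - 0.5); /* round to nearest */
        for (int c = 0; c < channels; c++)
            pcm[i * (size_t)channels + (size_t)c] = s;
        phase += step;
        if (phase >= 2.0 * PI)
            phase -= 2.0 * PI;
    }
    return phase;
}
```

For the 48 kHz stereo case, a buffer of `int16_t pcm[960 * 2]` is filled with `phase = fill_cosine_frame(pcm, 48000, 2, 20, 1150.0, 0x6000, phase);` before each encode/decode pair.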
Note: the target bit rate is twice the sampling frequency; for example, it is
96 kbps when the sampling frequency is 48 kHz.
Decoding 48 kHz mono at 96 kbps uses about 91% of the CPU (911 ms per 1000 ms
of audio), but encoding needs more than the CPU can provide (131%, 1312 ms per 1000 ms).
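The percentages above are the ratio of CPU time to audio time, so anything over 100% cannot run in real time. The sketch below turns that into per-frame cycle budgets. It assumes a 168 MHz core clock (the STM32F407 maximum, also quoted by Thomas below; the clock used for these tests is not stated), and the DWT cycle-counter snippet in the comment is an assumed way of taking the measurements with CMSIS, not necessarily how the figures above were obtained. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles available for one frame if the codec must keep up in real time,
 * e.g. 168 MHz * 20 ms = 3,360,000 cycles for encode plus decode. */
static uint32_t frame_cycle_budget(uint32_t cpu_hz, uint32_t frame_ms)
{
    return (uint32_t)((uint64_t)cpu_hz * frame_ms / 1000u);
}

/* Convert a cycle count (e.g. the difference of two DWT->CYCCNT readings)
 * to milliseconds of CPU time. */
static double cycles_to_ms(uint64_t cycles, uint32_t cpu_hz)
{
    return 1000.0 * (double)cycles / (double)cpu_hz;
}

/* Real-time load in percent: CPU time spent / audio time processed.
 * Above 100% the codec cannot keep up on this core. */
static double cpu_load_percent(double cpu_ms, double audio_ms)
{
    assert(audio_ms > 0.0);
    return 100.0 * cpu_ms / audio_ms;
}

/* On the target (CMSIS core_cm4.h; assumed setup), one call can be timed as:
 *   CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
 *   DWT->CYCCNT = 0;
 *   DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
 *   uint32_t t0 = DWT->CYCCNT;
 *   ... opus_encode() or opus_decode() ...
 *   uint32_t used = DWT->CYCCNT - t0;  (unsigned subtraction survives one wrap)
 */
```

With these helpers, 1312 ms of encoding per 1000 ms of audio gives `cpu_load_percent(1312.0, 1000.0)`, about 131%, meaning the encoder alone needs roughly 4.4 million of the 3.36 million cycles available per 20 ms frame.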
I will try a lower bit rate and post updated results later.
Sincerely
Forrest
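Thomas's reply quoted below suggests offloading the Ogg checksum to the CRC unit. For reference, the Ogg page checksum is a CRC-32 with polynomial 0x04C11DB7, initial value 0, no bit reflection and no final XOR, computed over the whole page with its 4-byte CRC field zeroed. The bitwise software version below (hypothetical name `ogg_crc32`) is a sketch for validating a hardware-based implementation. As far as I know, the STM32F4 CRC unit uses the same polynomial but a fixed 0xFFFFFFFF reset value and 32-bit word input, so mapping it onto Ogg pages needs extra handling; that should be checked against the reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ogg page checksum: CRC-32, polynomial 0x04C11DB7, initial value 0,
 * MSB-first (no bit reflection), no final XOR. The page is checksummed
 * with its CRC field (header bytes 22..25) set to zero. */
static uint32_t ogg_crc32(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)buf[i] << 24;          /* feed next byte, MSB first */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}
```

A table-driven version (256 precomputed entries) is the usual speed-up in software; the bitwise form is kept here because it is easy to check.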
On Sunday, January 14, 2018 9:05:44 AM CST Thomas Böhm wrote:
> Hello Forrest,
> Some years ago I developed a network media player based on an
> STM32F407ZGT6 (168 MHz clock) and Opus 1.1.
> I used only the fixed-point code and did no particular optimization of
> the Opus code itself, because the performance was already quite good; see
> the figures below.
> The figures are for real-time playback with different frame sizes and
> various constant bit rates.
> I didn't experiment much with encoding, but I'm convinced that the 32F407
> is powerful enough to do the job if you use all of its capabilities.
>
> Most important is to offload the CPU by using the processor's hardware
> features, such as the DMA controller, or the CRC calculation unit if you
> deal with Ogg.
>
> [Figure: SILK narrowband, a) mono b) stereo]
> [Figure: SILK mediumband, a) mono b) stereo]
> [Figure: Hybrid wideband, a) mono b) stereo]
> [Figure: Hybrid super-wideband, a) mono b) stereo]
> [Figure: Hybrid fullband, a) mono b) stereo]
> [Figure: CELT fullband mono]
> [Figure: CELT fullband stereo]
>
>
> Regards,
> Thomas
>
> On 06.01.2018 at 10:02, forrest wrote:
> > Dear Developers,
> >
> > I made an Opus 1.2.1 build for the STM32F407 (fixed point, float APIs
> > disabled).
> > It seems too slow for a VoIP application.
> >
> > Case 1:
> > 48 kHz sampling rate, stereo, VBR, frame size: 20 ms, bit rate: 96 kbps
> > Encode cost: 2.11x real time
> > Decode cost: 1.54x real time
> > Encode + Decode: 3.65x
> >
> > Case 2:
> > 8 kHz sampling rate, mono, VBR, frame size: 20 ms, bit rate: 16 kbps
> > Encode cost: 1.08x real time
> > Decode cost: 0.14x real time
> > Encode + Decode: 1.24x
> >
> >
> > Are there any available optimizations or suggestions for Cortex-M4?
> >
> >
> > As I knonw, TI TM4C129x is based on Cortex-M4 too:
> >
> > http://www.ti.com/tool/TIDM-TM4C129POEAUDIO
> >
> >
> > The performance of opus on it is good enough for mono 48KHz sampling rate.
> >
> > CPU usage is only about 70% of 120MHz when encode/decode at same time.
> >
> >
> > Sincerely
> >
> > Forrest
> >
> >
> >
> >
> > _______________________________________________
> > opus mailing list
> > opus at xiph.org
> > http://lists.xiph.org/mailman/listinfo/opus