[opus] Ask for suggestions about optimizing opus on STM32F407

Forrest Zhang forrest at 263.net
Mon Jan 15 04:31:27 UTC 2018


Hello Thomas and Amit,

Thanks for your reply and the detailed decoding performance report.

Here are the details of my encode/decode test on the STM32F407ZG.

A. Opus version: 1.2.1, the latest release (TI: Opus 1.1.2)
B. Toolchain: Keil 5.23 (TI: ARM compiler toolchain 5.2.7)
C. Encoder setup (fs is the sampling frequency):
	/* opus.h provides the public API; OPUS_SET_FORCE_MODE and MODE_CELT_ONLY
	 * are internal and come from the library's src/opus_private.h. */
	int opus_err;
	opus_int32 lookahead;
	OpusEncoder *enc;

	enc = opus_encoder_create(fs, chans, OPUS_APPLICATION_AUDIO, &opus_err);
	/* enc is NULL and opus_err != OPUS_OK if creation failed */
	opus_encoder_ctl(enc, OPUS_SET_BITRATE(fs * 2));  /* e.g. 96000 bps at 48 kHz */
	opus_encoder_ctl(enc, OPUS_SET_BANDWIDTH(OPUS_AUTO));
	opus_encoder_ctl(enc, OPUS_SET_VBR(1));
	opus_encoder_ctl(enc, OPUS_SET_VBR_CONSTRAINT(0));
	opus_encoder_ctl(enc, OPUS_SET_COMPLEXITY(0));
	opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(0));
	opus_encoder_ctl(enc, OPUS_SET_FORCE_CHANNELS(OPUS_AUTO));
	opus_encoder_ctl(enc, OPUS_SET_DTX(0));
	opus_encoder_ctl(enc, OPUS_SET_PACKET_LOSS_PERC(0));

	opus_encoder_ctl(enc, OPUS_GET_LOOKAHEAD(&lookahead));
	opus_encoder_ctl(enc, OPUS_SET_LSB_DEPTH(16));
	opus_encoder_ctl(enc, OPUS_SET_EXPERT_FRAME_DURATION(OPUS_FRAMESIZE_20_MS));
	/* CELT is faster than SILK? */
	opus_encoder_ctl(enc, OPUS_SET_FORCE_MODE(MODE_CELT_ONLY));
D. Generate 20 ms of PCM samples (a cosine wave with amplitude 0x6000 and a frequency of about 1150 Hz).
E. Encode the PCM data, decode it immediately, and measure the CPU time of each step.
F. Repeat until the test duration is reached (1000 ms or 10000 ms of audio).
G. Summary of the STM32F407 test results (CPU time spent per test duration):
	Mode   Rate   Ch  Tone(Hz)  Audio(ms)  Encode(ms)  Decode(ms)  Total(ms)
	FLOAT  48kHz  2       1150       1000        2735        3367       6102

	FIXED  48kHz  2       1150       1000        2112        1543       3698
	FIXED  48kHz  1       1150       1000        1312         911       2249
	FIXED  24kHz  1       1150       1000        1067         783       1872
	FIXED  16kHz  1       1150       1000         922         711       1651
	FIXED  12kHz  1       1150       1000        1296         193       1507
	FIXED   8kHz  2       1150       1000        1014         147       1181
	FIXED   8kHz  1       1150       1000        1086         135       1241
	FIXED   8kHz  1       1150      10000       11206        1318      12544
H. Build options (preprocessor defines):
	FLOAT: OPUS_BUILD, USE_ALLOCA, CUSTOM_SUPPORT
	FIXED: OPUS_BUILD, USE_ALLOCA, CUSTOM_SUPPORT, FIXED_POINT, DISABLE_FLOAT_API
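The test procedure in steps D to F can be sketched as a small timing harness. This is a minimal sketch, not the code used for the figures above: `codec_hooks`, `run_bench` and the `null_codec` stubs are hypothetical names, the encoded packet is assumed to be kept in `ctx` between the encode and decode hooks, and `clock()` only stands in for a finer timer (on the STM32 a cycle counter would be the natural choice). As in `opus_encode()`, the frame size is counted in samples per channel.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical hooks: wrap opus_encode()/opus_decode() (or a dummy) here.
 * frame_size is in samples per channel, as in the libopus API. */
typedef struct {
    void (*encode)(void *ctx, const int16_t *pcm, int frame_size);
    void (*decode)(void *ctx, int16_t *pcm, int frame_size);
    void *ctx;  /* assumed to also hold the packet between encode and decode */
} codec_hooks;

typedef struct {
    int frames;        /* number of frames processed */
    double encode_ms;  /* accumulated CPU time in the encode hook */
    double decode_ms;  /* accumulated CPU time in the decode hook */
} bench_result;

/* clock() is a portable stand-in with coarse resolution. */
static double ms_between(clock_t a, clock_t b)
{
    return 1000.0 * (double)(b - a) / CLOCKS_PER_SEC;
}

/* Steps E-F: encode one frame, decode it immediately, and repeat until
 * duration_ms of audio has been processed. pcm_in and pcm_out must each
 * hold frame_size * channels samples. */
bench_result run_bench(const codec_hooks *h, int fs, int frame_ms,
                       int duration_ms, const int16_t *pcm_in, int16_t *pcm_out)
{
    bench_result r = {0, 0.0, 0.0};
    const int frame_size = fs * frame_ms / 1000;  /* 960 at 48 kHz, 20 ms */

    for (int t = 0; t < duration_ms; t += frame_ms) {
        clock_t t0 = clock();
        h->encode(h->ctx, pcm_in, frame_size);
        clock_t t1 = clock();
        h->decode(h->ctx, pcm_out, frame_size);
        clock_t t2 = clock();

        r.encode_ms += ms_between(t0, t1);
        r.decode_ms += ms_between(t1, t2);
        r.frames++;
    }
    return r;
}

/* A do-nothing codec that only counts calls; useful for measuring the
 * overhead of the harness itself. */
typedef struct { int encodes, decodes, last_frame_size; } null_codec;

static void null_encode(void *ctx, const int16_t *pcm, int frame_size)
{
    null_codec *c = ctx;
    (void)pcm;
    c->encodes++;
    c->last_frame_size = frame_size;
}

static void null_decode(void *ctx, int16_t *pcm, int frame_size)
{
    null_codec *c = ctx;
    (void)pcm;
    (void)frame_size;
    c->decodes++;
}
```

Running the harness with the `null_codec` stubs first gives the loop overhead, which can then be subtracted from timings taken with the real encoder and decoder plugged in.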

Note: the target bit rate is twice the sampling frequency, i.e. 96 kbps when the sampling frequency is 48 kHz.

Decoding 48 kHz mono at 96 kbps uses about 91% of the CPU (911 ms per 1000 ms of audio), but encoding needs more (about 131%, 1312 ms per 1000 ms).
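The percentages above are simply CPU time divided by the duration of audio processed. A small pair of helpers (hypothetical names) makes the real-time budget explicit; assuming the core runs at the STM32F407's 168 MHz maximum, each 20 ms frame leaves 3,360,000 cycles for encode plus decode.

```c
#include <stdint.h>

/* CPU load in percent: CPU time spent per unit of audio processed.
 * Values above 100% mean the loop cannot keep up in real time. */
double cpu_load_percent(double cpu_ms, double audio_ms)
{
    return 100.0 * cpu_ms / audio_ms;
}

/* Cycles available per frame at a given core clock, i.e. the budget that
 * encoding plus decoding one frame must fit into for real-time operation. */
uint32_t cycle_budget_per_frame(uint32_t cpu_hz, uint32_t frame_ms)
{
    return (uint32_t)((uint64_t)cpu_hz * frame_ms / 1000u);
}
```

With the 48 kHz mono figures, encode (about 131%) plus decode (about 91%) add up to roughly 222%, so the combined loop currently needs a bit more than twice the per-frame cycle budget.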

I will try lower bit rates and post updated results later.

Sincerely
Forrest

On Sunday, January 14, 2018 9:05:44 AM CST Thomas Böhm wrote:
> Hello Forrest,
> Some years ago I developed a network media player based on an
> STM32F407ZGT6 (168 MHz clock) and Opus 1.1.
> I used only the fixed-point code and did no particular optimization of
> the Opus code itself, because the performance was already quite good; see
> the figures below.
> The figures are for real-time playback with different frame sizes and
> various constant bit rates.
> I didn't experiment much with encoding, but I'm convinced that the 32F407
> is powerful enough for the job if you use all its capabilities.
> 
> The most important thing is to offload the CPU by using the processor's
> hardware features, such as the DMA controller or, if you deal with Ogg,
> the CRC calculation unit.
> 
> [Performance figures not preserved in the archive: SILK narrowband,
> SILK mediumband, Hybrid wideband, Hybrid super-wideband and Hybrid
> fullband, each for a) mono and b) stereo; CELT fullband mono and stereo.]
> 
> Regards,
> Thomas
> 
> On 06.01.2018 at 10:02, forrest wrote:
> > Dear Developers,
> > 
> > 
> > I made an Opus 1.2.1 build for the STM32F407 (fixed point, float API
> > disabled).
> > 
> > It seems too slow for a VoIP application.
> > 
> > 
> > Case 1:
> > 
> > 48 kHz sampling rate, stereo, VBR, 20 ms frames, 96 kbps
> > 
> > Encode cost: 2.11x real time
> > 
> > Decode cost: 1.54x real time
> > 
> > Encode + Decode: 3.65x
> > 
> > 
> > Case 2:
> > 
> > 8 kHz sampling rate, mono, VBR, 20 ms frames, 16 kbps
> > 
> > Encode cost: 1.08x real time
> > 
> > Decode cost: 0.14x real time
> > 
> > Encode + Decode: 1.24x
> > 
> > 
> > Are there any available optimizations or suggestions for the Cortex-M4?
> > 
> > 
> > As far as I know, the TI TM4C129x is also based on the Cortex-M4:
> > 
> > http://www.ti.com/tool/TIDM-TM4C129POEAUDIO
> > 
> > 
> > The performance of Opus on it is good enough for mono at a 48 kHz sampling rate.
> > 
> > CPU usage is only about 70% of the 120 MHz clock when encoding and decoding at the same time.
> > 
> > 
> > Sincerely
> > 
> > Forrest
> > 
> > 
> > 
> > 
> > _______________________________________________
> > opus mailing list
> > opus at xiph.org
> > http://lists.xiph.org/mailman/listinfo/opus
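Regarding the suggestion to use the CRC unit for Ogg: below is a minimal software reference for the Ogg page checksum (polynomial 0x04C11DB7, initial value 0, no bit reflection, no final XOR, computed over the whole page with the 4-byte checksum field at offset 22 set to zero, then stored little-endian). It is bit-by-bit and slow, meant only as a baseline to check a faster or hardware-assisted version against; the function names are hypothetical. The STM32F4 CRC unit appears to use the same polynomial but starts from 0xFFFFFFFF and consumes 32-bit words, so it likely cannot replace this function without some adaptation (an assumption worth checking against the reference manual).

```c
#include <stddef.h>
#include <stdint.h>

/* Ogg page CRC-32: polynomial 0x04C11DB7, initial value 0, MSB-first
 * (no bit reflection), no final XOR. Bitwise and slow, for reference only. */
uint32_t ogg_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;  /* feed the next byte into the top bits */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    }
    return crc;
}

/* Fill in the checksum of a complete Ogg page (header, segment table, body):
 * zero the CRC field (bytes 22..25), compute the CRC over the whole page,
 * and store the result little-endian. */
void ogg_page_set_crc(uint8_t *page, size_t len)
{
    if (len < 27)  /* shorter than a minimal Ogg page header */
        return;
    page[22] = page[23] = page[24] = page[25] = 0;
    uint32_t crc = ogg_crc32(page, len);
    page[22] = (uint8_t)(crc & 0xFFu);
    page[23] = (uint8_t)((crc >> 8) & 0xFFu);
    page[24] = (uint8_t)((crc >> 16) & 0xFFu);
    page[25] = (uint8_t)((crc >> 24) & 0xFFu);
}
```

A table-driven variant (256 precomputed entries) is the usual way to speed this up in software before reaching for the hardware unit.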
