[opus] Ask for suggestions about optimizing opus on STM32F407

Forrest Zhang forrest at 263.net
Sun Feb 4 12:06:36 UTC 2018


Hello Thomas and Amit,

The problem has been solved! I really appreciate your help!

Previously I got much worse performance on the STM32F407ZG because Opus was running from the external (FSMC) RAM.
With internal RAM it is about 6 to 7 times faster.
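
As a quick check of the "6 to 7 times" figure, the sketch below divides the case A (external memory) totals by the case B (internal memory) totals for the first and last rows of the result table further down. The function names are made up for illustration and are not part of Opus.

```c
#include <stdio.h>

/* Ratio of the external-RAM total to the internal-RAM total for one row. */
double ram_speedup(unsigned external_ms, unsigned internal_ms)
{
    return (double)external_ms / (double)internal_ms;
}

/* Print the speedup for the 48kHz stereo and 8kHz mono rows (cases A and B). */
void print_ram_speedups(void)
{
    printf("48kHz * 2: %.2fx\n", ram_speedup(3698, 587));
    printf(" 8kHz * 1: %.2fx\n", ram_speedup(1404, 205));
}
```

Both rows come out between 6x and 7x, which matches the statement above.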

In general, SILK encoding requires more CPU. With CELT/fixed point, encoding and decoding 48kHz stereo audio runs at about 1.68x real time, but SILK/fixed point reaches only about 0.73x real time.

I also ran a performance test with an Opus audio file (Ogg format, 48kHz sampling, stereo, 48kbps, 4.2 seconds), decoding it first and then encoding it again immediately:
  * Decode: 1468 ms
  * Encode:  992 ms
  * Total:  2461 ms
The speed is about 1.7x real time (4200/2461), which corresponds to about 59% CPU usage (2461/4200).
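
The two figures above are the same ratio read in both directions. A minimal sketch of the arithmetic, with helper names chosen only for this example:

```c
#include <stdio.h>

/* Speed relative to real time; values above 1.0 are faster than real time. */
double realtime_factor(unsigned audio_ms, unsigned processing_ms)
{
    return (double)audio_ms / (double)processing_ms;
}

/* CPU load in percent needed to keep up with the audio stream. */
double cpu_load_percent(unsigned audio_ms, unsigned processing_ms)
{
    return 100.0 * (double)processing_ms / (double)audio_ms;
}

/* Print both figures for one measurement. */
void print_measurement(unsigned audio_ms, unsigned processing_ms)
{
    printf("%.2fx real time, %.1f%% CPU\n",
           realtime_factor(audio_ms, processing_ms),
           cpu_load_percent(audio_ms, processing_ms));
}
```

For the file test, `realtime_factor(4200, 2461)` is about 1.7 and `cpu_load_percent(4200, 2461)` is about 59. The same helpers apply to each row of the table below with `audio_ms` set to 1000.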

The detailed test results are attached below for other developers' reference.
  * Opus Performance on STM32F407ZG
  * ===============================
  * 1000ms PCM samples (1150Hz cosine wave, amplitude 0x6000)
  * Frame size: 20ms
  * Case A: CELT, external memory, bitrate = 2x sampling, FIXED_POINT, DISABLE_FLOAT_API
  * Case B: CELT, internal memory, bitrate = 2x sampling, FIXED_POINT, DISABLE_FLOAT_API
  * Case C: CELT, internal memory, bitrate = 1x sampling, FIXED_POINT, DISABLE_FLOAT_API
  * Case D: CELT, internal memory, bitrate = 2x sampling, FIXED_POINT, FLOAT_API
  * Case E: CELT, internal memory, bitrate = 2x sampling, FLOAT,       FLOAT_API
  * Case F: SILK, internal memory, bitrate = 2x sampling, FLOAT,       FLOAT_API
  * Case G: SILK, internal memory, bitrate = 2x sampling, FIXED_POINT, FLOAT_API
  * Result: encode time + decode time = total cost time (in milliseconds)
  *  Sampling*Chan  A: External, 2x    B: Internal, 2x  C: Internal, 1x  D: FLOAT API,2x  E: FLOAT,2x     F: SILK, FLOAT  G: SILK, FIXED
  *  ============= ==================  ===============  ===============  ===============  =============== =============== ===============
  *    48kHz * 2:  2123 + 1533 = 3698  346 + 234 = 587  305 + 216 = 528  352 + 236 = 595  534 + 392 = 937 7817+ 398 =8367 1013+ 345 =1374
  *    48kHz * 1:  1292 +  907 = 2225  213 + 144 = 361  170 + 121 = 295  214 + 145 = 363  338 + 240 = 584 3922+ 230 =4215  525+ 196 = 729
  *    24kHz * 2:  1862 + 1427 = 3325  298 + 207 = 511  239 + 176 = 402  301 + 209 = 516  443 + 306 = 758 7381+ 288 =7743  942+ 301 =1257
  *    24kHz * 1:  1058 +  708 = 1860  169 + 119 = 291  141 + 104 = 248  172 + 117 = 293  240 + 175 = 420 3843+ 160 =4063  479+ 156 = 642
  *    16kHz * 2:  1701 + 1372 = 3105  270 + 194 = 469  210 + 164 = 378  269 + 199 = 473  396 + 267 = 670 7384+ 119 =7646  683+  93 = 785
  *    16kHz * 1:   907 +  708 = 1633  142 + 104 = 249  116 +  99 = 217  144 + 103 = 250  180 + 139 = 323 3651+  57 =3766  335+  42 = 381
  *    12kHz * 2:  1509 + 1240 = 2778  225 + 169 = 399  197 + 158 = 359  227 + 171 = 402  235 + 180 = 419 2919+  53 =3008  299+  31 = 333
  *    12kHz * 1:   857 +  681 = 1555  136 +  97 = 236  117 +  84 = 203  137 +  96 = 236  159 + 128 = 290 2818+  44 =2899  290+  30 = 323
  *     8kHz * 2:  1371 + 1168 = 2567  198 + 157 = 359  191 + 156 = 351  200 + 158 = 362  181 + 148 = 333 2173+  35 =2237  246+  28 = 276
  *     8kHz * 1:   761 +  628 = 1404  111 +  92 = 205  106 +  89 = 197  120 +  84 = 206  123 + 100 = 226 2123+  31 =2182  255+  24 = 281

Sincerely
Forrest

On 2018/1/15 12:31, Forrest Zhang wrote:
> Hello Thomas and Amit,
>
> Thanks for your notice and the detailed decode performance report.
>
> I describe the details of my encode/decode test on STM32F407ZG.
>
> A. opus version: latest 1.2.1 (TI: opus 1.1.2)
> B. KEIL 5.23 (TI: ARM compiler tool chain 5.2.7)
> C. setup the encoder as the below (fs is the sampling frequency)
> 	enc = opus_encoder_create(fs, chans, OPUS_APPLICATION_AUDIO, &opus_err);
> 	opus_encoder_ctl(enc, OPUS_SET_BITRATE(fs * 2));
> 	opus_encoder_ctl(enc, OPUS_SET_BANDWIDTH(OPUS_AUTO));
> 	opus_encoder_ctl(enc, OPUS_SET_VBR(1));
> 	opus_encoder_ctl(enc, OPUS_SET_VBR_CONSTRAINT(0));
> 	opus_encoder_ctl(enc, OPUS_SET_COMPLEXITY(0));
> 	opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(0));
> 	opus_encoder_ctl(enc, OPUS_SET_FORCE_CHANNELS(OPUS_AUTO));
> 	opus_encoder_ctl(enc, OPUS_SET_DTX(0));
> 	opus_encoder_ctl(enc, OPUS_SET_PACKET_LOSS_PERC(0));
>
> 	opus_encoder_ctl(enc, OPUS_GET_LOOKAHEAD(&lookahead));
> 	opus_encoder_ctl(enc, OPUS_SET_LSB_DEPTH(16));
> 	opus_encoder_ctl(enc, OPUS_SET_EXPERT_FRAME_DURATION(OPUS_FRAMESIZE_20_MS));
> 	/* CELT is faster than SILK? */
> 	opus_encoder_ctl(enc, OPUS_SET_FORCE_MODE(MODE_CELT_ONLY));
> D. generate 20ms PCM sample data (Cosine wave with amplitude 0x6000 and
> frequency about 1150 Hz)
> E. encode the PCM data and decode it immediately, count the CPU usages.
> F. repeat until reach the duration time (1000ms or 10000ms)
> G. The summary of STM32F407 Test Result as below:
> 	Mode  Sample  Chan Freq. Duration Encode + Decode = Total
> 	FLOAT 48kHz   2  1150   1000ms  2735ms + 3367ms =  6102ms
>
> 	FIXED 48kHz   2  1150   1000ms  2112ms + 1543ms =  3698ms
> 	FIXED 48kHz   1  1150   1000ms  1312ms +  911ms =  2249ms
> 	FIXED 24kHz   1  1150   1000ms  1067ms +  783ms =  1872ms
> 	FIXED 16kHz   1  1150   1000ms   922ms +  711ms =  1651ms
> 	FIXED 12kHz   1  1150   1000ms  1296ms +  193ms =  1507ms
> 	FIXED  8kHz   2  1150   1000ms  1014ms +  147ms =  1181ms
> 	FIXED  8kHz   1  1150   1000ms  1086ms +  135ms =  1241ms
> 	FIXED  8kHz   1  1150  10000ms 11206ms + 1318ms = 12544ms
> H. Build Options
> 	FLOAT: OPUS_BUILD,USE_ALLOCA,CUSTOM_SUPPORT
> 	FIXED: OPUS_BUILD,USE_ALLOCA,CUSTOM_SUPPORT,FIXED_POINT,DISABLE_FLOAT_API
>
> Note: the target bit rate is twice the sampling frequency. That is to say,
> the bit rate will be 96kbps if the sampling frequency is 48kHz.
>
> The CPU usage is about 91% (911ms/1000ms) when decoding 48kHz/mono/96kbps, but
> encoding requires more CPU (131%, 1312ms/1000ms).
>
> I will try lower bit rate and update the result later.
>
> Sincerely
> Forrest
>
> On Sunday, January 14, 2018 9:05:44 AM CST Thomas Böhm wrote:
>> Hello Forrest,
>> some years ago I developed a network media player based on an
>> STM32F407ZGT6 (168MHz clock) and opus 1.1.
>> I used just the fixed point code and did no particular optimization on
>> the opus code itself because the performance was already quite good, see
>> figures below.
>> The figures are for real time playback with different frame sizes and
>> various constant bit rates.
>> I didn't play that much with encoding, but I'm convinced that the 32F407
>> is powerful enough to do the job, if you use all its capabilities.
>>
>> Most important is to use the hardware features of the processor like the
>> DMA controller or the CRC calculation unit, if you deal with ogg, to
>> unload the CPU.
>>
>> SILK narrow band, a) mono b) stereo:
>>
>> SILK medium band, a) mono b) stereo:
>>
>> Hybrid wide band, a) mono b) stereo:
>>
>> Hybrid super wide band, a) mono b) stereo:
>>
>> Hybrid full band, a) mono b) stereo:
>>
>>
>> CELT full band mono:
>>
>> CELT full band stereo:
>>
>> Regards,
>> Thomas
>>
>> Am 06.01.2018 um 10:02 schrieb forrest:
>>> Dear Developers,
>>>
>>>
>>> I made an opus 1.2.1 codec build for STM32F407 (fixed-point, with the
>>> float API disabled).
>>>
>>> It seems too slow for a VoIP application.
>>>
>>>
>>> Case 1:
>>>
>>> 48KHz Sampling rate, Stereo, VBR, frame size: 20ms, Bit-rates: 96kbps
>>>
>>> Encode cost: 2.11x real time
>>>
>>> Decode cost: 1.54x real time
>>>
>>> Encode + Decode: 3.65x
>>>
>>>
>>> Case 2:
>>>
>>> 8KHz Sampling rate, Mono, VBR, frame size: 20ms, Bit-rates: 16kbps
>>>
>>> Encode cost: 1.08x real time
>>>
>>> Decode cost: 0.14x real time
>>>
>>> Encode + Decode: 1.24x
>>>
>>>
>>> Are there any available optimizations or suggestions for Cortex-M4?
>>>
>>>
>>> As far as I know, the TI TM4C129x is based on the Cortex-M4 too:
>>>
>>> http://www.ti.com/tool/TIDM-TM4C129POEAUDIO
>>>
>>>
>>> The performance of opus on it is good enough for mono audio at a 48KHz sampling rate.
>>>
>>> CPU usage is only about 70% at 120MHz when encoding and decoding at the same time.
>>>
>>>
>>> Sincerely
>>>
>>> Forrest



