[opus] [Profiling][FFT][AArch64] FFT Profiling data on AArch64

Viswanath Puttagunta viswanath.puttagunta at linaro.org
Tue Nov 25 06:49:14 PST 2014


Hello Phil,

The data you presented is roughly in line with what I observed on ARMv7
(Cortex-A8, BeagleBone Black) as well.

opus_fft_impl() is one of the top contributors to execution time in the
Opus (CELT) decode case below.

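To see how much kf_bfly4_c (called by opus_fft_impl) contributes on its
own, I removed its "static" keyword so that it shows up as a separate
symbol in the run below. Since perf attributes each sample to the
function executing at that moment, the FFT as a whole accounts for about
11.90 + 5.59 = 17.49% of the cycles in this decode use case.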
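An alternative to dropping "static", assuming GCC or Clang, is the
noinline function attribute, which stops the compiler from folding a
helper into its caller so that its samples stay under its own name in
perf. This is only an illustrative sketch: butterfly_step and
fft_like_driver are hypothetical stand-ins, not Opus code.

```c
/* Hypothetical stand-ins for a driver (like opus_fft_impl) and a
   butterfly helper (like kf_bfly4_c). Requires GCC or Clang. */

/* noinline keeps this a real call, so a profiler can attribute
   samples to butterfly_step instead of to its caller. */
static __attribute__((noinline)) void butterfly_step(float *buf, int n)
{
    for (int k = 0; k < n; k++)
        buf[k] *= 0.5f;           /* placeholder work */
}

float fft_like_driver(float *buf, int n)
{
    butterfly_step(buf, n);       /* stays an out-of-line call */
    float sum = 0.0f;
    for (int k = 0; k < n; k++)
        sum += buf[k];
    return sum;
}
```

Unlike removing "static", this leaves the symbol's linkage unchanged and
only affects the inlining decision.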

Is this the kind of data you are looking for? More details are in [1],
where I describe my optimization of kf_bfly4_c(); the patch is posted at
[2]. However, after you mentioned your FFT work in NE10, I asked that [2]
be put on hold. Let me know if you need any further information.

$ perf_3.17.0-1 record opusdec music_48kbps.opus k.wav
$ perf_3.17.0-1 report

Samples: 99K of event 'cycles', Event count (approx.): 798645278

Overhead  Command  Shared Object      Symbol
 24.71%  opusdec  opusdec            [.] audio_write
 11.90%  opusdec  libopus.so         [.] opus_fft_impl           <----
  7.94%  opusdec  libopus.so         [.] clt_mdct_backward
  7.77%  opusdec  libm-2.19.so       [.] lrintf
  6.66%  opusdec  libopus.so         [.] comb_filter
  5.59%  opusdec  libopus.so         [.] kf_bfly4_c                  <----
  5.03%  opusdec  libc-2.19.so       [.] memmove
  3.95%  opusdec  libopus.so         [.] quant_all_bands
  3.27%  opusdec  libopus.so         [.] deemphasis.isra.1
  2.85%  opusdec  libopus.so         [.] exp_rotation1
  1.52%  opusdec  libopus.so         [.] decode_pulses
  1.27%  opusdec  libopus.so         [.] __udivsi3
  1.21%  opusdec  libopus.so         [.] haar1
  1.20%  opusdec  libopus.so         [.] alg_unquant
  1.15%  opusdec  libm-2.19.so       [.] __exp_finite
  1.10%  opusdec  libopus.so         [.] quant_partition
  1.09%  opusdec  libopus.so         [.] denormalise_bands
  1.06%  opusdec  libopus.so         [.] quant_band
  1.04%  opusdec  opusdec            [.] main
  0.65%  opusdec  libopus.so         [.] compute_theta
  0.50%  opusdec  libopus.so         [.] compute_allocation

[1]: https://docs.google.com/document/d/1L6csATjSsXtzg_sa1iHZta8hOsoVWA4UjHXEakpTrNk/edit?usp=sharing
[2]: http://lists.xiph.org/pipermail/opus/2014-November/002744.html

Regards,
Vish

On 25 November 2014 at 04:17, Phil Wang <wzf0428 at gmail.com> wrote:
> Hi everyone,
>
> I have profiled Opus on AArch64. I just ran opus_demo on some PCM files.
> The following table shows the share of time spent in FFT at different
> bitrates.
>
> Bitrate  | Share of time in FFT/iFFT
> 24 kb/s  | 15%
> 48 kb/s  | 15%
> 96 kb/s  | 13%
>
> Any comments? I would like some data closer to a real application; any
> suggestions?
>
> Thanks,
> Phil Wang

