[opus] [Profiling][FFT][AArch64] FFT Profiling data on AArch64
Viswanath Puttagunta
viswanath.puttagunta at linaro.org
Tue Nov 25 06:49:14 PST 2014
Hello Phil,
The data you presented is roughly in line with what I observed on
ARMv7 (Cortex-A8, BeagleBone Black) as well.
opus_fft_impl() is one of the top contributors to cycle count in the
opus decode (celt decode) case below.
To see how much kf_bfly4_c (called by opus_fft_impl) contributes, I
removed its "static" keyword so that it shows up as a separate symbol,
and did the run below. Overall, opus_fft_impl accounts for about
11.90 + 5.59 = 17.49% of cycles in the decode use case.
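For reference, the sum above can be reproduced from the report rows
with a short script. This is only a sketch: the helper name
overhead_for and the whitespace-split parsing are my own assumptions
about the plain-text layout of the report, and the rows are copied
from the run in this mail.

```python
# Hypothetical helper (not part of perf or opus): add up the "Overhead"
# column of plain-text `perf report` rows for a chosen set of symbols.
# Assumes each row looks like: "<pct>% <command> <object> [.] <symbol>".

REPORT_ROWS = """\
11.90% opusdec libopus.so [.] opus_fft_impl
7.94% opusdec libopus.so [.] clt_mdct_backward
5.59% opusdec libopus.so [.] kf_bfly4_c
"""


def overhead_for(report_text, symbols):
    """Return the summed overhead (in percent) of the given symbols."""
    total = 0.0
    for line in report_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a data row.
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        if fields[4] in symbols:
            total += float(fields[0].rstrip("%"))
    return total


fft_total = overhead_for(REPORT_ROWS, {"opus_fft_impl", "kf_bfly4_c"})
print(f"{fft_total:.2f}%")  # prints 17.49%
```

The same function can be pointed at the full report text to total
other groups of symbols.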
Is this the kind of data you are looking for? More information is
in [1], where I describe optimizing kf_bfly4_c(); I posted the patch
at [2]. However, after you mentioned your FFT work in NE10, I
requested that [2] be put on hold. Do let me know if you need any
further information.
$ perf_3.17.0-1 record opusdec music_48kbps.opus k.wav
$ perf_3.17.0-1 report
Samples: 99K of event 'cycles', Event count (approx.): 798645278
Overhead  Command  Shared Object  Symbol
  24.71%  opusdec  opusdec        [.] audio_write
  11.90%  opusdec  libopus.so     [.] opus_fft_impl      <----
   7.94%  opusdec  libopus.so     [.] clt_mdct_backward
   7.77%  opusdec  libm-2.19.so   [.] lrintf
   6.66%  opusdec  libopus.so     [.] comb_filter
   5.59%  opusdec  libopus.so     [.] kf_bfly4_c         <----
   5.03%  opusdec  libc-2.19.so   [.] memmove
   3.95%  opusdec  libopus.so     [.] quant_all_bands
   3.27%  opusdec  libopus.so     [.] deemphasis.isra.1
   2.85%  opusdec  libopus.so     [.] exp_rotation1
   1.52%  opusdec  libopus.so     [.] decode_pulses
   1.27%  opusdec  libopus.so     [.] __udivsi3
   1.21%  opusdec  libopus.so     [.] haar1
   1.20%  opusdec  libopus.so     [.] alg_unquant
   1.15%  opusdec  libm-2.19.so   [.] __exp_finite
   1.10%  opusdec  libopus.so     [.] quant_partition
   1.09%  opusdec  libopus.so     [.] denormalise_bands
   1.06%  opusdec  libopus.so     [.] quant_band
   1.04%  opusdec  opusdec        [.] main
   0.65%  opusdec  libopus.so     [.] compute_theta
   0.50%  opusdec  libopus.so     [.] compute_allocation
[1]: https://docs.google.com/document/d/1L6csATjSsXtzg_sa1iHZta8hOsoVWA4UjHXEakpTrNk/edit?usp=sharing
[2]: http://lists.xiph.org/pipermail/opus/2014-November/002744.html
Regards,
Vish
On 25 November 2014 at 04:17, Phil Wang <wzf0428 at gmail.com> wrote:
> Hi everyone,
>
> I have profiled Opus on AArch64. I just run opus_demo with some pcm files.
> Following is time proportion of FFT with different bitrate.
>
> Bitrate | Time cost by FFT/iFFT
> 24kb/s | 15%
> 48kb/s | 15%
> 96kb/s | 13%
>
> Any comment? I want some data close to real application, any suggestion?
>
> Thanks,
> Phil Wang