[opus] [EXT] Re: [PATCH] Make CELT FFT twiddle complex type aligned

Zheng Lv lvzheng at google.com
Mon Oct 31 03:10:53 UTC 2022


Hi Ulrich,

Making the typedef aligned can benefit more. As long as an instance sits in
memory, be it static, stack, or heap, additional alignment can benefit.
This patch came from the discovery that the performance randomly changes by
just turning on and off LTO or modifying totally unrelated code linked to
the final executable. If we just make that specific array aligned, it would
be determined totally by the linker's "mood" whether the other occurrences
are 4-bytes aligned or not.

I also suspect (though I have not verified) that for some architectures
that are unable to access unaligned memory, making the typedef aligned
would facilitate the compiler to generate faster code.

Best regards,
Zheng



On Mon, 24 Oct 2022 at 15:27, Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:

> >>> Zheng Lv <lvzheng at google.com> schrieb am 24.10.2022 um 04:31 in
> Nachricht
> <CACNzCb422LVKKUzRSYJnDH09uaVgOABB1w6z48zYUXMPQvQAmg at mail.gmail.com>:
> > Hi Team,
> >
> > Can anyone take a look? We are already using this optimization in
> > production.
> >
> > Best regards,
> > Zheng
> >
> > On Thu, 15 Sept 2022 at 19:04, Zheng Lv <lvzheng at google.com> wrote:
> >
> >> This makes kiss_twiddle_cpx 4-byte aligned (instead of 2-byte) for
> >> fixed-point builds. Tested with an armv6j+nofp development board, CELT
> >> encoding becomes 1.4x as fast, and decoding over 2x.
> >>
> >> Performance gain is mostly attributed to the proper alignment of the
> >> static const array mdct_twiddles960.
> >>
> >> Co-authored-by: David Gao <davidgao at google.com>
> >> ---
> >>  celt/kiss_fft.h | 12 +++++++++++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/celt/kiss_fft.h b/celt/kiss_fft.h
> >> index bffa2bfa..267f72f9 100644
> >> --- a/celt/kiss_fft.h
> >> +++ b/celt/kiss_fft.h
> >> @@ -49,31 +49,41 @@ extern "C" {
> >>  #ifdef FIXED_POINT
> >>  #include "arch.h"
> >>
> >>  #  define kiss_fft_scalar opus_int32
> >>  #  define kiss_twiddle_scalar opus_int16
> >>
> >> +/* Some 32-bit CPUs would load/store a kiss_twiddle_cpx with a single
> >> memory
> >> + * access, and could benefit from additional alignment.
> >> + */
> >> +#  define KISS_TWIDDLE_CPX_ALIGNMENT (sizeof(opus_int32))
> >>
> >>  #else
> >>  # ifndef kiss_fft_scalar
> >>  /*  default is float */
> >>  #   define kiss_fft_scalar float
> >>  #   define kiss_twiddle_scalar float
> >>  #   define KF_SUFFIX _celt_single
> >>  # endif
> >>  #endif
> >>
> >> +#if defined(__GNUC__) && defined(KISS_TWIDDLE_CPX_ALIGNMENT)
> >> +#define KISS_TWIDDLE_CPX_ALIGNED
> >> __attribute__((aligned(KISS_TWIDDLE_CPX_ALIGNMENT)))
> >> +#else
> >> +#define KISS_TWIDDLE_CPX_ALIGNED
> >> +#endif
> >> +
> >>  typedef struct {
> >>      kiss_fft_scalar r;
> >>      kiss_fft_scalar i;
> >>  }kiss_fft_cpx;
> >>
> >>  typedef struct {
> >>     kiss_twiddle_scalar r;
> >>     kiss_twiddle_scalar i;
> >> -}kiss_twiddle_cpx;
> >> +} KISS_TWIDDLE_CPX_ALIGNED kiss_twiddle_cpx;
>
> I'm only surprised that you do align the typedef and not the actual data!
>
>
> >>
> >>  #define MAXFACTORS 8
> >>  /* e.g. an fft of length 128 has 4 factors
> >>   as far as kissfft is concerned
> >>   4*4*4*2
> >>   */
> >> --
> >> 2.37.2.789.g6183377224-goog
> >>
> >>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xiph.org/pipermail/opus/attachments/20221031/1a5a5718/attachment.htm>


More information about the opus mailing list