[flac-dev] flac decoder output buffer alignment
Stefan Oltmanns
stefan-oltmanns at gmx.net
Thu Mar 20 10:19:12 UTC 2025
Hi,
I ran some tests and you are right: the difference is negligible. At
least in my test with the -O3 flag, the compiler produces code that
handles the unaligned beginning first, then processes the large middle
part with aligned reads and SIMD instructions, and finally the rest.
Even in a synthetic benchmark that covers only the single function that
converts the data, the difference is less than 2%.
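
To illustrate the head/middle/tail structure described above, here is a
minimal hand-written sketch in C. It is not the compiler's actual output
and not libFLAC code: the function name convert_peeled, the 32-bit to
16-bit conversion and the 16-byte alignment target are all assumptions
chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of libFLAC): narrows n 32-bit samples to
 * 16 bit, split into the three phases an auto-vectorizer typically emits.
 * Returns the number of samples handled by the scalar head. */
size_t convert_peeled(const int32_t *src, int16_t *dst, size_t n)
{
    size_t i = 0;

    /* Head: scalar steps until src + i sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(src + i) & 15u) != 0) {
        dst[i] = (int16_t)src[i];
        i++;
    }
    size_t head = i;

    /* Middle: blocks of four samples, each starting at an aligned source
     * address. A compiler may turn this into aligned SIMD loads. */
    for (; i + 4 <= n; i += 4) {
        for (size_t k = 0; k < 4; k++)
            dst[i + k] = (int16_t)src[i + k];
    }

    /* Tail: the remaining zero to three samples, scalar again. */
    for (; i < n; i++)
        dst[i] = (int16_t)src[i];

    return head;
}
```

For a large buffer the head and tail cover at most three samples each, so
nearly all of the work lands in the aligned middle part, which would
explain why the starting alignment of the buffer barely shows up in the
timing.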
This may be different on very old processors; I tested on AMD Zen 3
and Intel Ivy Bridge. But of course SSE code will still crash on direct
unaligned reads (instructions that take a memory address as an operand).
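
As a small sketch of the distinction, the following C function reads four
32-bit values from an address with arbitrary alignment. The function name
sum4_unaligned is hypothetical. It uses the SSE2 intrinsic
_mm_loadu_si128, which is defined for unaligned addresses, and falls back
to memcpy when SSE2 is not available at compile time. The aligned
counterpart, _mm_load_si128, requires a 16-byte aligned address and is
expected to fault otherwise.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical example: sums four 32-bit integers stored at p, where p
 * may have any alignment. */
int32_t sum4_unaligned(const void *p)
{
    int32_t lanes[4];
#if defined(__SSE2__)
    /* Unaligned 128-bit load: valid for any address. */
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    /* Unaligned store, so lanes needs no special alignment either. */
    _mm_storeu_si128((__m128i *)lanes, v);
#else
    /* Portable fallback when SSE2 is not enabled at compile time. */
    memcpy(lanes, p, sizeof lanes);
#endif
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Replacing the load with _mm_load_si128 and passing a pointer that is not
16-byte aligned should reproduce the kind of crash meant above.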
Best regards
Stefan
On 17.03.25 at 08:37, Martijn van Beurden wrote:
> On Mon, 17 Mar 2025 at 02:58, Stefan Oltmanns <stefan-oltmanns at gmx.net> wrote:
>>
>> Hi,
>>
>> yes, SSE requires aligned buffers for operations that directly read an
>> operand from memory
>
> libFLAC does unaligned SIMD all the time, both SSE and AVX, so I don't
> think that is true. See
>
> https://c9x.me/x86/html/file_module_x86_id_184.html
>
> I'm not sure what you want from me here. On fairly modern CPUs,
> unaligned memory access isn't really slower in most use cases. I
> highly doubt this would really be a performance problem in your code
> in any way. Maybe you can present some numbers?
>
> Kind regards, Martijn van Beurden