[flac-dev] flac decoder output buffer alignment

Stefan Oltmanns stefan-oltmanns at gmx.net
Mon Mar 17 01:58:33 UTC 2025


Hi,

Yes, SSE requires aligned buffers for instructions that read an operand
directly from memory. AVX is more relaxed, but is still supposed to be
faster with aligned data.
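
To illustrate the difference, here is a minimal sketch (not taken from
libFLAC) that sums 32-bit samples with SSE2. The function name sum_samples
is hypothetical. It uses the aligned load _mm_load_si128 only when the
buffer address is a multiple of 16, and otherwise falls back to the
unaligned load _mm_loadu_si128. On non-x86 targets it is plain scalar code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical helper: sum n 32-bit samples from buf. */
static int32_t sum_samples(const int32_t *buf, size_t n)
{
    int32_t total = 0;
    size_t i = 0;

    assert(buf != NULL || n == 0);
#ifdef HAVE_SSE2
    __m128i acc = _mm_setzero_si128();
    if (((uintptr_t)buf & 15) == 0) {
        /* 16-byte aligned: the aligned load is allowed. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_epi32(acc,
                    _mm_load_si128((const __m128i *)(buf + i)));
    } else {
        /* Not aligned: an aligned load would fault, use the unaligned one. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_epi32(acc,
                    _mm_loadu_si128((const __m128i *)(buf + i)));
    }
    /* Add up the four 32-bit lanes of the accumulator. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail (or the whole buffer when SSE2 is unavailable). */
    for (; i < n; i++)
        total += buf[i];
    return total;
}
```

With a guaranteed aligned output buffer the run-time check and the
unaligned branch could be dropped.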

I'm not sure whether macOS automatically aligns buffers with malloc, but
since C11 there is aligned_alloc, which works fine on Linux/macOS.
Unfortunately it is not implemented on Windows. There is _aligned_malloc
on Windows, which is almost the same, except that its memory cannot be
released with free and requires _aligned_free instead.
Using aligned buffers on Windows is therefore a bit more complex, as you
also have to replace free.
But flac already does some alignment tricks for the output buffer to
reserve some space in front. That could be extended to ensure the output
buffer address is a multiple of N (N being defined at compile time).
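
As a rough sketch of both ideas (the names xaligned_alloc, xaligned_free
and align_up are hypothetical and not part of libFLAC): a thin wrapper that
hides the aligned_alloc / _aligned_malloc difference, and a helper that
rounds an address inside an over-allocated block up to a multiple of N.
Note that _aligned_malloc takes its arguments in the opposite order
(size first, then alignment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>
#endif

/* Allocate size bytes aligned to alignment (must be a power of two). */
static void *xaligned_alloc(size_t alignment, size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef _WIN32
    return _aligned_malloc(size, alignment);
#else
    /* Round size up to a multiple of alignment, since some
       aligned_alloc implementations insist on that. */
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, rounded);
#endif
}

/* Release memory obtained from xaligned_alloc. */
static void xaligned_free(void *p)
{
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}

/* Round p up to the next multiple of alignment (a power of two).
   The caller must have over-allocated by at least alignment - 1 bytes
   and must keep the original pointer for free. */
static void *align_up(void *p, size_t alignment)
{
    uintptr_t a = (uintptr_t)p;
    return (void *)((a + alignment - 1) & ~(uintptr_t)(alignment - 1));
}
```

The align_up variant works with plain malloc/free on every platform, at
the cost of up to N - 1 wasted bytes per buffer.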

My application should work on Linux, macOS and Windows.

Best regards
Stefan


On 16.03.25 at 19:54, brianw wrote:
> I believe that the SSE/AVX hardware engine only works with aligned buffers.
>
> That said, I also believe that macOS already aligns buffers, even with simple malloc(), although I might be wrong. At the very least, there is surely a CoreAudio memory allocation function that aligns buffers for audio, so that Apple can use SSE/AVX hardware acceleration on audio buffers.
>
> Stefan, have you tried your work on macOS?
>
> Brian Willoughby
>
>
> On Mar 16, 2025, at 11:48 AM, Martijn van Beurden wrote:
>> Hi,
>>
>> Please explain why you need aligned buffers.
>>
>> Kind regards, Martijn van Beurden
>>
>> On Sun, 16 Mar 2025 at 01:36, Stefan Oltmanns wrote:
>>>
>>> Hello,
>>>
>>> I want to process the output from libflac with SSE/AVX. Unfortunately it
>>> seems that libflac always allocates the output buffer itself and there
>>> is no way for the application to provide a buffer.
>>> From my understanding of the code, flac uses its own functions in
>>> share/alloc.h for allocations, and those use plain malloc. I assume the
>>> only way to force aligned output buffers is to modify alloc.h,
>>> link libflac statically, and not use a system-provided version?
>>>
>>> I'm also open to any other flac decoding library (written in C or with
>>> a C header) that has the following features:
>>>
>>> -Support seeking (with fast seeking using seektables if available)
>>> -Support Ogg FLAC
>>> -Fast decoding
>>> -Should be able to handle *very long* flac files (like 2^40 samples).
>>> libavcodec/ffmpeg fails at this, as some internal counter overflows
>>>
>>> Best regards
>>> Stefan
>


