[flac-dev] AVX2 / 3DNow.

lvqcl lvqcl.mail at gmail.com
Tue Sep 30 11:57:16 PDT 2014

It is relatively easy to convert some SSE2/3/4 code into AVX2: just
use AVX2 intrinsics instead of the SSE ones and keep the logic of the
functions unchanged.
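
As a rough illustration of this kind of conversion (this is not code
from libFLAC; the kernel and all function names are made up for the
example), here is the same element-wise subtraction written with SSE2
and with AVX2 intrinsics. The sketch assumes GCC or Clang on x86, uses
the target attribute so that no -mavx2 flag is needed, and relies on
__builtin_cpu_supports() for the runtime check:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#endif

/* Scalar reference: out[i] = a[i] - b[i]. */
static void diff_scalar(const int32_t *a, const int32_t *b,
                        int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] - b[i];
}

#ifdef HAVE_X86_SIMD
/* SSE2 version: four 32-bit lanes per 128-bit register. */
__attribute__((target("sse2")))
static void diff_sse2(const int32_t *a, const int32_t *b,
                      int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_sub_epi32(va, vb));
    }
    for (; i < n; i++)              /* scalar tail */
        out[i] = a[i] - b[i];
}

/* AVX2 version: same logic, eight 32-bit lanes per 256-bit register. */
__attribute__((target("avx2")))
static void diff_avx2(const int32_t *a, const int32_t *b,
                      int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_sub_epi32(va, vb));
    }
    for (; i < n; i++)              /* scalar tail */
        out[i] = a[i] - b[i];
}
#endif

/* Returns 1 if every SIMD variant the CPU can run matches the scalar
 * result, 0 otherwise. Variants the CPU lacks are skipped. */
int simd_self_check(void)
{
    enum { N = 19 };                /* 16 vector elements + 3 tail */
    int32_t a[N], b[N], ref[N], got[N];

    for (int i = 0; i < N; i++) {
        a[i] = 1000 * i;
        b[i] = 7 - 3 * i;
    }
    diff_scalar(a, b, ref, N);

#ifdef HAVE_X86_SIMD
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) {
        diff_sse2(a, b, got, N);
        if (memcmp(ref, got, sizeof ref) != 0)
            return 0;
    }
    if (__builtin_cpu_supports("avx2")) {
        diff_avx2(a, b, got, N);
        if (memcmp(ref, got, sizeof ref) != 0)
            return 0;
    }
#endif
    return 1;
}
```

The two vector loops differ only in register width, step size and
intrinsic prefix, which is what makes this kind of port mostly
mechanical. Real encoder routines are of course more involved than
this toy kernel.
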
Unfortunately my CPU doesn't have AVX2, but today I managed to briefly
test the AVX2 code on an i5 Haswell CPU. I wasn't able to run the
full test suite on Haswell, but it seems that the new code works correctly.
The results of a quick performance test are:

16-bit WAV encoding: ~20% speed increase
24-bit WAV encoding: ~40% speed increase

The speed increase isn't impressive for 16-bit input, and this
code requires Haswell or newer. Still, it is a speed
improvement, at the cost of another increase in the size
of the executable files (by 20-30 kB).

What do you think?

Also, the new code requires AVX CPU/OS support detection code to be added
to cpu.c. I'd like to simplify cpu.c slightly further before this, for
example by removing the 3DNow code, because it's hardly relevant these days.
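
For reference, a minimal sketch of what such a CPU/OS check could look
like (this is not the actual cpu.c code; read_xcr0, avx_usable and
avx2_usable are hypothetical names, and the sketch assumes GCC or
Clang with <cpuid.h> on x86). The CPU must report AVX and OSXSAVE in
CPUID leaf 1, the OS must have enabled XMM and YMM state saving in
XCR0, and AVX2 itself is reported in CPUID leaf 7:

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Read extended control register XCR0.
 * Only valid when CPUID reports OSXSAVE. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Returns 1 if AVX is usable: the CPU implements it and the OS
 * saves/restores the YMM registers on context switches. */
int avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)))        /* OSXSAVE: XGETBV is available */
        return 0;
    if (!(ecx & (1u << 28)))        /* AVX supported by the CPU */
        return 0;
    /* XCR0 bit 1 = XMM state, bit 2 = YMM state; both must be set. */
    return (read_xcr0() & 0x6) == 0x6;
}

/* Returns 1 if AVX2 is usable: AVX is usable and
 * CPUID leaf 7, subleaf 0 reports AVX2 in EBX bit 5. */
int avx2_usable(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!avx_usable())
        return 0;
    if (__get_cpuid_max(0, 0) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx & (1u << 5)) != 0;
}

#else
/* Non-x86 or unsupported compiler: report no AVX support. */
int avx_usable(void)  { return 0; }
int avx2_usable(void) { return 0; }
#endif
```

The OS part matters because a CPU may implement AVX while the OS does
not preserve the upper halves of the YMM registers, in which case
running AVX code is not safe.
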
