[flac-dev] Performance and precompute_partition_info_sums_32bit_asm_ia32_()
lvqcl.mail at gmail.com
Tue Sep 17 11:21:42 PDT 2013
Previously I wrote that precompute_partition_info_sums_32bit_asm_ia32_() only
makes encoding slower. I have now compiled flac with GCC 4.8.1, both with this
function enabled and with it disabled. NASM was enabled and SSE intrinsics were
disabled. I then added the -msse option (so that all C code was compiled with
-msse), then -msse2, and so on.
Input file for test: 44.1kHz/16bit/stereo; best compression mode (flac -8); CPU = Core i7.
Here are the results (1st column: SSE instruction set, 2nd column: the state of
precompute_partition_info_sums_32bit_asm_ia32_(), 3rd column: encoding time
in seconds, smaller=better):
SSE set   asm function   time (s)
no SSE    disabled       53.9
no SSE    enabled        55.2
SSE1      disabled       53.9
SSE1      enabled        55.3
SSE2      disabled       51.9
SSE2      enabled        53.1
SSE3      disabled       51.8
SSE3      enabled        53.2
SSSE3     disabled       45.7
SSSE3     enabled        51.4
SSE41     disabled       46.1
SSE41     enabled        51.6
SSE42     disabled       46.1
SSE42     enabled        51.6
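1) flac is always faster when precompute_partition_info_sums_32bit_asm_ia32_() is disabled:
by 1.2 to 1.4 s (about 2-3%) up to SSE3, and by 5.5 to 5.7 s (about 12%) from SSSE3 onward.
2) Some C code benefits noticeably from SSSE3 instructions, at least when compiled with GCC 4.8.1:
with the asm function disabled, encoding time drops from 51.8 s (SSE3) to 45.7 s (SSSE3).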
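
For anyone who wants the per-row cost of the asm routine, here is a minimal Python sketch that derives it from the timings listed above. It is not part of the original measurements; the names TIMINGS and asm_overhead are made up for illustration, and the numbers are simply copied from the results.

```python
# Timings copied from the results above:
# (instruction set, seconds with the asm routine disabled, seconds with it enabled).
TIMINGS = [
    ("no SSE", 53.9, 55.2),
    ("SSE1", 53.9, 55.3),
    ("SSE2", 51.9, 53.1),
    ("SSE3", 51.8, 53.2),
    ("SSSE3", 45.7, 51.4),
    ("SSE41", 46.1, 51.6),
    ("SSE42", 46.1, 51.6),
]


def asm_overhead(timings):
    """Return (name, extra_seconds, extra_percent) for each row.

    extra_seconds is enabled minus disabled; extra_percent is that
    difference relative to the disabled time. Both are rounded to 0.1.
    """
    rows = []
    for name, disabled, enabled in timings:
        extra = enabled - disabled
        rows.append((name, round(extra, 1), round(100.0 * extra / disabled, 1)))
    return rows


if __name__ == "__main__":
    for name, extra, pct in asm_overhead(TIMINGS):
        print(f"{name:<7} +{extra:.1f} s  (+{pct:.1f}%)")
```

Every row comes out positive, i.e. the build with the asm routine enabled is slower in each configuration tested.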