[flac-dev] About SSE intrinsics in decoder

Martijn van Beurden mvanb1 at gmail.com
Thu Jul 7 07:34:04 UTC 2022


On Thu, 7 Jul 2022 at 09:07, olivier tristan <o.tristan at uvi.net> wrote:
> Hence even small optimization are very welcomed :)

I presume you use libFLAC directly then. Sadly there is little left to
optimize in the decoder. Below is an excerpt of the gprof output for flac
decoding a track:

>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  34.87      0.68     0.68   680925     0.00     0.00  FLAC__bitreader_read_rice_signed_block
>  25.64      1.18     0.50  6004826     0.00     0.00  FLAC__MD5Transform
>  14.36      1.46     0.28    46030     0.00     0.00  FLAC__lpc_restore_signal
>   8.72      1.63     0.17    23457     0.00     0.00  read_frame_
>   5.13      1.73     0.10    23457     0.00     0.00  write_callback
>   3.08      1.79     0.06    23457     0.00     0.00  FLAC__MD5Accumulate
>   3.08      1.85     0.06                             read
>   2.56      1.90     0.05    50901     0.00     0.00  FLAC__crc16_update_words32
>   1.03      1.92     0.02    23457     0.00     0.00  write_audio_frame_to_client_
>   0.51      1.93     0.01  2016520     0.00     0.00  bitreader_read_from_client_
>   0.51      1.94     0.01                             _IO_file_seekoff
>   0.51      1.95     0.01                             write

As you can see, the bitreader takes up the most time. However, this is not
something that can be optimized with SIMD/vector instructions like SSE,
AVX, NEON etc., because it is a strictly sequential process. In the past
there have been several attempts at improving the speed of this call. You
could check for yourself whether configuring with
./configure --enable-64-bit-words or cmake -DENABLE_64_BIT_WORDS=ON
brings any (small) improvement.
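
To illustrate why Rice decoding is sequential, here is a minimal sketch of a
Rice decoder. It is not libFLAC's actual bitreader: the toy_bitreader type
and all toy_* functions are hypothetical names made up for this example, and
it reads one bit at a time, whereas the real code works on whole words. The
point is only that the position of each code in the bitstream is unknown
until the previous code has been fully decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader; NOT libFLAC's FLAC__BitReader. */
typedef struct {
    const uint8_t *data;
    size_t size_bits; /* number of valid bits in data */
    size_t pos_bits;  /* index of the next bit to read */
} toy_bitreader;

static unsigned toy_read_bit(toy_bitreader *br)
{
    assert(br->pos_bits < br->size_bits);
    unsigned bit = (br->data[br->pos_bits >> 3] >> (7 - (br->pos_bits & 7))) & 1u;
    br->pos_bits++;
    return bit;
}

/* Decode one Rice-coded signed value with Rice parameter k (k < 32). */
static int32_t toy_read_rice_signed(toy_bitreader *br, unsigned k)
{
    assert(k < 32);

    /* Unary part: count zero bits up to the terminating one bit. */
    uint32_t msbs = 0;
    while (toy_read_bit(br) == 0)
        msbs++;

    /* Binary part: the k least significant bits. */
    uint32_t lsbs = 0;
    for (unsigned i = 0; i < k; i++)
        lsbs = (lsbs << 1) | toy_read_bit(br);

    /* Undo the folding of signed values: 0, -1, 1, -2, ... <- 0, 1, 2, 3, ... */
    uint32_t uval = (msbs << k) | lsbs;
    return (int32_t)(uval >> 1) ^ -(int32_t)(uval & 1u);
}

/* Decode n values. Each iteration starts where the previous one ended,
 * so the iterations cannot be run side by side in SIMD lanes. */
static void toy_read_rice_block(toy_bitreader *br, int32_t *out, size_t n, unsigned k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = toy_read_rice_signed(br, k);
}
```

Because the length of the unary part varies per value, there is no way to
know where value i+1 starts without decoding value i first.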

Next the MD5 transformation takes up a lot of time too, but I suppose you
do not use that anyway. It is disabled by default when decoding using
libFLAC directly.

Finally, the LPC restore (FLAC__lpc_restore_signal) takes up some time and
can be improved with SSE, AVX, NEON etc., but it represents only a small
part of the decoding CPU load.
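
For reference, LPC restoration boils down to something like the sketch below.
This is a simplified illustration, not the libFLAC implementation:
toy_lpc_restore and its parameter layout (warm-up samples stored at the
start of the output buffer) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified LPC restore; NOT the libFLAC function.
 * samples[0..order-1] must already hold the warm-up samples, and
 * samples must have room for order + n values.
 * Assumes >> on a negative value is an arithmetic shift, which is
 * implementation-defined in C but holds on common compilers. */
static void toy_lpc_restore(const int32_t *residual, size_t n,
                            const int32_t *coeff, unsigned order,
                            unsigned shift, int32_t *samples)
{
    assert(order >= 1 && shift < 32);
    for (size_t i = 0; i < n; i++) {
        /* Prediction: dot product of the coefficients with the
         * previous `order` restored samples. */
        int64_t sum = 0;
        for (unsigned j = 0; j < order; j++)
            sum += (int64_t)coeff[j] * samples[order + i - j - 1];
        /* Restored sample = residual + quantized prediction. */
        samples[order + i] = residual[i] + (int32_t)(sum >> shift);
    }
}
```

The inner loop is a plain dot product, which is the part that maps onto
vector instructions. The outer loop still depends on the sample restored in
the previous iteration, so the gain from SIMD is limited to that inner sum.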

Perhaps it is possible to add a switch to the encoder to create FLAC files
that are optimized for decoding speed instead of size. Would that be
something you would use? For example, trading 5% worse compression for
30% faster decoding, assuming that MD5 checking is already off?