[flac-dev] About SSE intrinsics in decoder

olivier tristan o.tristan at uvi.net
Thu Jul 7 07:46:58 UTC 2022


On 07/07/2022 at 09:34, Martijn van Beurden wrote:

> On Thu 7 Jul 2022 at 09:07, olivier tristan <o.tristan at uvi.net> wrote:
> > Hence even small optimizations are very welcome :)
>
> I presume you use libFLAC directly then. Sadly, there is little left to
> optimize in the decoder. Below is an excerpt of the gprof output for
> flac decoding a track:
>
> >  %   cumulative   self              self     total
> > time   seconds   seconds    calls   s/call   s/call  name
> > 34.87      0.68     0.68   680925     0.00     0.00  FLAC__bitreader_read_rice_signed_block
> > 25.64      1.18     0.50  6004826     0.00     0.00  FLAC__MD5Transform
> > 14.36      1.46     0.28    46030     0.00     0.00  FLAC__lpc_restore_signal
> >  8.72      1.63     0.17    23457     0.00     0.00  read_frame_
> >  5.13      1.73     0.10    23457     0.00     0.00  write_callback
> >  3.08      1.79     0.06    23457     0.00     0.00  FLAC__MD5Accumulate
> >  3.08      1.85     0.06                             read
> >  2.56      1.90     0.05    50901     0.00     0.00  FLAC__crc16_update_words32
> >  1.03      1.92     0.02    23457     0.00     0.00  write_audio_frame_to_client_
> >  0.51      1.93     0.01  2016520     0.00     0.00  bitreader_read_from_client_
> >  0.51      1.94     0.01                             _IO_file_seekoff
> >  0.51      1.95     0.01                             write
>
> As you can see, the bitreader takes up the largest share of the time.
> However, this is not something that can be optimized with SIMD/vector
> instructions like SSE, AVX, NEON etc., and it is also a strictly
> sequential process. In the past there have been several attempts at
> improving the speed of this call. You could try for yourself whether
> configuring with ./configure --enable-64-bit-words or
> cmake -DENABLE_64_BIT_WORDS=ON brings any (small) improvement.
>
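To illustrate why the bitreader is inherently sequential, here is a minimal sketch of signed Rice decoding as used for FLAC residuals. BitReader, read_bit, read_rice_signed and read_rice_signed_block are hypothetical names chosen for illustration, not libFLAC's API, and the real FLAC__bitreader_read_rice_signed_block is far more optimized. The point it shows: every codeword has a variable length, so the position of the next residual is only known once the current one has been decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader (not libFLAC's FLAC__BitReader). */
typedef struct {
    const uint8_t *buf;
    size_t len;    /* buffer length in bytes */
    size_t bitpos; /* index of the next bit to read */
} BitReader;

/* Returns the next bit (0 or 1), or -1 if the buffer is exhausted. */
static int read_bit(BitReader *br)
{
    if (br->bitpos >= br->len * 8)
        return -1;
    int bit = (br->buf[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1;
    br->bitpos++;
    return bit;
}

/* Decodes one signed Rice codeword with parameter k: a unary quotient
   (zero bits terminated by a 1), a k-bit remainder, then the zigzag fold
   is undone (0,1,2,3,4 -> 0,-1,1,-2,2). Returns 0 on success, -1 on EOF. */
static int read_rice_signed(BitReader *br, unsigned k, int32_t *out)
{
    assert(k < 32); /* keeps the shift below well defined */
    uint32_t q = 0, r = 0;
    int bit;
    while ((bit = read_bit(br)) == 0)
        q++;
    if (bit < 0)
        return -1;
    for (unsigned i = 0; i < k; i++) {
        if ((bit = read_bit(br)) < 0)
            return -1;
        r = (r << 1) | (uint32_t)bit;
    }
    uint32_t u = (q << k) | r;
    *out = (int32_t)(u >> 1) ^ -(int32_t)(u & 1);
    return 0;
}

/* Decodes n residuals. Where codeword i+1 starts is only known after
   codeword i has been read, so this loop cannot simply be spread across
   SIMD lanes. */
static int read_rice_signed_block(BitReader *br, unsigned k,
                                  int32_t *vals, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (read_rice_signed(br, k, &vals[i]) != 0)
            return -1;
    return 0;
}
```

For example, with k = 2 the bytes 0x6F 0x0C contain the codewords 0110, 111, 100 and 00110 (plus one padding bit), which decode to 3, -2, 0 and 5.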
> Next, the MD5 transformation takes up a lot of time too, but I suppose
> you do not use that anyway: MD5 checking is disabled by default when
> decoding with libFLAC directly.
>
> Finally, the LPC restore takes up some time and can be improved with
> SSE, AVX, NEON etc., but it represents only a small part of the
> decoding CPU load.
>
>
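For comparison, the LPC restore step has roughly the shape of the following scalar sketch. lpc_restore is a hypothetical name and signature for illustration, not libFLAC's FLAC__lpc_restore_signal. Each output sample is the residual plus a quantized linear prediction from the previous `order` samples. The inner multiply-accumulate over the predictor order is the part SSE, AVX or NEON can batch; the outer loop still depends on the sample restored just before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of FLAC-style LPC signal restoration.
   data points at the first sample to restore; data[-order..-1] must
   already hold the warm-up samples. Assumes >> on a negative int64_t is
   an arithmetic shift, as on mainstream compilers. */
static void lpc_restore(const int32_t *residual, size_t n,
                        const int32_t *qlp_coeff, unsigned order,
                        int shift, int32_t *data)
{
    assert(shift >= 0 && shift < 32);
    for (size_t i = 0; i < n; i++) {
        int64_t sum = 0;
        /* Dot product of the coefficients with the previous `order`
           samples: independent multiply-adds, the part SIMD can batch. */
        for (unsigned j = 0; j < order; j++)
            sum += (int64_t)qlp_coeff[j] * data[(ptrdiff_t)i - (ptrdiff_t)j - 1];
        /* data[i] is an input to the next prediction, so the outer
           loop stays sequential. */
        data[i] = residual[i] + (int32_t)(sum >> shift);
    }
}
```

With order 2, qlp_coeff = {4, -2}, shift 1 and warm-up samples {1, 2}, the residuals {0, 1, -1} restore to 3, 5 and 6. In this sketch the inner loop runs `order` times per sample, so a lower predictor order means less work per decoded sample.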
We do indeed use libFLAC directly, so MD5 checking is not enabled in our case.

In our profiler we indeed see FLAC__bitreader_read_rice_signed_block and
FLAC__lpc_restore_signal.

> Perhaps it is possible to add a switch to the encoder to create FLAC
> files that are optimized for decoding speed instead of size. Would
> that be something you would use? For example, trading 5% less
> compression for 30% more decoding speed, assuming that MD5
> checking is already off?

This would indeed be interesting.

The material we use is compressed very well by FLAC, as it consists of
single notes of an instrument rather than whole songs.

For example, in a piano library we can reduce the size of the samples by
a factor of 4.


-- 
Olivier Tristan
Research & Development
www.uvi.net