[flac-dev] About SSE intrinsics in decoder
olivier tristan
o.tristan at uvi.net
Thu Jul 7 07:46:58 UTC 2022
On 07/07/2022 at 09:34, Martijn van Beurden wrote:
> On Thu, 7 Jul 2022 at 09:07, olivier tristan <o.tristan at uvi.net> wrote:
> > Hence even small optimizations are very welcome :)
>
> I presume you use libFLAC directly then. Sadly there is little left to
> optimize in the decoder. Below is an excerpt of the gprof output for
> flac decoding a track:
>
> >   %   cumulative     self              self    total
> >  time    seconds  seconds    calls   s/call   s/call  name
> > 34.87       0.68     0.68   680925     0.00     0.00  FLAC__bitreader_read_rice_signed_block
> > 25.64       1.18     0.50  6004826     0.00     0.00  FLAC__MD5Transform
> > 14.36       1.46     0.28    46030     0.00     0.00  FLAC__lpc_restore_signal
> >  8.72       1.63     0.17    23457     0.00     0.00  read_frame_
> >  5.13       1.73     0.10    23457     0.00     0.00  write_callback
> >  3.08       1.79     0.06    23457     0.00     0.00  FLAC__MD5Accumulate
> >  3.08       1.85     0.06                             read
> >  2.56       1.90     0.05    50901     0.00     0.00  FLAC__crc16_update_words32
> >  1.03       1.92     0.02    23457     0.00     0.00  write_audio_frame_to_client_
> >  0.51       1.93     0.01  2016520     0.00     0.00  bitreader_read_from_client_
> >  0.51       1.94     0.01                             _IO_file_seekoff
> >  0.51       1.95     0.01                             write
>
> As you can see, the bitreader takes up the most time. This is however
> not something that can be optimized with SIMD/vector instructions like
> SSE, AVX, NEON etc. It is also a strictly sequential process. In the
> past there have been several attempts at improving the speed of this
> call. You could check for yourself whether configuring with
> ./configure --enable-64-bit-words or cmake -DENABLE_64_BIT_WORDS=ON
> brings any (small) improvement.
>
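To illustrate why this step is sequential, here is a minimal sketch of
decoding one Rice-coded signed value, assuming the usual scheme of a
unary quotient (zero bits ended by a one bit), k remainder bits and a
zigzag mapping to signed. BitReader, read_bit and read_rice_signed are
hypothetical names for this example, not libFLAC's API, and the real
FLAC__bitreader_read_rice_signed_block is considerably more optimized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; NOT libFLAC's FLAC__BitReader. */
typedef struct {
    const uint8_t *data;
    size_t bitpos; /* index of the next bit to read */
} BitReader;

static unsigned read_bit(BitReader *br)
{
    unsigned bit = (br->data[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1u;
    br->bitpos++;
    return bit;
}

/* Decode one Rice-coded signed value with Rice parameter k. */
static int32_t read_rice_signed(BitReader *br, unsigned k)
{
    uint32_t q = 0;
    uint32_t r = 0;
    uint32_t u;
    unsigned i;

    /* Unary part: count zero bits up to the terminating one bit. */
    while (read_bit(br) == 0)
        q++;

    /* Binary part: the k low-order bits of the value. */
    for (i = 0; i < k; i++)
        r = (r << 1) | read_bit(br);

    /* Zigzag mapping: even codes are non-negative, odd codes negative. */
    u = (q << k) | r;
    return (int32_t)(u >> 1) ^ -(int32_t)(u & 1);
}
```

With k = 2, the two bytes 0x95 0x94 decode to 0, -1, 3, -3. The length
of each codeword is only known once its unary part has been read, so
the start of the next value depends on the previous one, which is what
makes the loop hard to vectorize.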
> Next the MD5 transformation takes up a lot of time too, but I suppose
> you do not use that anyway. It is disabled by default when decoding
> using libFLAC directly.
>
> Finally the lpc restore takes up some time and can be improved with
> SSE, AVX, NEON etc., but it represents only a small part of the
> decoding CPU load.
>
>
We do use libFLAC directly, so MD5 is not enabled in my case.
We also see FLAC__bitreader_read_rice_signed_block and
FLAC__lpc_restore_signal in our perf analyzer.
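For context on the LPC step, here is a minimal scalar sketch of what
restoring a signal from its residual looks like, assuming the standard
linear-prediction form (prediction = weighted sum of previous samples,
shifted right, plus residual). lpc_restore_sketch and its parameter
layout are made up for this example and do not match the signature of
FLAC__lpc_restore_signal.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * samples[0..order-1] must already hold the warm-up samples; the
 * restored values are written to samples[order..order+count-1].
 * Assumes >> on a negative value is an arithmetic shift, which holds
 * on common compilers but is implementation-defined in C.
 */
static void lpc_restore_sketch(const int32_t *residual, size_t count,
                               const int32_t *coeff, size_t order,
                               int shift, int32_t *samples)
{
    size_t i, j;

    for (i = 0; i < count; i++) {
        int64_t sum = 0;

        /* coeff[0] weights the most recent sample, coeff[1] the one before, ... */
        for (j = 0; j < order; j++)
            sum += (int64_t)coeff[j] * samples[order + i - 1 - j];

        samples[order + i] = residual[i] + (int32_t)(sum >> shift);
    }
}
```

The multiplications in the inner loop are independent of each other,
which is where SSE/AVX/NEON can help, but every output sample still
depends on the samples restored just before it.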
> Perhaps it is possible to add a switch to the encoder to create FLAC
> files that are optimized for decoding speed instead of size. Would
> that be something you would use? For example, trading 5% less
> compression for 30% more decoding speed, assuming that MD5
> checking is already off?
This would indeed be interesting.
The material we use is compressed very well by FLAC, as each file is
just a single note of an instrument rather than a song.
For example, in a piano library we can divide the sample size by 4.
--
Olivier Tristan
Research & Development
www.uvi.net