[Flac-dev] Proof-of-concept multithreaded FLAC encoder

Frederick Akalin akalin at gmail.com
Tue May 6 11:31:41 PDT 2008


Hey FLAC devs,

I managed to hack out a proof-of-concept multithreaded FLAC encoder
based on the example libFLAC one.  It turned out to be fairly
straightforward to get near-linear speedup; I can encode a 636 MB wave
file in 6.8s with 8 threads on an 8-core 3.0 GHz Xeon vs. 31.4s with a
single thread.

Basically I mmap() the input file, divide up the mmap()ed region into
nearly equal pieces, parcel them out to encode threads, and write out
the output chunks (asynchronously) in order as soon as they're ready.
I had to hack up libFLAC a bit:

- I exposed update_metadata_() and FLAC__stream_encoder_set_do_md5().
- I added the function FLAC__stream_encoder_set_current_frame_number().
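A minimal sketch of the splitting step, assuming a fixed block size. The names `piece_t` and `split_into_pieces` are hypothetical and are not taken from mt_encode.c or libFLAC. Piece boundaries are rounded to whole frames, so each thread knows the frame number it starts at. That is presumably what a setter like FLAC__stream_encoder_set_current_frame_number() is needed for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one piece handed to an encode thread. */
typedef struct {
    uint64_t first_sample;  /* first inter-channel sample in this piece */
    uint64_t num_samples;   /* number of inter-channel samples in the piece */
    uint64_t first_frame;   /* frame number of the piece's first frame */
} piece_t;

/* Split total_samples into num_pieces nearly equal pieces whose boundaries
   fall on multiples of blocksize, so every piece starts at a whole frame.
   Only the last non-empty piece can end in a short frame.  Assumes a
   fixed-blocksize stream, where frame number = first sample / blocksize. */
static void split_into_pieces(uint64_t total_samples, unsigned blocksize,
                              unsigned num_pieces, piece_t *pieces)
{
    uint64_t total_frames, base, extra, frame = 0;
    unsigned i;

    assert(blocksize > 0 && num_pieces > 0);
    total_frames = (total_samples + blocksize - 1) / blocksize;
    base = total_frames / num_pieces;   /* frames every piece gets */
    extra = total_frames % num_pieces;  /* first 'extra' pieces get one more */

    for (i = 0; i < num_pieces; i++) {
        uint64_t frames = base + (i < extra ? 1 : 0);
        uint64_t start = frame * blocksize;
        uint64_t end = (frame + frames) * blocksize;

        /* clamp to the end of the input (last frame may be short) */
        if (start > total_samples) start = total_samples;
        if (end > total_samples) end = total_samples;

        pieces[i].first_sample = start;
        pieces[i].num_samples = end - start;
        pieces[i].first_frame = frame;
        frame += frames;
    }
}
```

With 100000 samples, a 4096-sample block size and 4 pieces, the pieces start at frames 0, 7, 13 and 19. To write the chunks in order, one simple option (again an assumption, not necessarily what mt_encode.c does) is to join the encode threads in index order and write each piece's output as soon as its thread has been joined.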

I turned off MD5 calculation in the example libFLAC encoder to make
comparing and verifying the outputs easier, as MD5 calculation
interacts badly (crashes) with
FLAC__stream_encoder_set_current_frame_number().  I also zero out the
min/max frame size fields in the metadata.  That should be the only
byte difference between the files the multithreaded encoder outputs
and the ones the example libFLAC encoder outputs (well, that and the
MD5).
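For reference, here is a minimal sketch of that zeroing step applied to an in-memory FLAC file. It relies on the format's rule that STREAMINFO is the first metadata block, directly after the "fLaC" marker. The function name is hypothetical. The FLAC format defines 0 in the min/max frame size fields as "unknown", so the zeroed file is still a valid stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero the 24-bit min/max frame size fields of the STREAMINFO block in an
   in-memory FLAC file (0 means "unknown" in the FLAC format).
   Returns 0 on success, -1 if buf does not start with "fLaC" + STREAMINFO. */
static int clear_streaminfo_framesizes(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    /* "fLaC" marker (4) + metadata block header (4) + STREAMINFO body (34) */
    if (len < 42 || memcmp(buf, "fLaC", 4) != 0)
        return -1;
    if ((buf[4] & 0x7F) != 0)  /* low 7 bits = block type; 0 is STREAMINFO */
        return -1;
    /* Body starts at offset 8: min/max blocksize (2 + 2 bytes), then
       min frame size (3 bytes) at 12 and max frame size (3 bytes) at 15. */
    memset(buf + 12, 0, 6);
    return 0;
}
```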

Patch file for flac 1.2.1 or CVS: http://www.akalin.cx/patch-libFLAC.in
Source file for multithreaded FLAC encoder: http://www.akalin.cx/mt_encode.c

mt_encode.c should compile with gcc 4.x using -Wall -Werror -g -O2 -ansi.
Usage is simply "mt_encode input.wav output.flac [num_threads]".

Of course, I'm not suggesting that the patch file above be committed;
it was just to get the proof-of-concept multithreaded encoder working
with a minimum of fuss.  However, looking at the earlier discussion on
parallelizing FLAC
(http://lists.xiph.org/pipermail/flac-dev/2007-September/002312.html),
I think there are a number of misconceptions floating around.  I don't
think the existing encoding APIs (stream_encoder.h) should be
retrofitted to support multithreading, but maybe a separate file- or
block-based API could be written that shares most of the code with the
existing stream APIs.  I agree that the stream-based API should always
be primary, but there's no reason that it couldn't co-exist with an
alternate parallel-friendly API.  As an end result, I envision a
utility similar to the flac command-line utility (flac-mt?) that
supports multithreading, but with only a subset of flac's
functionality, chosen according to feasibility and demand.

As a first step, a function that simply encodes a block of memory to
an output buffer, given a starting frame number etc., would be helpful,
as would a simple function to build a STREAMINFO metadata block and
write it out to memory.  It might also help to expose FLAC's MD5
implementation as a function that computes the MD5 of a memory block.
I might have time to hammer something out for review.  What do you
guys think?
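To make the STREAMINFO suggestion concrete, here is a minimal sketch of serializing a STREAMINFO metadata block (4-byte block header plus 34-byte body) into memory, following the field layout in the FLAC format specification. `streaminfo_t` and `write_streaminfo_block` are hypothetical names, not an existing libFLAC API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STREAMINFO_BLOCK_LEN (4 + 34)  /* block header + STREAMINFO body */

typedef struct {                /* hypothetical plain-C mirror of STREAMINFO */
    unsigned min_blocksize;     /* 16 bits */
    unsigned max_blocksize;     /* 16 bits */
    unsigned min_framesize;     /* 24 bits, 0 = unknown */
    unsigned max_framesize;     /* 24 bits, 0 = unknown */
    unsigned sample_rate;       /* 20 bits, in Hz */
    unsigned channels;          /* 1..8, stored as channels - 1 (3 bits) */
    unsigned bits_per_sample;   /* 4..32, stored as bps - 1 (5 bits) */
    uint64_t total_samples;     /* 36 bits, 0 = unknown */
    unsigned char md5[16];      /* all zero = not computed */
} streaminfo_t;

/* Store the low nbytes bytes of v big-endian at p. */
static void put_be(unsigned char *p, uint64_t v, unsigned nbytes)
{
    while (nbytes--) {
        p[nbytes] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

static void write_streaminfo_block(const streaminfo_t *si, int is_last,
                                   unsigned char out[STREAMINFO_BLOCK_LEN])
{
    uint64_t packed;

    assert(si->channels >= 1 && si->channels <= 8);
    assert(si->bits_per_sample >= 4 && si->bits_per_sample <= 32);

    /* header: 1-bit "last block" flag, 7-bit type (0), 24-bit body length */
    out[0] = (unsigned char)(is_last ? 0x80 : 0x00);
    put_be(out + 1, 34, 3);

    put_be(out + 4, si->min_blocksize, 2);
    put_be(out + 6, si->max_blocksize, 2);
    put_be(out + 8, si->min_framesize, 3);
    put_be(out + 11, si->max_framesize, 3);

    /* 20-bit rate | 3-bit channels-1 | 5-bit bps-1 | 36-bit total samples */
    packed = ((uint64_t)(si->sample_rate & 0xFFFFF) << 44)
           | ((uint64_t)(si->channels - 1) << 41)
           | ((uint64_t)(si->bits_per_sample - 1) << 36)
           | (si->total_samples & 0xFFFFFFFFFULL);
    put_be(out + 14, packed, 8);

    memcpy(out + 22, si->md5, 16);
}
```

For 44.1 kHz, 2-channel, 16-bit audio whose sample count fits in 32 bits, bytes 14 to 17 of the output come out as 0A C4 42 F0. All-zero min/max frame sizes and an all-zero MD5 mean "unknown" per the format, so a multithreaded encoder could fill in only the fields it actually knows.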

--
Frederick Akalin
http://www.akalin.cx

