[flac-dev] FLAC specification clarification

Thu Jun 25 14:03:47 UTC 2020

On Thu, 25 Jun 2020 at 14:09, Stephen F. Booth <me at sbooth.org> wrote:

> To me the real question is not whether that portion of the spec has been
> implemented by any existing encoders/decoders but whether the spec is
> broken (i.e. cannot be implemented as written).
>

We will never know for sure whether any existing encoder/decoder works this
way, but I can tell you that two very influential pairs, namely the
reference encoder and decoder (libFLAC) and the ffmpeg encoder (previously
known as Flake) and decoder, do not implement negative shifts. As the
licenses for both are very open (libFLAC being BSD), I can imagine most
proprietary implementations are just straight copies.

I think the problem is not that there might be decoders that accept this or
encoders that (rarely) output this. I cannot say this for certain, but with
libFLAC and ffmpeg decoders not accepting this, I would say that the vast
majority of existing FLAC decoders do not accept this, and therefore
encoders should never output files with negative shifts, as most decoders
won't play such files.

> It's possible (generally/conceptually, not necessarily here) a negative
> shift value could be used to represent a left shift.
>

Yes, I think that was what was originally intended.

> However, I know very little about linear prediction and how coefficients
> are chosen and whether that makes sense
>

I will explain why using negative shifts probably never has any benefit.
Decoding LPC is rather simple to understand: to predict a sample, take the
first coefficient and multiply it by the previous (already decoded) sample,
add to that the second coefficient multiplied by the sample before that,
the third coefficient multiplied by the sample before that one, etc. To
predict sample 25 of a block, the decoder has to sum this:
LPC_1 * sample_24 + LPC_2 * sample_23 + LPC_3 * sample_22 +
LPC_4 * sample_21 etc. To finish the decoding of the sample, the residual
has to be added to the prediction. This residual is stored and encoded
separately.
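
To make the summation above concrete, here is a minimal sketch in Python.
It is not libFLAC or ffmpeg code: the function name lpc_decode, its
parameters and the sample values are made up for illustration, and it
assumes integer coefficients with no shift applied yet (the shift comes
later in this mail).

```python
def lpc_decode(warmup, coeffs, residuals):
    """Rebuild samples from warm-up samples, LPC coefficients and residuals.

    Hypothetical helper for illustration only. coeffs[0] is LPC_1 and pairs
    with the most recent sample, coeffs[1] is LPC_2 and pairs with the sample
    before that, and so on. Assumes len(warmup) >= len(coeffs).
    """
    samples = list(warmup)
    for residual in residuals:
        # LPC_1 * sample[n-1] + LPC_2 * sample[n-2] + ...
        prediction = sum(c * samples[-1 - j] for j, c in enumerate(coeffs))
        # The stored residual corrects the prediction to the exact sample.
        samples.append(prediction + residual)
    return samples


# Order-2 example with made-up numbers.
decoded = lpc_decode(warmup=[10, 12], coeffs=[2, -1], residuals=[1, -2, 0])
print(decoded)
```

The better the prediction, the smaller the residuals that have to be
stored, which is where the compression comes from.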

These LPC coefficients are floating point numbers. Very often, when you sum
the coefficients (without multiplying them with samples) the results are
close to one, which means that the samples form a nicely correlated signal.
However, the FLAC format doesn't store floating point numbers, so the
encoder quantizes them into integers to make sure no rounding errors can
make the result lossy.

How does this work? Assume we have a signal that can be predicted nicely
(that is, with an efficiently encodable residual) with LPC coefficients
0.75; -0.375; 0.125; 0.5. To store these as integers, we multiply them by
8, and we get 6, -3, 1, 4. We also have to store a shift of +3 (2^3 = 8) so
the decoder can undo the scaling and get our original LPC coefficients back.
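
The quantization step can be sketched as follows. Again this is only an
illustration: quantize and predict are hypothetical names, the history
values are made up, and the use of an arithmetic right shift to undo the
scaling is my assumption about how a decoder would typically apply a
positive shift.

```python
def quantize(coeffs, shift):
    """Scale floating point LPC coefficients by 2**shift and round them."""
    scale = 1 << shift
    return [round(c * scale) for c in coeffs]


def predict(qcoeffs, shift, history):
    """Predict the next sample from integer coefficients and a shift.

    history holds already decoded samples, most recent last. Hypothetical
    helper; assumes shift >= 0.
    """
    acc = sum(c * s for c, s in zip(qcoeffs, reversed(history)))
    # Right shift by `shift` bits divides by 2**shift (rounding down),
    # which undoes the scaling applied in quantize().
    return acc >> shift


qcoeffs = quantize([0.75, -0.375, 0.125, 0.5], shift=3)
print(qcoeffs)  # [6, -3, 1, 4]

# Made-up history; the floating point prediction would be 91.25.
print(predict(qcoeffs, 3, [100, 96, 90, 84]))  # 91
```

Because both encoder and decoder do exactly the same integer arithmetic,
the rounding in the prediction is reproducible and the residual restores
the sample exactly.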

For a "negative shift" to have a place, we would need the sum of the LPC
coefficients to be between -0.5 and 0.5, which means the signal is a very
quick fade-out (which can only last a few samples). One can probably
synthesize such signals, but looking at actual audio material, this rarely
happens, especially with the larger blocksizes where non-fixed LPC
prediction shows its strengths.

Kind regards, Martijn van Beurden