[Flac-dev] floating point

Sat Aug 8 22:02:02 PDT 2009

On Aug 7, 2009, at 21:48, Didier Dambrin wrote:
> FLAC doesn't preserve every chunk? I thought it did. I only gave a  
> quick try
> but it seemed to have preserved even the most obscure chunks.
> Let me check: it even seems to preserve "MIDI note associated to  
> marker",
> which is a very unknown metadata used by SoundForge (& even defined  
> in a
> buggy way), so I assumed it was saving them transparently.

You are correct, Didier.  FLAC preserves every chunk, precisely.  WAV
and AIFF define a chunk in a very generic fashion, such that any
chunk can be preserved regardless of its contents.  FLAC does not
interpret any chunk except the one holding the audio data.  The
optional chunk-preserving code does not treat any chunk differently,
thus it cannot preserve some chunks and not others.  I think that
Martin is speaking from out-of-date experience.

> Btw, what do you think of this?
> http://www.hydrogenaudio.org/forums/index.php? 
> s=95a0210a0ba3304eca44ac3bd57990cb&showtopic=73895
> (didn't know where to post this, that forum seemed related)

That article is very naive, or at least the way it is described is  
very naive.  Real music does not repeat in terms of whole frames.   
Frames are a completely artificial creation of the digital world, and  
frame timing does not correspond to the timing of repetitions in
music.  Because music represents an analog signal, the repetition
could occur at a fraction of a frame, or even a fraction of a  
sample.  Compressing a drum loop would require a lot of tricks to  
detect the repetition unless the frame size were somehow luckily  
aligned with the tempo.  Maybe a song with 70.3125 BPM or 140.625 BPM  
could be compressed this way, but most music will not have such a  
precise tempo - in fact, tempo may drift if a live band is recorded.
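
To make the arithmetic behind those two tempos concrete, here is a
small sketch.  It assumes a 48 kHz sample rate and a fixed block size
of 4096 samples; neither figure is stated above, they are simply
values for which both tempos come out even.  The function name is
made up for this example.

```python
from fractions import Fraction

def blocks_per_beat(bpm, sample_rate=48000, block_size=4096):
    """Return the exact number of blocks spanned by one beat.

    sample_rate and block_size are assumed values, not FLAC requirements.
    """
    samples_per_beat = Fraction(sample_rate * 60) / Fraction(bpm)
    return samples_per_beat / block_size

# Fraction("70.3125") parses the decimal exactly, so no rounding creeps in.
for bpm in ("70.3125", "140.625", "120", "128"):
    ratio = blocks_per_beat(Fraction(bpm))
    note = "aligned" if ratio.denominator == 1 else "not aligned"
    print(bpm, "BPM:", ratio, "blocks per beat,", note)
```

Under those assumptions only the first two tempos give a whole number
of blocks per beat; common tempos such as 120 or 128 BPM do not.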

> So I thought: imagine a pre-processing coupled with FLAC. It would  
> take
> frames out of the whole song, and try to cross-correlate them with  
> the song
> itself. When it finds strong matches (under a certain threshold, and
> starting with a couple of matches), the frame is saved to a pool,  
> and it's
> subtracted from the song.
> Then you FLAC the (small) pool, and the song, full of near-silent  
> spots (&
> silence where pure repetitions occured).
> At decode time, you unFLAC the pool and the song, and you add back the
> frames from the pool to the song.

This might work, but you would have to be very lucky to find matches
given the block size of FLAC (or the frame size of any format, for  
that matter).  But, you're right, if you can predict the waveform  
with reasonable accuracy, then you can reduce the size.

FLAC and many other compression algorithms do, in fact, use this  
technique.  They look at the music, predict future samples, and then  
encode the difference between the predicted value and the actual value.
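
As a rough illustration of "predict, then encode the difference", the
sketch below uses a second-order fixed polynomial predictor, which is
one of the simple predictor types FLAC provides.  This is not the
libFLAC code, and the function names are invented for the example.

```python
import math

def residual_order2(samples):
    """Predict each sample as 2*x[n-1] - x[n-2] and keep only the error.

    The first two samples are stored verbatim as warm-up values.
    """
    out = list(samples[:2])
    for n in range(2, len(samples)):
        prediction = 2 * samples[n - 1] - samples[n - 2]
        out.append(samples[n] - prediction)
    return out

def restore_order2(residual):
    """Invert residual_order2 by re-running the same predictor."""
    out = list(residual[:2])
    for n in range(2, len(residual)):
        prediction = 2 * out[n - 1] - out[n - 2]
        out.append(residual[n] + prediction)
    return out

# A 440 Hz sine at 44.1 kHz, purely as test material.
signal = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100))
          for n in range(256)]
res = residual_order2(signal)
assert restore_order2(res) == signal  # lossless round trip
print("peak signal:", max(abs(s) for s in signal))
print("peak residual:", max(abs(r) for r in res[2:]))
```

The residual values are much smaller than the samples themselves, and
small numbers take fewer bits to store.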

It's doubtful that you could find a better algorithm for predicting
the waveform, but if you do, then FLAC will work well with your added  
processing layer.

> I haven't experimented yet, but let's say I try to correlate frames  
> with the
> song, and I get something like 20 near-repeats, I may end up with a  
> very
> silent "song leftover", still as long as the song, but maybe in  
> 4bits worth
> or something? But it would also have bumps of original audio (that  
> didn't
> find any matching frame).
> The thing is, I don't really know how FLAC compresses so I don't  
> know if it
> would compress the "leftover" so much better.

It's doubtful that you could find such repetition, given that the
frame size has nothing to do with the tempo of the song, and
repetitions in music are based on tempo.  But, if you could find a
match or even a near match, then FLAC would compress the difference  
better than the original.
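
Here is a minimal sketch of the "find a near match and subtract it"
idea, assuming the repeat distance is searched sample by sample
rather than in whole frames.  The function names and the synthetic
test loop are hypothetical, and the brute-force search is only meant
to show the principle; it would be far too slow for a whole song.

```python
import random

def best_lag(samples, min_lag, max_lag):
    """Return the lag whose delayed copy differs least from the signal."""
    def error_energy(lag):
        return sum((samples[n] - samples[n - lag]) ** 2
                   for n in range(max_lag, len(samples)))
    return min(range(min_lag, max_lag + 1), key=error_energy)

def subtract_repeat(samples, lag):
    """Keep the first `lag` samples, then store differences from one lag back."""
    head = list(samples[:lag])
    return head + [samples[n] - samples[n - lag]
                   for n in range(lag, len(samples))]

def add_repeat(residual, lag):
    """Invert subtract_repeat."""
    out = list(residual[:lag])
    for n in range(lag, len(residual)):
        out.append(residual[n] + out[n - lag])
    return out

# Synthetic "drum loop": 700 samples repeated 4 times with slight variation.
# 700 is deliberately unrelated to any block size.
random.seed(1)
loop = [random.randint(-8000, 8000) for _ in range(700)]
song = [s + random.randint(-20, 20) for _ in range(4) for s in loop]

lag = best_lag(song, 500, 1000)
leftover = subtract_repeat(song, lag)
assert add_repeat(leftover, lag) == song  # lossless round trip
print("lag:", lag, "peak leftover:", max(abs(v) for v in leftover[lag:]))
```

The leftover after the first repeat is near-silent, which is the kind
of signal FLAC compresses well.  Real recordings will rarely match as
cleanly as this synthetic loop.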

> And I don't really know how much matching frames you'd find out in  
> music out
> there, it would be very genre-dependent. But I'm surprised that no one
> really investigated this (there were old discussions in that  
> forum). Sure,
> streaming is important, but it's common to fully download a song.

The repetition would not be genre-dependent, but would be
tempo-dependent.  I suppose you could say that certain genres might
have a prevalent tempo, but there is enough variation within each
genre that matching within one genre is no easier than matching
without regard to genre.

People have investigated this, but perhaps not at the macro level as  
is being discussed here.  I think you'll discover that finding a  
match within a song is very difficult.  You could perhaps start with  
BPM detection code, and then try to find repetitions based upon  
tempo, but even if you find matches this way, you still would need to  
find some way to squeeze the repetitions into whole frames, which are  
not divisions of tempo.

Feel free to experiment.  The FLAC library makes it possible for you  
to work at the high level without writing everything yourself.

> At the same time this wouldn't be very interesting for my need,  
> which is to
> compress short samples. Now here too there could be a similar algo,  
> if it's
> tonal, cross-correlation would detect matching frames, only at a  
> smaller
> level. Imagine if you convert a violin sound into a pitch period  
> somewhere
> in its middle, and the residual from that the subtraction of that  
> pitch
> period in repeated frames. I think the residual would be rather quiet.

If you're going to use a pitch period from the middle of the violin
sound as the predictor, then you need a way for your encoder and
decoder to find
the exact same waveform.  If you can do this in a way that your  
decoder could discover the predicted values, then FLAC would be a  
successful way to compress the residual.
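
One way to guarantee that both sides use the exact same waveform is
to store the chosen pitch period alongside the residual.  The sketch
below assumes that approach; it is only an illustration, with made-up
function names and a synthetic decaying tone standing in for the
violin.  Detecting the period is left out: `start` and `period` are
simply given.

```python
import math

def encode(samples, start, period):
    """Take one pitch period as a template and subtract it, tiled, everywhere."""
    template = samples[start:start + period]
    residual = [s - template[(n - start) % period]
                for n, s in enumerate(samples)]
    return template, residual

def decode(template, residual, start):
    """Rebuild the samples from the stored template and the residual."""
    period = len(template)
    return [r + template[(n - start) % period]
            for n, r in enumerate(residual)]

# Synthetic tone: 100-sample period with a slowly decaying envelope.
samples = [round(8000 * (1 - n / 20000) * math.sin(2 * math.pi * n / 100))
           for n in range(2000)]

template, residual = encode(samples, start=1000, period=100)
assert decode(template, residual, start=1000) == samples  # lossless
print("peak original:", max(abs(s) for s in samples))
print("peak residual:", max(abs(r) for r in residual))
```

The template and the residual could then each be passed to FLAC.  A
real violin has vibrato and pitch drift, so the residual would be
larger than in this idealized case.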

Brian Willoughby
Sound Consulting