[Flac-dev] alternate compression

Didier Dambrin didid at skynet.be
Sat Aug 8 23:11:06 PDT 2009


I'm doing testing on this at the moment.

But to start with:
>>Because music represents an analog signal,
As I wrote, it would only apply to specific genres, not analog recordings. 
Electronic music quite often doesn't leave a computer these days. And it 
mainly consists of drums, synths & vocals/effects. Drums are often samples 
sequenced at sample (not sub-sample) accuracy, and thus repeated exactly (of 
course, if the song was resampled afterwards, there will be sub-sample 
offsets). Synths are a problem, as the riffs will have more variations, and 
free-running oscillators will also cause trouble.
But remember that it's not about finding perfect matches, which will very 
rarely happen, just about correlating signals to leave a smaller residual, 
as long as the saving (enough repeats) compensates for the extra frame pool 
you'd have to store.

>>The repetition would not be genre-dependent, but would be tempo-dependent.
It would be very genre-dependent for the reason I explained: samples. Say 
you have a drummer repeating a drumloop. If it's recorded, there's no chance 
the noise of the drums will correlate, it will change all the time. But if 
it's a drum sample, they will match. It's easy to correlate similar 
kickdrums, but that hardly works with cymbals.



Anyway, right now I get what I wanted somewhat working, well enough 
considering I've only spent a couple of hours.

I've done matching-frame-detection at small & big level. Small is just a 
couple of samples, but I also tried big tempo-synced blocks (this wasn't a 
problem, I have a good tempo detector, and btw tempo detection can also be 
done using cross-correlation in a similar matching-frame way).
It works (& takes ages btw). For example, I take a typical piece of music 
with a drumloop repeated & a varying synth line over it. It detects the 
repeated drumloop and subtracts it, so that for some parts there's just the 
synth line alone, without the drumloop anymore. And of course it totally 
fails for other songs I tested. So I'm sure it could end up working well 
with a lot more time.
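
To illustrate the remark that tempo detection can be done by 
cross-correlation, here is a minimal sketch that estimates the repetition 
period of a signal from its autocorrelation. It assumes NumPy; the function 
name and the min_lag/max_lag parameters are hypothetical, and this is not the 
tempo detector mentioned above.

```python
import numpy as np

def repetition_period(signal, min_lag, max_lag):
    """Return the lag (in samples) at which the signal best matches itself."""
    x = np.asarray(signal, dtype=np.float64)
    x = x - x.mean()
    # Autocorrelation: correlate the signal with itself, keep lags >= 0.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Strongest self-similarity inside the allowed lag range.
    return min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
```

The lag range matters: lag 0 always wins if it is allowed, and multiples of 
the true period also score high, so min_lag and max_lag should bracket the 
loop lengths you expect.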

But sadly none of FLAC, WavPack or OptimFROG could compress the 
pre-processed song better, or only barely. And considering you'd also have 
to add the pool of frames, it would end up worse.

The problem is the discontinuities I think. Say you work with little, 
non-tempo-synced frames, and you find a matching frame, which you subtract 
from the song at the places it matches. You'll have a discontinuity around 
it. If the frames around this one also match, it doesn't matter as they will 
be subtracted as well. But if they don't (enough), the discontinuity will 
stay.
I also tried windowing the frame before subtracting it: no more 
discontinuity, but with small frames it's not very useful anymore.
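
The edge effect described above can be reproduced with a small sketch. It 
assumes NumPy and uses a Hann window (np.hanning) as one possible window; the 
helper names are hypothetical. Subtracting the raw frame leaves a step at the 
frame boundaries when the neighbours are not subtracted too, while the 
windowed frame tapers to zero at its edges, at the cost of removing less of 
the matched audio.

```python
import numpy as np

def subtract_frame_at(signal, frame, start, window=None):
    """Subtract `frame` (optionally windowed) from `signal` at `start`."""
    out = np.asarray(signal, dtype=np.float64).copy()
    n = len(frame)
    w = np.ones(n) if window is None else window
    out[start:start + n] -= w * frame
    return out

def edge_jump(x, start, n):
    """Largest sample-to-sample step across the two frame boundaries."""
    return max(abs(x[start] - x[start - 1]),
               abs(x[start + n] - x[start + n - 1]))

# A slowly varying test tone; the frame is an exact copy of one stretch of it.
t = np.arange(1000)
tone = np.cos(2 * np.pi * t / 200)
start, n = 90, 64
frame = tone[start:start + n].copy()

rect = subtract_frame_at(tone, frame, start)                # hard edges
hann = subtract_frame_at(tone, frame, start, np.hanning(n)) # tapered edges
rect_jump = edge_jump(rect, start, n)
hann_jump = edge_jump(hann, start, n)
```

Here rect_jump is large because the residual drops to zero inside the frame 
while the surrounding audio is untouched, and hann_jump stays close to the 
tone's normal sample-to-sample step.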

But if I run the pre-processing on something perfectly repeated several 
times, it really finds the frames, and it doesn't require knowing the tempo. 
If you don't know the tempo, the only problem will be misalignment, which 
will leave little bits of audio that were too short to find matching frames, 
but most of the processed waveform will still be silence.


So, I don't think I will test this further (it's the kind of thing you can 
spend months on to eventually give up), but I think it has potential, just 
maybe not coupled with existing lossless compression methods.
After all, compression is about finding how data repeats, and music clearly 
repeats. It would also be useful for lossy compression, say you have a 
drummer playing 2x a loop, a lossy compressor could assume the second is the 
same as the first, even if it doesn't match perfectly. But really, I think a 
compressor should compress music at the level it repeats, and the current 
compressors seem to work at a smaller time scale.
(& in any case it would also require a huge compressing time, unless 
matching frames detection is done heavily in parallel using GPU maybe)


So the algo I tried is roughly:
-pick frames from the waveform, 1 by 1
-cross-correlate the frame with the rest of the waveform
-check the correlation result & whenever there's a strong match, look around 
it for an even better match that's close enough
-for each match, subtract the frame from the waveform (tried with & without 
windowing). This may also be improved if you normalize the frame to the 
matching one (haven't tried)
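
A minimal sketch of the steps listed above, assuming NumPy and integer PCM 
samples. It is not the code used for these tests: the function names, the use 
of normalized cross-correlation, the 0.9 threshold, the minimum of 2 matches 
and the non-overlapping frame stepping are all illustrative assumptions. 
Windowing and gain normalization are left out.

```python
import numpy as np

def find_matches(signal, frame, threshold=0.9):
    """Offsets where `frame` correlates strongly with `signal`."""
    sig = np.asarray(signal, dtype=np.float64)
    frm = np.asarray(frame, dtype=np.float64)
    n = len(frm)
    # Sliding dot product of the frame against every position.
    dots = np.correlate(sig, frm, mode="valid")
    # Energy of each signal window, used to normalize the score to [-1, 1].
    csum = np.concatenate(([0.0], np.cumsum(sig ** 2)))
    win_energy = np.maximum(csum[n:] - csum[:-n], 0.0)
    denom = np.sqrt(win_energy * np.dot(frm, frm))
    score = np.where(denom > 1e-12, dots / np.maximum(denom, 1e-12), 0.0)
    matches, pos = [], 0
    while pos < len(score):
        if score[pos] >= threshold:
            # Look around for an even better match that is close enough.
            end = min(pos + n, len(score))
            best = pos + int(np.argmax(score[pos:end]))
            matches.append(best)
            pos = best + n  # matches must not overlap
        else:
            pos += 1
    return matches

def preprocess(signal, frame_len, threshold=0.9, min_matches=2):
    """Return (residual, pool); pool holds (frame, match offsets) pairs."""
    residual = np.asarray(signal, dtype=np.int64).copy()
    pool = []
    # Pick frames from the waveform, one by one.
    for start in range(0, len(residual) - frame_len + 1, frame_len):
        frame = residual[start:start + frame_len].copy()
        if not frame.any():
            continue  # already silent, nothing to match
        matches = find_matches(residual, frame, threshold)
        if len(matches) >= min_matches:
            pool.append((frame, matches))
            for m in matches:
                residual[m:m + frame_len] -= frame
    return residual, pool

def reconstruct(residual, pool):
    """Decoder side: add the pooled frames back at their offsets."""
    out = residual.copy()
    for frame, matches in pool:
        for m in matches:
            out[m:m + len(frame)] += frame
    return out
```

Working on integers keeps the subtract/add round trip exact, which a lossless 
scheme needs. On a perfectly repeated loop the residual comes out silent; on 
real material the residual and the pool would both still have to be encoded.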


Btw, are all lossless compression methods working in the time domain?




>> Btw, what do you think of this?
>> http://www.hydrogenaudio.org/forums/index.php?s=95a0210a0ba3304eca44ac3bd57990cb&showtopic=73895
>> (didn't know where to post this, that forum seemed related)
>
> That article is very naive, or at least the way it is described is
> very naive.  Real music does not repeat in terms of whole frames.
> Frames are a completely artificial creation of the digital world, and
> frame timing does not correspond to the timing of music repetitions
> in music.  Because music represents an analog signal, the repetition
> could occur at a fraction of a frame, or even a fraction of a
> sample.  Compressing a drum loop would require a lot of tricks to
> detect the repetition unless the frame size were somehow luckily
> aligned with the tempo.  Maybe a song with 70.3125 BPM or 140.625 BPM
> could be compressed this way, but most music will not have such a
> precise tempo - in fact, tempo may drift if a live band is recorded.
>
>
>> So I thought: imagine a pre-processing coupled with FLAC. It would take
>> frames out of the whole song, and try to cross-correlate them with the
>> song itself. When it finds strong matches (under a certain threshold,
>> and starting with a couple of matches), the frame is saved to a pool,
>> and it's subtracted from the song.
>> Then you FLAC the (small) pool, and the song, full of near-silent spots
>> (& silence where pure repetitions occurred).
>> At decode time, you unFLAC the pool and the song, and you add back the
>> frames from the pool to the song.
> This might work, but you would have to be very lucky to find matches
> given the block size of FLAC (or the frame size of any format, for
> that matter).  But, you're right, if you can predict the waveform
> with reasonable accuracy, then you can reduce the size.
>
> FLAC and many other compression algorithms do, in fact, use this
> technique.  They look at the music, predict future samples, and then
> encode the difference between the predicted value and the actual value.
>
> It's doubtful that you could find a better algorithm at predicting
> the waveform, but if you do, then FLAC will work well with your added
> processing layer.
>
>
>> I haven't experimented yet, but let's say I try to correlate frames with
>> the song, and I get something like 20 near-repeats, I may end up with a
>> very silent "song leftover", still as long as the song, but maybe in
>> 4 bits worth or something? But it would also have bumps of original
>> audio (that didn't find any matching frame).
>> The thing is, I don't really know how FLAC compresses so I don't know if
>> it would compress the "leftover" so much better.
> It's doubtful that you could find such repetition, given that the
> frame size has nothing to do with the tempo of the song, and
> repetition in music are based on tempo.  But, if you could find a
> match or even a near match, then FLAC would compress the difference
> better than the original.
>
>> And I don't really know how many matching frames you'd find in music
>> out there, it would be very genre-dependent. But I'm surprised that no
>> one really investigated this (there were old discussions in that forum).
>> Sure, streaming is important, but it's common to fully download a song.
> The repetition would not be genre-dependent, but would be tempo-
> dependent.  I suppose you could say that certain genres might have a
> prevalent tempo, but there is enough variation within each genre to
> make the problem as big as non-genre-dependent matching.
>
> People have investigated this, but perhaps not at the macro level as
> is being discussed here.  I think you'll discover that finding a
> match within a song is very difficult.  You could perhaps start with
> BPM detection code, and then try to find repetitions based upon
> tempo, but even if you find matches this way, you still would need to
> find some way to squeeze the repetitions into whole frames, which are
> not divisions of tempo.
>
> Feel free to experiment.  The FLAC library makes it possible for you
> to work at the high level without writing everything yourself.
>
>> At the same time this wouldn't be very interesting for my need, which is
>> to compress short samples. Now here too there could be a similar algo,
>> if it's tonal, cross-correlation would detect matching frames, only at a
>> smaller level. Imagine if you convert a violin sound into a pitch period
>> somewhere in its middle, and the residual from the subtraction of that
>> pitch period in repeated frames. I think the residual would be rather
>> quiet.
>
> If you're going to use the primary violin sound's middle pitch as the
> predictor, then you need a way for your encoder and decoder to find
> the exact same waveform.  If you can do this in a way that your
> decoder could discover the predicted values, then FLAC would be a
> successful way to compress the residual.
>
> Brian Willoughby
> Sound Consulting
>




