[vorbis-dev] Using large-scale repetition in audio compression

Thu Sep 19 10:13:23 PDT 2002

On Thu, Sep 19, 2002 at 12:44:42PM +0200, Lourens Veen wrote:
> This idea is so simple that I'm sure it must have been thought of 
> before, and discarded, since AFAIK it's not used anywhere. I did a 
> quick web search but that didn't turn up much, so I figured I'd put 
> it up for discussion here anyway.

Thought about it before and postponed for lack of time and resources.

> How about using large-scale repetition in audio compression? I'm 
> thinking of redundancy in repeated pieces of a song, i.e. a chorus.
> Of course, the different choruses aren't exactly the same (unless it
> was mixed digitally and they cheated :-)), but wouldn't there be at 
> least some redundancy in the frequency domain? And could that be 
> used to lower the required bitrate for repeated parts of a song?

Very similar to patterns in tracker files, huh?

In a sense an entire song should be self-similar in the frequency
domain -- after all, in most cases it's the same instruments playing a
lot of the same structures.
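
A rough way to check that intuition is to compare the magnitude spectra
of short frames against each other. The sketch below is only a toy under
stated assumptions: the "song" is four synthetic frames, the helper names
(magnitude_spectrum, cosine_similarity, tone) are made up for this
illustration, and a naive DFT stands in for whatever transform a real
codec would use.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for a toy frame size)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def cosine_similarity(a, b):
    """Cosine of the angle between two spectra (1.0 means same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def tone(freq_bin, n, amp=1.0):
    """A sine with exactly freq_bin cycles per frame."""
    return [amp * math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]

FRAME = 64
verse = tone(5, FRAME)
chorus = [a + b for a, b in zip(tone(8, FRAME), tone(12, FRAME, 0.5))]
chorus_again = [0.9 * s for s in chorus]  # same material, slightly quieter

# Hypothetical song layout: verse, chorus, verse, chorus.
song = [verse, chorus, verse, chorus_again]
spectra = [magnitude_spectrum(f) for f in song]

# similarity[i][j] is high when frames i and j share spectral shape.
similarity = [[cosine_similarity(a, b) for b in spectra] for a in spectra]
```

Here the two chorus frames score close to 1.0 against each other and
close to 0.0 against the verse. Real recordings will be far less clean
than this.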

So then my thoughts came down to identifying the instruments used in
the song, extracting samples of them, matching them as templates in
the time-frequency domain against the spectrum of the song, and storing
the result as pretty much a tracker-style format plus a residue track.
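
As a minimal sketch of the "tracker-style format plus a residue track"
idea, assume the instrument samples and the note events are already
known. That is the hard part and it is not attempted here. Everything
below is hypothetical: the time-domain mixing, the (onset, name, gain)
event layout and the "kick" sample are stand-ins for illustration, not
the format I had in mind in detail.

```python
def render(events, instruments, length):
    """Mix every (onset, instrument name, gain) event into one track."""
    track = [0.0] * length
    for onset, name, gain in events:
        for i, sample in enumerate(instruments[name]):
            if onset + i < length:
                track[onset + i] += gain * sample
    return track

def encode(signal, events, instruments):
    """Residue = what the tracker-style prediction fails to explain."""
    predicted = render(events, instruments, len(signal))
    return [s - p for s, p in zip(signal, predicted)]

def decode(residue, events, instruments):
    """Re-render the events and add the residue back."""
    predicted = render(events, instruments, len(residue))
    return [r + p for r, p in zip(residue, predicted)]

def energy(xs):
    return sum(x * x for x in xs)

# Toy data: one instrument sample played twice, plus a small disturbance
# standing in for everything the templates do not capture.
instruments = {"kick": [1.0, 0.5, 0.25, 0.125]}
events = [(0, "kick", 1.0), (8, "kick", 0.8)]
clean = render(events, instruments, 16)
signal = [c + 0.01 * ((7 * i) % 5 - 2) for i, c in enumerate(clean)]

residue = encode(signal, events, instruments)
restored = decode(residue, events, instruments)
```

When the templates fit well, the residue carries only a small fraction
of the signal energy, which is where any bitrate saving would come from.
When they fit badly, the residue ends up as expensive as the signal.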

It turns out you can identify and extract individual instruments
pretty well (my algorithm didn't have the greatest results, but
others' have). The problem comes when you start combining them. Most
analysis methods have very poor time resolution (why do we block up the
MDCT in Vorbis anyway? ... exactly.). Add harmonic distortion from
transient signals -- the attack is important in identifying the
instrument in many cases -- and nonlinear processing of the signal in
the many stages of recording, and it quickly turns into a big mess. I
think continuing with a classical transform, e.g. the MDCT used in
Vorbis, or even a wavelet-based method such as the Matching Pursuit
that I used, results in a system where half of your bits are devoted
to fixing up the mistakes made by the other half.
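
For readers who have not met Matching Pursuit: it greedily picks, from a
dictionary of unit-norm atoms, the atom most correlated with the current
residual, subtracts its contribution, and repeats. The sketch below is a
generic textbook version, not the code I used. The dictionary of
Hann-windowed cosines and the names make_atom and matching_pursuit are
assumptions made for this example.

```python
import math

def make_atom(n, pos, width, cycles):
    """Unit-norm Hann-windowed cosine starting at sample pos."""
    atom = [0.0] * n
    for t in range(width):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * t / width)
        atom[pos + t] = window * math.cos(2 * math.pi * cycles * t / width)
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, atoms, steps):
    """Greedy decomposition; returns picks, final residual, energy trace."""
    residual = list(signal)
    picks = []
    energies = [dot(residual, residual)]
    for _ in range(steps):
        corr = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeff = corr[best]
        # Remove the chosen atom's projection from the residual.
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
        picks.append((best, coeff))
        energies.append(dot(residual, residual))
    return picks, residual, energies

N, WIDTH = 64, 32
atoms = [make_atom(N, pos, WIDTH, cycles)
         for pos in (0, 16, 32) for cycles in (2, 4, 6)]

# Two overlapping atoms: the overlap is what makes the greedy choice
# imperfect, so later steps partly correct earlier ones.
signal = [3.0 * a + 1.0 * b for a, b in zip(atoms[4], atoms[7])]
picks, residual, energies = matching_pursuit(signal, atoms, steps=4)
```

Because the two source atoms overlap in time, the first coefficient
absorbs part of the second atom, and the following steps spend effort
undoing that. That is the small-scale version of the complaint above.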

I'm exploring new possibilities in time-frequency transforms, with
other goals at the moment, but I may get back to the audio
compression side later on.

> Of course, this is hard to do when streaming live audio, or even when
> streaming from a fixed source if you don't buffer the entire 
> broadcast on the client side, but for compressing a song from a CD 
> I'd say that it would work.

In the above proposal, the streamer would send the known instruments
in the song thus far to the client before starting the main streaming,
much like Vorbis does with codebooks.

> Obviously, it's not used (at least AFAIK) so there must be something 
> against it. Anyone care to enlighten me?

... keep each other posted.

-- 
Kenneth Arnold <ken at arnoldnet.net>
- "Know thyself."
