[Flac-dev] Re: Lossless AMI ADPCM

Josh Coalson j_coalson at yahoo.com
Thu Jun 28 14:07:43 PDT 2001


I'm copying the flac-dev list to see if anyone has any
feedback also...

--- Juhana Sadeharju <kouhia at nic.funet.fi> wrote:
> Hello again. I had time to check the paper out. I have filled in the
> steps given in the paper with formulae, and then written a piece of
> C code. It is not complete code, but could be a reasonable start.
> Maybe there is one typo in the paper -- I have pointed it out in
> my notes below -- please check. This is only the encoder, and I don't
> know exactly what Q, L and S are -- perhaps Sox's AMI ADPCM code
> could tell; I have not yet looked at it.
> [C pseudo-code snipped]

This is referring to the following paper:

ftp://ftp.funet.fi/pub/sci/audio/devel/newpapers/00871117.pdf

> What do you think about the algorithm given in the paper, and
> should we implement it in FLAC?
> 
> The paper reports compression ratios of about 1:5, but I'm not sure
> that would hold for pop music. A 1:4 ratio would certainly be
> a great thing to have, compared to FLAC's 1:2 ratio.

I don't think this method is relevant to FLAC and here's why.

First, the results they show are for compression of data
that has already been lossily quantized to fewer bits per
sample, e.g. u-Law and A-Law are logarithmic quantizations
of 16-bit data down to 8 bits.
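For reference, here is a rough sketch of that kind of logarithmic
quantization -- a u-Law-style encoder using the usual segmented
approximation.  This is just an illustration (untested, names are my
own), not a conformance-tested G.711 implementation:

#include <stdint.h>

/* u-Law-style encoder sketch: fold a 16-bit sample into an 8-bit code
 * with the usual segmented-logarithm scheme (sign bit, 3-bit exponent,
 * 4-bit mantissa).  Illustration only. */
static uint8_t ulaw_encode(int16_t pcm)
{
    const int BIAS = 0x84;                /* standard u-Law bias (132) */
    int sign = (pcm < 0) ? 0x80 : 0;
    int mag = (pcm < 0) ? -(int)pcm : pcm;

    if (mag > 32635)                      /* clip so BIAS can't overflow */
        mag = 32635;
    mag += BIAS;

    int exponent = 7;        /* find the segment: highest set bit above bit 7 */
    for (int mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    int mantissa = (mag >> (exponent + 3)) & 0x0F;
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}

The point is that their codec never sees more than these 8-bit codes,
so the ratios are not comparable to compressing 16-bit linear PCM.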

Second, the average ratio (assuming the table describes
ratios, since they omitted the units) for 44.1 kHz audio
is 3:1.  They only vaguely mention the sources of the
material.  I can choose material that gives those ratios
even for linear PCM.

Aside from that, their prediction idea is to use a very
large filter kernel (thousands of taps) and to adapt the
kernel instead of transmitting it for each frame.  A
long kernel theoretically means more accurate prediction
because of the long-term correlation, especially since
audio data is highly oversampled most of the time.  (A
toy sketch of that kind of kernel adaptation follows the
list below.)  It is apparent their method is geared
toward speech, and I think it is not so good for general
music compression, for a few reasons:

1. Computing such a large filter is computationally
expensive.  The standard autocorrelation -> Levinson-Durbin
approach will be too slow, since its cost grows with the
square of the filter order (see the sketch after this
list).  So they use RLS, which has stability problems.

2. They can use Huffman coding on the residual because
the alphabet is small (the samples have already been
quantized down to 8 bits or less).  If you are working
with 16-bit or 24-bit data, generic Huffman is not
practical because of the dictionary size.  That's why
most (all?) such codecs use Rice coding (a sketch of
which also follows this list).
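To make the adapt-instead-of-transmit idea concrete, here is a toy
adaptive predictor.  The paper adapts its kernel with RLS; I am using
plain sign-sign LMS here only because it is short.  The point is that
the encoder and decoder both update the kernel from samples they
already share, so the kernel itself is never sent.  Names, the 32-tap
size, and the step size mu are just for this sketch:

#include <math.h>

#define TAPS 32   /* illustration only; the paper wants thousands of taps */

/* Toy adaptive FIR predictor.  Encoder and decoder run the same updates
 * on samples both sides already have, so the kernel stays in sync
 * without being transmitted.  Sign-sign LMS stands in for the paper's
 * RLS adaptation. */
typedef struct {
    double w[TAPS];      /* filter kernel, adapted as samples arrive */
    double hist[TAPS];   /* past samples, hist[0] = most recent */
} predictor;

static long predict(const predictor *p)
{
    double acc = 0.0;
    for (int i = 0; i < TAPS; i++)
        acc += p->w[i] * p->hist[i];
    return lround(acc);
}

/* Encoder side: feed one sample, get the residual to transmit.
 * The decoder mirrors this: sample = residual + predict(). */
static long adapt(predictor *p, long sample, double mu)
{
    long residual = sample - predict(p);
    double e = (residual > 0) - (residual < 0);   /* sign of the error */

    for (int i = 0; i < TAPS; i++) {
        double x = (p->hist[i] > 0) - (p->hist[i] < 0);
        p->w[i] += mu * e * x;                    /* sign-sign LMS step */
    }
    for (int i = TAPS - 1; i > 0; i--)            /* shift the history */
        p->hist[i] = p->hist[i - 1];
    p->hist[0] = (double)sample;
    return residual;
}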
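On point 1, this is the textbook Levinson-Durbin recursion (my own
sketch, not FLAC's code).  The nested loops make the cost grow with
the square of the filter order, which is fine for the handful of
coefficients FLAC uses per frame but not for thousands of taps:

/* Levinson-Durbin: solve for the predictor coefficients lpc[0..order-1]
 * given the autocorrelation autoc[0..order] (assumes autoc[0] > 0).
 * Prediction convention: x[n] ~= sum_j lpc[j] * x[n-1-j]. */
static void levinson_durbin(const double *autoc, int order, double *lpc)
{
    double err = autoc[0];

    for (int i = 0; i < order; i++) {
        /* reflection coefficient for stage i */
        double k = autoc[i + 1];
        for (int j = 0; j < i; j++)
            k -= lpc[j] * autoc[i - j];
        k /= err;

        /* in-place update of the lower-order coefficients */
        lpc[i] = k;
        for (int j = 0; j < i / 2; j++) {
            double tmp = lpc[j];
            lpc[j]         -= k * lpc[i - 1 - j];
            lpc[i - 1 - j] -= k * tmp;
        }
        if (i & 1)
            lpc[i / 2] -= k * lpc[i / 2];

        err *= 1.0 - k * k;
    }
}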
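And on point 2, a minimal Rice coder for residuals -- a sketch of the
general technique, not FLAC's actual bitstream code, with illustrative
names and a caller-supplied, zeroed output buffer.  Its only parameter
is k, so there is no dictionary to build or store, no matter how wide
the sample alphabet is:

#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t *buf;        /* caller supplies a zeroed buffer */
    size_t   bitpos;     /* next bit to write, MSB-first */
} bitwriter;

static void put_bit(bitwriter *bw, int bit)
{
    if (bit)
        bw->buf[bw->bitpos >> 3] |= (uint8_t)(0x80 >> (bw->bitpos & 7));
    bw->bitpos++;
}

/* Rice-code one residual with parameter k: zig-zag fold the sign, then
 * write the quotient in unary and the low k bits in binary. */
static void rice_encode(bitwriter *bw, int32_t residual, unsigned k)
{
    /* fold 0,-1,1,-2,2,... onto 0,1,2,3,4,... */
    uint32_t u = residual >= 0 ? (uint32_t)residual << 1
                               : ((uint32_t)(-(int64_t)residual) << 1) - 1;

    for (uint32_t q = u >> k; q > 0; q--)   /* unary quotient */
        put_bit(bw, 0);
    put_bit(bw, 1);                         /* quotient terminator */

    for (int i = (int)k - 1; i >= 0; i--)   /* k remainder bits */
        put_bit(bw, (u >> i) & 1);
}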

I have done some tests with long kernels and they do not
buy very much extra compression.  Most of the slack can
be taken up with better entropy coding.

Josh





