[vorbis-dev] another entropy coder that might be very useful
Yann Guidon
whygee at f-cpu.org
Wed Apr 28 15:23:59 PDT 2004
Sebastian Gesemann wrote:
>Hi Yann,
>
>On Tue, 27 Apr 2004, Yann Guidon wrote:
>
>
>>Hallo !
>>
>>
>:)
>
>
likewise :-)
>>Obviously, you didn't read the article completely.
>>
>>
>You got me!
>
>
as written before, we're even :-/
>>However (warning : side note !),
>>i have the gut feeling that a BWT
>>(that's another more complex permutation)
>>will give interesting results before a 3R step.
>>First, the "permutation overhead" is only log2(n)
>>
>>
>Actually, the BWT is a well-defined bijective transform.
>No side information is needed to restore the original order.
>
>
the "overhead", or "sided information" is the primary index,
the index to the first element, so it's log2(N).
sorry for being obscure, the "i understand myself well"
always strikes me back. I think too fast sometimes and
can't write everything :-)
>>instead of O(log2(n!)) so we only need 6 bits
>>for the "precursor pointer" (or "primary index") of a 64-samples block.
>>Furthermore, it has the property of grouping most high
>>values at the beginning, while leaving contiguous
>>zeros at the end, and 3R LOVES that :-)
>>
>>
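(To make the side-information point concrete, here is a brute-force sketch,
my own toy code rather than anything from the article: a forward BWT over a
64-sample block, where the only side information the decoder needs besides
the output column is the 6-bit primary index.)

/* Naive forward BWT on a small block: the only "side information" is the
 * primary index, i.e. log2(N) bits = 6 bits for N = 64.
 * Brute force O(N^2 log N) sort, fine for a sketch. */
#include <stdio.h>
#include <stdlib.h>

#define N 64

static const unsigned char *g_block;    /* block being transformed */

/* compare two cyclic rotations of g_block */
static int cmp_rot(const void *a, const void *b)
{
    int i = *(const int *)a, j = *(const int *)b;
    for (int k = 0; k < N; k++) {
        unsigned char ca = g_block[(i + k) % N];
        unsigned char cb = g_block[(j + k) % N];
        if (ca != cb)
            return (int)ca - (int)cb;
    }
    return 0;
}

/* writes the last column into out[], returns the primary index */
static int bwt_forward(const unsigned char *in, unsigned char *out)
{
    int rot[N], primary = -1;

    for (int i = 0; i < N; i++)
        rot[i] = i;
    g_block = in;
    qsort(rot, N, sizeof rot[0], cmp_rot);

    for (int i = 0; i < N; i++) {
        out[i] = in[(rot[i] + N - 1) % N];  /* last symbol of rotation i */
        if (rot[i] == 0)
            primary = i;                    /* row holding the original block */
    }
    return primary;                         /* 0..63, so 6 bits of side info */
}

int main(void)
{
    unsigned char in[N], out[N];

    for (int i = 0; i < N; i++)
        in[i] = (unsigned char)(rand() & 0xff);
    printf("primary index = %d (6 bits for a 64-sample block)\n",
           bwt_forward(in, out));
    return 0;
}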
>Are we discussing a "3R general purpose compressor" or does
>it still have something to do with Vorbis ?
>
>
Here it was a side note on using 3R as a replacement for
MTF+RLE+Huffman/arithmetic coding after a BWT, in the context of
compacting spectra (FFT output, for example).
Sorry for this digression.
>IMHO the BWT is only a good idea if there's strong correlation
>between successive (discrete) symbols in the data like letters in
>english text for example.
>
My "challenge" here is that the "only" is probably too conservative.
I would like to verify (if not proove) that BWT can also have
some benefit on some kinds of signals. If you know papers
that investigate that, please give me a pointer. I have only found
papers where BWT is applied on alphabets/symbol streams,
they proove without a doubt that BWT can extract
intersymbol correllations.
However, some studies also deal with alphabet reordering,
showing that the alphabet mapping can
be modified to boost the compression ratio; that is,
ASCII is not optimal when using BWT on text.
This means that the symbols' values also have an impact
on coding efficiency. Remember that the BWT
is based on "sorting values": even if the output is a
different representation, the ordering is done with a sort
operation. Audio signals are continuous by nature
and therefore sensitive to this operation.
Finally, some of my experimental tests using bzip2 on
preprocessed signals (byte streams generated from
raw CD tracks) show that it can achieve very good
performance, which motivates my investigations.
I think this applies to the time domain as well as the frequency domain.
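(To give an idea of what I mean by "preprocessed": the following is only a
sketch of one possible preprocessing, not the exact code I ran. It splits
the 16-bit samples into separate byte planes so that bzip2's BWT sees two
streams with more regular byte statistics.)

/* Hypothetical example of such a preprocessing (not necessarily the one
 * I used): split a raw 16-bit little-endian PCM stream into two byte
 * planes (all low bytes, then all high bytes).
 * Usage:  ./split < track.raw | bzip2 > track.planes.bz2 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned char *lo = NULL, *hi = NULL;
    size_t n = 0, cap = 0;
    int c0, c1;

    /* read 16-bit LE samples, collect the two byte planes */
    while ((c0 = getchar()) != EOF && (c1 = getchar()) != EOF) {
        if (n == cap) {
            cap = cap ? cap * 2 : 65536;
            lo = realloc(lo, cap);
            hi = realloc(hi, cap);
            if (!lo || !hi)
                return 1;
        }
        lo[n] = (unsigned char)c0;   /* low byte  */
        hi[n] = (unsigned char)c1;   /* high byte */
        n++;
    }

    /* emit the planes one after the other */
    fwrite(lo, 1, n, stdout);
    fwrite(hi, 1, n, stdout);
    free(lo);
    free(hi);
    return 0;
}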
In the other (example) case of encoding the spectrum of a square wave,
where the spectrum shows equally spaced peaks,
the BWT should be able to catch these repetitions,
even if it is mixed with other types of signals/spectra.
Of course, this is still a working hypothesis,
and the overall coding gain remains to be proven
advantageous (which means: a lot of work).
(end of this BWT side note)
>Because of the MDCT coefficients' nature
>(they're mostly *uncorrelated*) it does not make much sense to use
>a BWT.
>
>
(M)DCT is several dB better than FFT at decorrelating
signals, I know. But then, you say yourself "mostly uncorrelated",
and compression is about finding correlation.
I've "tinkered" with several transforms, including DCT and FFT
(for example, look at http://f-cpu.seul.org/whygee/dct_fc0/dct_fc0.html ),
and due to my interest in lossless compression, I appreciate the BinDCT
family of transforms (and the principle can be extended to a "binFFT"
easily). Ouch, sorry for this digression.
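(Side illustration, my own toy sketch rather than actual BinDCT or Vorbis
code: what makes these lifting-based integer transforms lossless is that
each rotation is factored into three shear steps, and each shear stays
exactly invertible even with rounded integer arithmetic.)

/* Toy lifting rotation: forward and inverse are exact even with integer
 * rounding, which is the trick behind BinDCT-like lossless transforms.
 * Compile with -lm. */
#include <stdio.h>
#include <math.h>

/* rotate (x, y) by angle 'a' using three lifting (shear) steps */
static void lift_fwd(int *x, int *y, double a)
{
    double t = tan(a / 2.0), s = sin(a);
    *x -= (int)lround(t * *y);
    *y += (int)lround(s * *x);
    *x -= (int)lround(t * *y);
}

/* exact inverse: undo the same three steps in reverse order */
static void lift_inv(int *x, int *y, double a)
{
    double t = tan(a / 2.0), s = sin(a);
    *x += (int)lround(t * *y);
    *y -= (int)lround(s * *x);
    *x += (int)lround(t * *y);
}

int main(void)
{
    const double a = 0.3927;   /* roughly pi/8, a typical butterfly angle */
    int x = 12345, y = -6789;

    lift_fwd(&x, &y, a);
    printf("rotated : %d %d\n", x, y);
    lift_inv(&x, &y, a);
    printf("restored: %d %d\n", x, y);   /* exactly 12345 -6789 again */
    return 0;
}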
Now, in lossy compression, if you quantize and apply perceptual data
reduction, don't say that it is uncorrelated, because that adds some
known information on top of the "mostly *uncorrelated*" information :-)
I guess that the remaining steps of Vorbis are very smart and well
designed, though unfortunately not easy to explain, so I'll stop here
for fear of saying things that don't apply to our context.
>I've the impression that you're tinkering around general-purpose
>compression algos thinking they could be useful for Vorbis.
>
Why not? :-)
I'm not a Vorbis guru, which is why I'm trying to discuss this subject.
>IMHO it's important to first understand the nature of data you
>want to store compactly.
>
>
Of course.
Unfortunately, Vorbis is far from easy to understand,
and there's a huge (mind) gap between lossy and lossless
algorithms. And I think that 3R is useless for FLAC
(and I don't like LPC either).
>>[...]
>>Compared to common pyramidal methods,
>>which are usually transforms (the difference channel
>>must be entropy coded by a backend algorithm
>>such as Huffman or arithcodes), 3R is an encoder,
>>it outputs a compacted bitstream (but it does not
>>harm to preprocess the data with filters etc.).
>>[...]
>>
>>
>I'm not following you.
>
>
This means that I have seen several "compression algorithms"
using recursive splitting of the data set, several as "toy algorithms"
for image compression for example (so not in mainstream literature).
The difference here is that 3R outputs compacted bitstreams directly;
it does not require Huffman or arithmetic coding as a back end.
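(To illustrate what I mean by recursive splitting, here is a toy bit-cost
model I made up for this mail. It is NOT the actual 3R code, which you can
read in 3r_simple.c; it only shows the flavour of the idea: send the sum of
the whole block once, then recursively send the sum of the left half of each
sub-block using just enough bits for a value bounded by the parent's sum,
the right half's sum being implied. An all-zero sub-block costs nothing,
which is why trailing zeros are so cheap.)

/* Toy bit-cost model of a recursive-splitting coder (illustration only). */
#include <stdio.h>
#include <stdlib.h>

/* bits needed to code a value in the range 0..max */
static unsigned bits_for(unsigned long max)
{
    unsigned b = 0;
    while (max) { b++; max >>= 1; }
    return b;
}

static unsigned long block_sum(const unsigned *v, size_t n)
{
    unsigned long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* bits spent inside a block whose sum the decoder already knows */
static unsigned long cost(const unsigned *v, size_t n, unsigned long sum)
{
    if (n == 1 || sum == 0)          /* single value, or all-zero block: free */
        return 0;
    size_t h = n / 2;
    unsigned long left = block_sum(v, h);
    /* send 'left' in bits_for(sum) bits; the right sum is sum - left */
    return bits_for(sum) + cost(v, h, left) + cost(v + h, n - h, sum - left);
}

int main(void)
{
    enum { N = 64 };
    unsigned v[N];
    unsigned long bits;

    for (int i = 0; i < N; i++)                /* incompressible 8-bit data */
        v[i] = (unsigned)(rand() & 0xff);
    bits = 16 + cost(v, N, block_sum(v, N));   /* 16 bits budgeted for the root sum */
    printf("random data : %d raw bits -> %lu coded bits\n", N * 8, bits);

    for (int i = N / 2; i < N; i++)            /* now zero out the tail */
        v[i] = 0;
    bits = 16 + cost(v, N, block_sum(v, N));
    printf("zero tail   : %d raw bits -> %lu coded bits\n", N * 8, bits);
    return 0;
}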
>>[...]
>>For example, if the input data is /dev/random,
>>3R will add an overhead of 1 bit per data.
>>Huffman would probably not be much better,
>>it would try to optimise code lengths where
>>it is not useful because we know that the
>>entropy is 8b/value.
>>Rice codes would
>>add log2(8)=3 bits per value. [All numbers
>>are means/averages]
>>
>>
>How can you expect a decent performance of rice
>codes if you try to compress uniformly distributed
>random bytes ? What's the point ?
>
>
Showing differences in behaviour and in the efficiency-vs-complexity ratio.
Also, 3R doesn't go crazy when used on "non-ideal data"
(or worst-case scenarios).
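(For reference, and so that we agree on terms: by "Rice codes" I mean the
usual Golomb-Rice scheme, roughly like this sketch with a fixed parameter k.
Even perfectly random data costs at least k+1 bits per value with it.)

/* Minimal Golomb-Rice encoder sketch (for reference only): value v is
 * split into a quotient q = v >> k, sent in unary (q ones then a zero),
 * followed by the k low-order bits of v.  Every value therefore costs
 * at least k + 1 bits, even when the data is incompressible. */
#include <stdio.h>

static void put_bit(int b)               /* stand-in for a real bit writer */
{
    putchar(b ? '1' : '0');
}

static void rice_encode(unsigned v, unsigned k)
{
    unsigned q = v >> k;
    while (q--)                          /* unary part */
        put_bit(1);
    put_bit(0);
    for (int i = (int)k - 1; i >= 0; i--)    /* k remainder bits, MSB first */
        put_bit((v >> i) & 1);
}

int main(void)
{
    unsigned samples[] = { 0, 3, 7, 42, 255 };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        rice_encode(samples[i], 4);
        putchar(' ');
    }
    putchar('\n');
    return 0;
}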
>[snipped a lot]
>
>
thanks ;-)
>>By extension, 3R represents an interesting
>>complexity / efficiency compromise when
>>one has to encode small datasets with large variance,
>>like FFT power spectra of distinct sounds for example.
>>Or when small coefficient tables must be embedded
>>in files, it can then beat Rice encoding and some data
>>(could be JPEG quantisation tables) are too
>>small for huffman to be viable.
>>
>>
>Rice coding is just one tool that can be used in
>conjunction with others to build a coding system that
>suits your type of data. I've absolutely no idea
>how you're able to compare 3R to "Rice coding".
>
>
>Rice coding is a simple method that matches a certain
>distribution model without exploiting inter symbol
>correlations.
>
3R is not much more complicated than Rice and doesn't need explicit
statistical modeling (among a few other things). So it can be worth
using instead of Rice/Gamma/Elias codes (or others) "in certain
situations" (which remain to be found).
>>>Perhaps I missed something.
>>>
>>>
>>yup, but read the whole text and the small code examples
>>(and particularly http://f-cpu.seul.org/whygee/ddj-3r/3r_simple.c ).
>>
>>
>Hmm... Sorry... To be honest: I'm currently not convinced that
>it is of use for any Xiph CoDec. Among other things the Vorbis I
>format is already frozen.
>
>
Who knows whether it can't help anyway?
I know that a wavelet-based Vorbis was in the air, for example,
so there may still be places in the Xiph world for 3R.
>>"even if it is not as efficient, it is Free and this reason is a good
>>enough to use it."
>>
>>
>Uhhh...
>
>
don't tell me you love proprietary architectures ;-)
>>Plus, i love the simplicity of the 3R method.
>>
>>
>Is it ? :-P
>;)
>
>
If you can simplify it, show me how :-)
On top of that, it can be enhanced in many ways,
often using known methods.
>>>[description of Vorbis' residue vectors' sample distribution]
>>>
>>>
>>First, i think that a BWT might help, but i have to investigate
>>that independently.
>>
>>
>I don't want to sound rude, but you're obviously not that
>experienced in the field of "source coding".
>(see note on BWT above)
>
>
I've never said I'm a compression specialist.
I've worked in many areas of data processing, from transistors
to high-level applications (including some DSP, scientific
computing and other fun electronics stuff), so my knowledge
is more "general culture" than "mastery". But, as in
any domain, there's still room for improvement,
so I don't hesitate when I know I can (try to) develop
a new method/algorithm/approach. Then I can only know
whether it's valid by testing it.
>[huge portions snipped]
>
>Wow...
>I don't wanna know how much time you spent on writing all
>this. I appreciate your efforts... but IMHO... you should have
>focused your energy on different things like understanding
>Vorbis first and how the entropy coding layer currently works.
>
>
Did you read my article entirely?
I guess that the whole Vorbis documentation (and we know
that Vorbis is not perfectly documented) is at least 10x larger than
the article I wrote. Vorbis is much more sophisticated than 3R,
so please excuse me if I don't have the time to understand
all of Vorbis' subtleties.
>I once had a discussion with a guy who wanted to implement
>a video codec with superior efficiency compared to DivX and
>stuff... First, he said that he has some fresh new ideas with
>which he could revolutionize the video compression world. Then
>he came up with transforming RGB data to HSL and
>one-dimensional Fourier transforms. He even didn't know WHY
>he used the HSL colorspace and a Fourier transform. (At least
>he could not answer that properly.)
>
Maybe after he spent time (years?) on it, he realised that there were
better ways, but they involve highly complex concepts. Standards like
MPEG and JPEG are the products of years of study by thousands
of skilled engineers, so it's not likely that I will come up alone
with a magic formula that revolutionizes everything.
But then, who can claim to have universal knowledge?
We learn every day.
>In another reply he said that his scheme still needs some
>optimization because the compressed version is 4 times larger
>than the original.
>
>Funny story, isn't it ?
>
>
This happens a lot, as can be seen on comp.compression.
The first time (long ago) I typed in a compression program
(a Huffman source I found in a Turbo Pascal book),
the output was larger than the input, probably due
to a typo.
Please note that I haven't made wildly unreasonable claims:
I have examined 3R's computational complexity
and bounded its behaviour (best and worst cases).
Now, the real issue is to see where it can be used,
besides the applications I have in mind.
>(I'm hoping that you don't count this story as an insult to your
>intellect or take it personally. It's just that the field of
>lossy audio/video coding is very far from being trivial enough
>to come up with a superior scheme within a few hours.)
>
>
That's what I wrote a few paragraphs above.
You see, I'm not Jules Gilbert, and if you take
a few tens of minutes to read the article, then read
and compile the code, you'll see predictable results.
And, according to several specialists I have contacted,
no algorithm resembling 3R is in use today.
Now, I have to remark that the tutorial URL you
gave me didn't explain anything useful or answer
my questions; it simply doesn't deal with what I expected.
I would need a more concise, more abstract flowchart
showing where the data go and what they do... It should fit in a few pages.
Thanks again for your answers, may there be more :-)
>Greets,
>Sebastian
>
>--
>PGP-Key-ID (long): 572B1778A4CA0707
>
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.