[vorbis-dev] another entropy coder that might be very useful
Yann Guidon
whygee at f-cpu.org
Mon Apr 26 18:44:03 PDT 2004
Sebastian Gesemann wrote:
>Hello, Yann !
>
>
Hallo !
warning, pointless ramblings ahead.
>Thanks for showing interest in optimizing Vorbis.
>
>
i subscribed to this list some time ago, for several reasons,
including my deep interest in everything
related to signal compression (lossless compression in particular).
i also wanted to implement a Vorbis codec for ADI DSPs
(the 2191 or 21065L), but i fell into lurkdom during
the filename extension flamew^Wthread.
>On Mon, 26 Apr 2004, Yann Guidon wrote:
>
>
>>Hello,
>>
>>I want to let you know about an algo that will soon be published
>>and that may be of interest to Vorbis (and other Ogg projects) :
>>
>>http://f-cpu.seul.org/whygee/ddj-3r/ddj-3r.tgz
>>
>>
>(Note for other users: This archive does not contain a
>subdirectory)
>
>
(yups, that happens sometimes, so i always tar tzvf an archive before
tar xzvf)
>Correct me, if I'm wrong. As I understand your algorithm,
>you suggest to sort all samples first, do some "folding" by adding
>a certain value to all samples so that they are positive, compress
>the sorted list of positive integers with your RR algorithm and
>store some side-info about the permutation so that the original
>order can be restored.
>
>
Obviously, you didn't read the article completely.
The permutation trick is absolutely not viable,
for the reasons i explain (overhead + complexity).
However (warning : side note !),
i have the gut feeling that a BWT
(that's another, more complex permutation)
would give interesting results before a 3R step.
First, the "permutation overhead" is only log2(n) bits
instead of log2(n!), so we only need 6 bits
for the "precursor pointer" (or "primary index") of a 64-sample block.
Furthermore, it has the property of grouping most high
values at the beginning, while leaving contiguous
zeros at the end, and 3R LOVES that :-)
But it's another story : i am trying to investigate
it, but i can't right now, i have to earn a living :-/
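To put rough numbers on that permutation overhead (a quick
back-of-the-envelope check i just did, not a figure from the article) :

#include <math.h>
#include <stdio.h>

/* cost of sending a full permutation of n samples (log2(n!) bits)
   versus a single BWT primary index (log2(n) bits).
   lgamma(n+1) = ln(n!), so log2(n!) = lgamma(n+1)/ln(2).
   compile with -lm. */
int main(void) {
    int n = 64;
    double perm_bits = lgamma(n + 1.0) / log(2.0); /* ~296 bits */
    double bwt_bits  = log2(n);                    /* 6 bits    */
    printf("permutation: %.0f bits, primary index: %.0f bits\n",
           perm_bits, bwt_bits);
    return 0;
}

So a full sorting permutation of a 64-sample block costs about
296 bits, while the BWT "pointer" costs 6 : that's why the BWT
looks viable where the sorting trick is not.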
>>>===<<<
However, before going further, think of 3R
as forked RRs used in a binary tree where
each node is the sum of both leaves/subnodes.
Just add all the input numbers together
in a binary tree fashion
(look at http://f-cpu.seul.org/whygee/ddj-3r/figure7_j_tree.png),
then send the nodes in the order shown in the drawing at
http://f-cpu.seul.org/whygee/ddj-3r/figure4.png
(you see : no permutation).
The RR method is applied on each path from the root
(sum of all values) to each leaf (input data).
The nice properties of addition ("if a+b=c, then
you can deduce b with c-a") make one half
of the nodes useless for transmission (but not for temporary
computation), so we send only N output values for
N input values.
The other properties of addition ensure that
you don't need to sort the (positive) numbers
and that the stop condition is unambiguous
(if XOR were used instead of +, then we couldn't
guarantee that the end of a block can be detected).
Furthermore, using an adapted tree (see the
example where the Huffman algorithm is used
to build an optimal tree, given heuristic data),
the average overhead can be minimised.
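Here is a minimal sketch of that summing stage (a toy version i
just wrote for this mail, not the article's code) : the tree is a
flat binary heap, and only the root plus one child per internal
node is sent, the sibling being recovered by subtraction :

#include <stdio.h>

#define N 8  /* number of input samples, a power of two for simplicity */

/* flat binary heap : node 1 is the root, node i has children
   2i and 2i+1, the leaves are nodes N..2N-1 */
static unsigned tree[2 * N];

static void build_tree(const unsigned *in) {
    for (int i = 0; i < N; i++)
        tree[N + i] = in[i];
    for (int i = N - 1; i >= 1; i--)
        tree[i] = tree[2 * i] + tree[2 * i + 1]; /* parent = left + right */
}

int main(void) {
    unsigned in[N] = {3, 0, 1, 7, 2, 0, 0, 5};
    build_tree(in);
    /* the root plus the left child of each internal node is enough :
       right = parent - left, so N values are sent for N inputs.
       with positive numbers, each sent child is bounded by its
       parent, which is what RR exploits on each root-to-leaf path. */
    printf("send root = %u\n", tree[1]);
    for (int i = 1; i < N; i++)
        printf("send %u (sibling = %u - %u)\n",
               tree[2 * i], tree[i], tree[2 * i]);
    return 0;
}

(the real encoder of course packs those values into the bitstream
with the reduced ranges, instead of printf'ing them.)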
Compared to common pyramidal methods,
which are usually transforms (the difference channel
must be entropy coded by a backend algorithm
such as Huffman or arithmetic codes), 3R is an encoder :
it outputs a compacted bitstream (and it does no
harm to preprocess the data with filters etc.).
Then, there are many places that can be optimised.
For example, i implemented a truly recursive function, but
it can be inlined or linearised (i would have done
it if i had more time, but the article was already
very very long and quite cryptic despite all my efforts).
>I don't see why this should be superior (in terms of compression
>efficiency and speed) to the entropy coding layer which is
>currently used. I also doubt that the RR step for coding the sorted
>list of positive integers is near optimal.
>
>
it is not.
RR is simply a "workbench", a necessary step in the explanation
of the properties of 3R. Understanding basic properties of RR can
help understand how to best understand and use 3R.
If i explained 3R directly, that would require too many
explanations at once (and i'm not an excellent teacher).
RR is handy because it allows a step-by-step explanation
and design of 3R. For example, it allows me to introduce several
tools and concepts (such as early stop conditions and
phasing-in codes) before building the full algorithm.
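(For readers who don't know them : "phasing-in" codes spend only
floor(log2(n)) bits on some values when a number is known to lie
in [0, n-1] and n is not a power of two. A stand-alone toy version,
with a bit writer that just prints the bits :

#include <stdio.h>

/* toy bit writer : prints the bits MSB-first instead of packing them */
static void emit_bits(unsigned v, unsigned nbits) {
    for (int i = (int)nbits - 1; i >= 0; i--)
        putchar('0' + ((v >> i) & 1));
    putchar(' ');
}

/* phased-in code for v in [0, n-1] : with k = floor(log2(n)) and
   r = 2^(k+1) - n, the first r values take k bits, the rest k+1 */
static void put_phased_in(unsigned v, unsigned n) {
    unsigned k = 0;
    while ((2u << k) <= n)       /* k = floor(log2(n)), n >= 1 */
        k++;
    unsigned r = (2u << k) - n;  /* number of short (k-bit) codes */
    if (v < r)
        emit_bits(v, k);         /* short code */
    else
        emit_bits(v + r, k + 1); /* long code, still prefix-free */
}

int main(void) {
    for (unsigned v = 0; v < 5; v++)
        put_phased_in(v, 5);     /* prints : 00 01 10 110 111 */
    putchar('\n');
    return 0;
}

this is the textbook flavour of the code ; the article shows how
it fits inside RR.)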
Concerning the efficiency, i can't claim anything,
and i don't believe that 3R is the ultimate tool.
Each tool has a specific domain of application
and 3R does not replace, but complements,
other algorithms where they don't work nicely.
For example, if the input data is /dev/random,
3R will add an overhead of 1 bit per value.
Huffman would probably not be much better :
it would try to optimise code lengths where
it is not useful, because we know that the
entropy is 8 bits/value. Rice codes would
add log2(8)=3 bits per value. [All numbers
are means/averages.]
On Gaussian- or Laplacian-distributed random values (for example),
Huffman takes the lead, followed by Gamma/Elias/Rice...
unless there are very few values, where the overhead
of transmitting the Huffman table is too high.
Then, there's this curious "class of dataset types"
with "some kind of correllation", or at least
that's what our central neural network between our ears
tell us. The entropy itself might be quite high
and few can be removed (it's not cyclic, it
can't be removed by FFT/DCT/DWT...)
but /there is something/.
3R was "found" when i tried to reduce
the size occupied by histograms, such as those needed
for Huffman encoding of signals.
I needed to store/send the number of occurrences
of each data size, and all the obvious methods
were unsatisfactory : a simple array is out of the question
(too many wasted bits), Huffman is too complex
and inefficient for small lists (usually 16 to 32 values),
Rice encoding is inefficient because of some
sort of continuity, and delta-encoding just didn't
solve the problem.
I turned the problem and the numbers around
and around in my head for a long while until what
i call today 3R appeared slowly. Then, nothing.
RR came only recently, when i decided to write the article.
By extension, 3R represents an interesting
complexity / efficiency compromise when
one has to encode small datasets with large variance,
like the FFT power spectra of distinct sounds, for example.
Or when small coefficient tables must be embedded
in files : it can then beat Rice encoding, and some data
(JPEG quantisation tables, for example) are too
small for Huffman to be viable.
>Perhaps I missed something.
>
yup, but read the whole text and the small code examples
(and particularly http://f-cpu.seul.org/whygee/ddj-3r/3r_simple.c ).
>It would be nice of you if you can enlighten us.
>
i'll try to.
i simply hope that my emails don't take too much time
to read (this one took a lot of time to write, it reminds
me of the early F-CPU days :-D).
However, in emails, i have much more room to explain
and make things explicit, because there are tight space
limitations in the article. I won't spend time re-re-re-re-re-re-reading
the text to remove one word here, a sentence there,
in order to fit a specific format.
>Did you successfully use this scheme as a replacement
>for JPEG's entropy coding layer ? Or was it still theory ?
>
>
i don't have a JPEG development framework : too many crappy
OSes on my computers and no time to play with them.
On top of that, writing a text like
http://f-cpu.seul.org/whygee/ddj-3r/ddj-3R.html
(plus making all the drawings and developing the source code)
already represents months of work without salary. I'm now trying
to clear all the issues related to 3R before using it and doing
other "time-costly" development work.
In particular, i want to avoid any patent issue so i can go on.
The first application i have seen was in a (s)low-CPU video codec,
and that's not a safe place to develop (too many patents and licences).
That is what motivated and justified writing the article :
if there was a way to avoid paying the MPEG tax, that would
shake some grounds, even if my guesstimate is a 20% efficiency
handicap (OTOH, MPEG has been developed and optimised
for years by the most advanced companies in the field).
But i will paraphrase what i say about F-CPU (warning, RMS beard sighted) :
"even if it is not as efficient, it is Free, and that reason alone
is good enough to use it."
Plus, i love the simplicity of the 3R method.
And i won't close the xx % handicap alone,
that's why i turn to specialists^Wthis list.
>You might want to compare it to
>http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-res.html
>It's a detailed description of how the residue vectors are coded.
>
too bad i never had time to read it all
(so we're even, he heh :-D)
>After the "floor-curve-subtraction" and quantization step, the
>samples are mostly uncorrelated signed integers (*) with varying
>variance (higher variance in the low-freq part with some amplitude
>peaks (tonals), and lower variance in higher frequency regions due
>to higher quantization).
>
>
looks interesting,
but it will take time before i come to a proposal.
Even without a close look, i think that there is still room for
enhancement there. It would probably break many things in Vorbis,
but it can be worth a try, or even a thought. Here, i will
investigate a system that is similar to the lossless compressor
i once envisioned, but with quantisation on top of that.
First, i think that a BWT might help, but i have to investigate
that independently : BWT reorders data and might help 3R.
There are BWTs that are O(N), and the typical block size
is ok (it will fit in L2 cache).
Then, given the knowledge of the quantization
parameters + the floor curve + the previous block's values,
a dynamic tree generation algorithm can also be used,
but it is mutually exclusive with the use of BWT (because BWT
destroys the bins' order and makes local heuristics impossible).
The trick : a new tree is generated block by block,
as shown in http://f-cpu.seul.org/whygee/ddj-3r/figure7.html
but here, it's a 1D case (the basic Huffman algorithm).
The heuristic data comprise the previous block's values,
the local floors and the quantization parameters.
Given these numbers, a block-specific tree is created
which is likely to be fitted to the current block.
If there is a continuous tone (for example, a voice),
the tree will remain mostly the same from block to block,
giving a decent "prediction" of the relative amplitude
of each DCT bin. It would be like computing the
Huffman table from the preceding block's data
and applying it to the current block, but without
the cost of the prefix codes at the output.
[of course, once in a while, a "non-relative" block
must be issued so that corrupted/missing blocks
don't break the whole file. And sometimes,
the predicted tree might not give a good result,
so a default tree is used instead. So each block
needs a "relative" flag so the decoder knows
what tree to use.]
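To show the skeleton of that per-block tree construction (with
made-up weights : the real heuristics would also blend in the
floor and the quantisation parameters), here is the basic Huffman
construction driven by the previous block's amplitudes :

#include <stdio.h>

#define NBINS 8

/* leaves are bins 0..NBINS-1, internal nodes are appended after them */
struct node { unsigned weight; int left, right; };

/* build a Huffman-shaped tree over per-bin weights,
   return the index of the root */
static int build_tree(const unsigned *weights, struct node *nodes) {
    int used[2 * NBINS] = {0};
    int count = NBINS;
    for (int i = 0; i < NBINS; i++)
        nodes[i] = (struct node){ weights[i], -1, -1 };
    for (int step = 0; step < NBINS - 1; step++) {
        int a = -1, b = -1;
        for (int i = 0; i < count; i++) { /* find the two lightest free nodes */
            if (used[i]) continue;
            if (a < 0 || nodes[i].weight < nodes[a].weight) { b = a; a = i; }
            else if (b < 0 || nodes[i].weight < nodes[b].weight) b = i;
        }
        used[a] = used[b] = 1;            /* merge them under a new node */
        nodes[count] = (struct node){ nodes[a].weight + nodes[b].weight, a, b };
        count++;
    }
    return count - 1;
}

int main(void) {
    /* hypothetical heuristic : the previous block's per-bin amplitudes */
    unsigned prev_block[NBINS] = {40, 35, 12, 9, 3, 2, 1, 1};
    struct node nodes[2 * NBINS];
    int root = build_tree(prev_block, nodes);
    printf("root weight = %u\n", nodes[root].weight);
    return 0;
}

Since the decoder already has the previous block, it rebuilds
exactly the same tree on its side : the shape is shared for free,
no prefix code or table has to be transmitted.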
Overall, it looks like the JPEG quantisation trick +
the dynamic tree generation of the lossless sound
compression method. crazy.
But from the little i know, Vorbis uses a more complex
mechanism, so what i describe above might not help directly.
>(* quantization and coding can be done simultaneously by VQ, but
>currently it's like scalar quantization to integers and coding them
>in groups with a specific set of codebooks)
>
>
it's interesting to see how available algorithms and tools
shape the data formats and processing.
when i discovered the Burrows-Wheeler Transform,
i saw that a completely new dataflow was created,
along with new methods and new maths.
It's only last July that i remembered this curious
encoder i had developed for a specific purpose.
I thought that writing an article about it would be
relatively easy and profitable (depending on the publisher)
but while writing it, i discovered that i was inventing
something new and simple. The article grew, and
i spent much more time on the subject ; the article
is still not published, for reasons beyond my control.
Maybe 3R will simply be used in a particular area of
Vorbis (say, for transmitting side-channel parameters
or something like that), or trigger a deeper rework of
the compression method. Here, unlike for video codecs
(where proprietary formats rule), the key point
is efficiency, so the goal now is to find places where
3R can help spare bits here and there.
Usually, i don't feel eager to code if a 10x increase in code
size/complexity brings only a 1% size reduction.
Maybe both the BWT and dynamic tree methods can be examined
in parallel, but i have heard/read somewhere that the
next major Vorbis version will use wavelets :
is it worth it to enhance the current Vorbis ?
>>I have no time to look closely at the Vorbis source code,
>>
>>
>You don't have to. I would suggest to inform yourself first about
>how the residue coding is done in Vorbis by checking the link I
>gave above.
>
>
i promised myself to do this long ago :-/
This was not made easier by the fact that Vorbis uses some
terminology and methods that are not found
in the mainstream literature ; there are really many
things to learn, and it diverts me from my goal of
implementing or enhancing it.
All i remember is that Vorbis encoding starts
almost like MP3, using variable 2^N-sample
windows, then uses a specific lapped DCT instead of an FFT.
Then ... huh ...
>Greets,
>Sebastian
>
>
YG
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.