The integer wavelet codec, and Re: [vorbis-dev] A different take at video encoding - I'm stuck though
hwaechtler at users.sourceforge.net
Mon Feb 12 08:58:18 PST 2001
On Sun, 11 Feb 2001, Kenneth Arnold wrote:
> Obviously having the as fast as possible is a good thing, but to what
> extent do you sacrifice compression for speed?
I think the following compromise would be OK: compress better than MPEG,
WMA, DivX, OpenDivX & Co. while still providing realtime encoding and
decoding. Wouldn't it?
> In my opinion the
> codec, or at least bitstream format, should be flexible enough to
> handle everything from storing raw RGB / YUV data, through fast but
> not as efficient lossless, slow but small lossless, MPEG-style DCT and
> motion compensation (where possible while avoiding the patent issues),
> to your single-pass wavelet, then adding textures, to various levels
> of Lourens's polygon-based coder. Getting a bitstream format to
> accommodate all of those, possibly simultaneously, has been what I've
> been keeping myself busy with.
The bitstream format is Ogg, which can handle all those cases. The other
things are the compression codecs; here you can plug in OpenDivX (the
MPEG-style DCT coder), a polygon-based coder, a vector animation format,
whatever you want ...
> And for Holger, and anyone else who knows these things: thanks to some
> of the good tutorials and references several people on this list have
> referred to, I now have a decent idea as to how wavelets work. My
> question now is how to apply them. I've been looking (studying?
> pondering?) your code, and though I think I understand what you are
> doing, I don't get how it all fits together. I have specific
> questions, but in a general sense (and this could answer all of them
> at once): How do you actually apply wavelets to real data in more than
> one dimension? Perhaps this is elementary for the author, but none of
> the references I have read cover that. Once I understand everything
> involved in the coder, perhaps I can write up something to describe it
> to the less math-inclined?
Hmm. Let me try to describe it for the two-dimensional case:
You could perform a line decomposition: apply the transform for all scales
in the x-direction for every row. The left column then contains the DC
coefficients of each row. Then you apply the transform in the y-direction,
which leaves a single DC coefficient in the top-left pixel.
Alternatively, you could perform a full decomposition by applying the
transform first in the x-direction for all rows, then in the y-direction
for all columns. This makes the coefficients much smaller, so you can
compress them more efficiently than those of the line decomposition.
Our approach (the pyramid decomposition; often called the nonstandard
decomposition in the literature) lies between these two extremes: first
apply the transform (for one scale only) in the x-direction, then (again
for one scale only) in the y-direction. Now you have split the image into
4 quarters:
| LL | HL |
| LH | HH |
where LL means lowpass filtered in both directions and HH highpass
filtered in both directions. Then you repeat the same step on the LL
quarter again and again until the lowpass-filtered quarter is a single
pixel. This is computationally cheaper than the full decomposition while
still producing quite small coefficients.
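The recursion above can be sketched with an integer Haar filter -- a
hypothetical stand-in for illustration, not the codec's actual lifting
filters:

```python
def transform_rows(img):
    # One scale of an integer Haar transform on every row:
    # left half = pairwise averages (lowpass), right half = differences (highpass).
    out = []
    for row in img:
        n = len(row) // 2
        lo = [(row[2 * i] + row[2 * i + 1]) // 2 for i in range(n)]
        hi = [row[2 * i] - row[2 * i + 1] for i in range(n)]
        out.append(lo + hi)
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def pyramid(img, scales):
    # One scale in x, then one scale in y, then recurse on the LL quarter,
    # until (for a 2**scales-sized image) the lowpass quarter is one pixel.
    size = len(img)
    for _ in range(scales):
        sub = [row[:size] for row in img[:size]]
        sub = transform_rows(sub)                        # x-direction
        sub = transpose(transform_rows(transpose(sub)))  # y-direction
        for y in range(size):
            img[y][:size] = sub[y]
        size //= 2
    return img
```

For a flat 4x4 image, every highpass coefficient comes out zero and only
the top-left DC value survives -- which is exactly what makes the
coefficients cheap to encode.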
> Also, as an implementation question, what are you doing with "scales"?
> It seems to be a rough upper bound on the maximum log base 2 of the
> width, height, and number of frames... am I missing something or is
> this part of multidimensional wavelets? And though I think I get what
> you're doing in __fwd_xform__ (just not the algorithm used to do it),
> I don't get what it's returning -- the min and max of what? and why?
> I haven't actually implemented a wavelet myself, though, so this could
> be pretty obvious.
The minmax value is a bitmask that records the position of the most
significant bit of the largest-magnitude coefficient in a given scale. It
is later used to skip empty bitplanes in the coefficient encoder.
The decoder needs this value, too.
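A minimal sketch of the idea (function names are mine, not the codec's):
record the most significant bit of the largest coefficient, then emit
bitplanes only from that position downwards, skipping the all-zero upper
planes.

```python
def max_bitplane(coeffs):
    # Position of the most significant bit of the largest |coefficient|;
    # -1 means every coefficient is zero and no bitplanes need be sent.
    m = max((abs(c) for c in coeffs), default=0)
    return m.bit_length() - 1

def magnitude_bitplanes(coeffs):
    # Emit magnitude bitplanes top-down, starting at the first
    # non-empty one, so empty upper planes cost nothing.
    top = max_bitplane(coeffs)
    planes = [[(abs(c) >> b) & 1 for c in coeffs] for b in range(top, -1, -1)]
    return top, planes
```

The decoder is handed the same `top` value, so it knows how many planes to
expect before reading sign and refinement information.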
> I'm currently running the compiled code on a 400-frame sequence of
> 352x288 frames. Aside from the fact that it ate almost all of my 256
> MB of memory in one fell swoop (and is still partially in swap but
> fortunately not thrashing), it's still working on it. I had no idea
> what to set the Y U and V bit cutoffs to, so I just picked some
> arbitrary numbers to try it out; any suggestions?
Have you set N_FRAMES to a number between 1 and 16? This is the number of
frames in a block, which is compressed at once. The bitstream limits are
something you can play with -- numbers between 1000 bytes/frame and 5000
bytes/frame for the Y bitstream make sense. The U and V bitstreams can be
subsampled by a factor between 4 and 20.
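As a sanity check, the suggested ranges can be captured in a small
settings sketch (the names besides N_FRAMES are illustrative, not the
codec's actual symbols):

```python
# Illustrative settings following the ranges suggested above;
# only N_FRAMES is a real codec constant, the other names are mine.
settings = {
    "N_FRAMES": 8,              # frames per block, 1..16
    "y_bytes_per_frame": 3000,  # Y bitstream budget, ~1000..5000
    "uv_subsample": 8,          # U/V subsampling factor, 4..20
}

def validate(s):
    # Reject values outside the ranges discussed in the thread.
    assert 1 <= s["N_FRAMES"] <= 16
    assert 1000 <= s["y_bytes_per_frame"] <= 5000
    assert 4 <= s["uv_subsample"] <= 20
    return True
```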
> I do notice one probably troubling thing with the implementation here:
> you're coding Y U and V independently (unless I'm missing something
> big). The problem is that Y U and V are not independent; in my test
> images, Y U and V have at least the same or very similar edges; often
> the values are highly correlated also (this should be obvious to
> anyone working with imaging). What to do about this? Could you expand
> the wavelet to another dimension, i.e. predict the U and V data of a
> frame from its Y data?
Good point. I hadn't thought about this yet ...
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.