# [vorbis-dev] 2d wavelet transforms

jsr at dds.nl jsr at dds.nl
Sat Dec 30 10:45:18 PST 2000

```
I've been reading up on wavelet transforms the past week,
and I plan to start on some video compression stuff next
week, if it's any good (small chance :)) for Tarkin.

So far I think I know what's happening, however there's one
small thing I don't quite understand yet. If I understand
correctly, you can do a 2d wavelet transform (I'm assuming
a Haar transform here for simplicity) by running a wavelet
filter on all rows of the image first. That'll yield a new
image which has the wavelet coefficients of the first run
in the right half of the pixels, and a lower res (lowpass
filtered) representation in the left half. You then run the
same wavelet filter on the columns of the left half, etc.

So far so good, but now if you view the image as a 1D
array, then at the end of the transform the coefficients
are somewhat jumbled. Some ascii art:

image:     after 1st filter run:   after 2nd run:
xxxx             xx11                 xx11
xxxx             xx11                 xx11
xxxx             xx11                 2211
xxxx             xx11                 2211

after final run:
x311
4311
2211
2211

So after the final run the 1D representation of the wavelet
transform is x311431122112211. Is this correct, or should
this be x433222211111111? I think it is correct, but I'd
rather be sure (I already have a Haar wavelet
implementation that builds a binary tree, and then saves the
coefficients walking the tree breadth-first, which yields
the latter format).

What I hope to do is do a wavelet-transform of each frame.
If I understand correctly this should yield a
rather "spiky" "graph", due to the characteristics of the
data. Now I want to encode the difference in location and
amplitude of these spikes in a smooth curve. Of course this
only works well if the changes happen to lie on a curve, so
I'll have some staring at graphs to do. Anyway, if the
second permutation of the coefficients is used I think this
will work better. (the first one with a 2D curved surface
to represent change instead of a 1D curve may work too
though)

So, which one is the mathematically correct one?

Oh, and another question I just remembered: audio signals
are signed, because they're representations of the movement
of the microphone. Video data does not have that, but the
wavelet transform does expect signed data (or doesn't it?
Haar works on unsigned data but what about Daubechies and
other wavelets?). So how do I convert? Simply subtracting
128 from each sample (i.e. casting to signed int) seems wrong.

Hope I didn't steal too much time from your efforts of
getting it to work on Solaris :)

Cheers,

Lourens

```
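The row/column scheme described in the post can be tried out with a small sketch. This is only an illustration under stated assumptions, not an answer from the list: it uses an averaging Haar step, a square image whose side is a power of two, and filters only the shrinking low-pass region on each run, exactly as the ASCII art does. The names `haar_step`, `transform_with_labels`, `flatten` and `coarsest_first` are hypothetical helpers made up for this example.

```python
def haar_step(values):
    """One Haar step: pairwise averages first, pairwise differences after."""
    half = len(values) // 2
    lows = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    highs = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return lows + highs


def transform_with_labels(image):
    """Alternate row and column runs on the shrinking low-pass region.

    Returns the coefficients plus a grid of labels recording which run
    produced each coefficient ("x" marks the final low-pass value).
    Assumes a square image whose side is a power of two.
    """
    n = len(image)
    coeffs = [[float(v) for v in row] for row in image]
    labels = [["x"] * n for _ in range(n)]
    width = height = n
    run = 1
    while width > 1 or height > 1:
        if run % 2 == 1:
            # Odd runs: filter the rows of the current low-pass region.
            for r in range(height):
                coeffs[r][:width] = haar_step(coeffs[r][:width])
                for c in range(width // 2, width):
                    labels[r][c] = str(run)
            width //= 2
        else:
            # Even runs: filter the columns of the current low-pass region.
            for c in range(width):
                column = haar_step([coeffs[r][c] for r in range(height)])
                for r in range(height):
                    coeffs[r][c] = column[r]
                for r in range(height // 2, height):
                    labels[r][c] = str(run)
            height //= 2
        run += 1
    return coeffs, labels


def flatten(grid):
    """Read a 2D grid row by row into a 1D list."""
    return [cell for row in grid for cell in row]


def coarsest_first(flat_labels):
    """Reorder labels so "x" comes first, then the latest run, and so on."""
    return sorted(flat_labels,
                  key=lambda s: float("-inf") if s == "x" else -int(s))


# Arbitrary 4x4 test image with values 0, 16, ..., 240.
image = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]
coeffs, labels = transform_with_labels(image)
print("".join(flatten(labels)))                  # x311431122112211
print("".join(coarsest_first(flatten(labels))))  # x433222211111111

# Subtract 128 from every sample and compare the coefficients.
shifted = [[v - 128 for v in row] for row in image]
shifted_coeffs, _ = transform_with_labels(shifted)
changed = [i for i, (a, b) in
           enumerate(zip(flatten(coeffs), flatten(shifted_coeffs))) if a != b]
print(changed)                                   # [0]
```

For a 4x4 image the row-by-row readout reproduces the first string from the post, and sorting by run number gives the second, so in this sketch the two orderings hold the same coefficients and differ only by a permutation. The last check shows that, for this Haar step at least, subtracting 128 changes only the final low-pass value `x` and leaves every difference coefficient untouched. Whether that carries over to other wavelets is not tested here.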