[vorbis-dev] 2d wavelet transforms

jsr at dds.nl
Sat Dec 30 10:45:18 PST 2000



I've been reading up on wavelet transforms for the past 
week, and I plan to start on some video compression code 
next week which, if it's any good (small chance :)), could 
be used for Tarkin.

So far I think I know what's happening, however there's one 
small thing I don't quite understand yet. If I understand 
correctly, you can do a 2d wavelet transform (I'm assuming 
a Haar transform here for simplicity) by running a wavelet 
filter on all rows of the image first. That yields a new 
image which has the wavelet (detail) coefficients of the 
first run in the right half, and a lower-resolution 
(low-pass filtered) version of the image in the left half. 
You then run the same wavelet filter on the columns of the 
left half, and so on, each time on the remaining low-pass 
part.
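
To make the procedure concrete, here is a minimal sketch of it 
in Python. It is only an illustration of the scheme as described 
in the paragraph above, not a reference implementation: the 
function names are made up, the image is assumed to be a list of 
rows with power-of-two width and height, and the Haar step uses 
plain averages and half-differences (other normalisations exist).

```python
def haar_step(values):
    """One Haar pass: averages in the first half, half-differences in the second."""
    half = len(values) // 2
    lows = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    highs = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return lows + highs


def transform_as_described(image):
    """Alternate row and column passes, each on the remaining low-pass region."""
    out = [list(row) for row in image]
    width, height = len(out[0]), len(out)
    rows_next = True
    while width > 1 or height > 1:
        if rows_next and width > 1:
            # Filter the rows of the current low-pass region.
            for y in range(height):
                out[y][:width] = haar_step(out[y][:width])
            width //= 2
        elif not rows_next and height > 1:
            # Filter the columns of the current low-pass region.
            for x in range(width):
                column = haar_step([out[y][x] for y in range(height)])
                for y in range(height):
                    out[y][x] = column[y]
            height //= 2
        rows_next = not rows_next
    return out


flat_image = [[10] * 4 for _ in range(4)]
ramp_image = [[4 * y + x for x in range(4)] for y in range(4)]
flat_result = transform_as_described(flat_image)
ramp_result = transform_as_described(ramp_image)
```

With these averages, the single value left in the top-left corner 
is the mean of the whole image, and a completely flat image 
produces zero for every other coefficient.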

So far so good, but if you now view the image as a 1D 
array, then at the end of the transform the coefficients 
are somewhat jumbled. Some ASCII art:

image:     after 1st filter run:   after 2nd run:
 xxxx             xx11                 xx11
 xxxx             xx11                 xx11
 xxxx             xx11                 2211 
 xxxx             xx11                 2211

after final run:
 x311
 4311
 2211
 2211

So after the final run the 1D representation of the wavelet 
transform is x311431122112211. Is this correct, or should 
this be x433222211111111? I think it is correct, but I'd 
rather be sure (I already have a Haar wavelet 
implementation that builds a binary tree and then saves the 
coefficients by walking the tree breadth-first, which 
yields the latter format).
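
The two orderings can be reproduced with a small labelling 
sketch. It does no filtering at all; it only marks which run 
produced each position, following the ASCII art above ("x" is 
the final low-pass value). The function name is invented for 
this example, and the single-character labels only work for 
small sizes such as 4x4.

```python
def label_layout(size):
    """Mark each position of a size x size image with the run that wrote it."""
    grid = [["x"] * size for _ in range(size)]
    width = height = size
    run = 0
    rows_next = True
    while width > 1 or height > 1:
        if rows_next and width > 1:
            run += 1
            # A row pass puts its detail coefficients in the right half.
            for y in range(height):
                for x in range(width // 2, width):
                    grid[y][x] = str(run)
            width //= 2
        elif not rows_next and height > 1:
            run += 1
            # A column pass puts its detail coefficients in the bottom half.
            for y in range(height // 2, height):
                for x in range(width):
                    grid[y][x] = str(run)
            height //= 2
        rows_next = not rows_next
    return ["".join(row) for row in grid]


rows = label_layout(4)
row_major = "".join(rows)
# Coarsest first: "x", then the highest run number down to run 1.
coarse_to_fine = "".join(
    sorted(row_major, key=lambda c: (c != "x", -int(c) if c != "x" else 0))
)
```

Reading the grid row by row gives the first string, and sorting 
the same labels from coarsest to finest gives the second. Both 
hold the same 16 coefficients, so as far as the arithmetic goes 
the difference is only the order in which they are stored.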

What I hope to do is a wavelet transform of each frame. 
If I understand correctly this should yield a 
rather "spiky" "graph", due to the characteristics of the 
data. Now I want to encode the frame-to-frame difference in 
location and amplitude of these spikes as a smooth curve. 
Of course this only works well if the changes happen to lie 
on a curve, so I'll have some staring at graphs to do. 
Anyway, if the second permutation of the coefficients is 
used I think this will work better. (The first one, with a 
2D curved surface to represent change instead of a 1D 
curve, may work too though.)

So, which one is the mathematically correct one?

Oh, and another question I just remembered: audio signals 
are signed, because they're representations of the movement 
of the microphone. Video data does not have that, but the 
wavelet transform does expect signed data (or doesn't it? 
Haar works on unsigned data but what about Daubechies and 
other wavelets?). So how do I convert? Simply subtracting 
128 from each sample (i.e. casting to signed int) seems 
wrong.
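
For the Haar case at least, the effect of subtracting 128 can 
be checked directly. The sketch below is an illustration only: 
the sample values are arbitrary, the function name is made up, 
and it uses a 1D Haar transform with plain averages and 
half-differences. It also shows that reinterpreting an 8-bit 
sample as signed is a different operation from subtracting 128.

```python
def haar_full(signal):
    """Full 1D Haar transform: [overall average, coarsest detail, ..., finest details]."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        half = len(approx) // 2
        highs = [(approx[2 * i] - approx[2 * i + 1]) / 2 for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / 2 for i in range(half)]
        # Coarser details go in front of the finer ones found earlier.
        details = highs + details
    return approx + details


pixels = [12, 200, 37, 255, 0, 128, 90, 64]   # arbitrary unsigned 8-bit samples
shifted = [p - 128 for p in pixels]            # re-centred around zero

raw = haar_full(pixels)
centered = haar_full(shifted)

# Reinterpreting the byte 200 as a signed 8-bit value gives -56, not 200 - 128 = 72.
reinterpreted = int.from_bytes(bytes([200]), "big", signed=True)
```

Here the detail coefficients of `raw` and `centered` are 
identical, and only the overall average differs (by exactly 
128). The differences in a Haar step cancel any constant offset, 
so the shift ends up only in the low-pass value.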

Hope I didn't steal too much time from your efforts to 
get it working on Solaris :)

Cheers,

Lourens 
