[vorbis] glossary?

Segher Boessenkool segher at wanadoo.nl
Thu Nov 30 15:13:37 PST 2000



> mdct - modified discrete cosine transform. Converts data from
> amplitude-vs-time to intensity-vs-frequency over a limited window. Closely
> related to the fourier transform.

If you want to find quality information, you had better search for DCT-IV;
that's what the underlying transform is normally called (outside of the
audio coding field). A good source of information is ResearchIndex.
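To make the MDCT/DCT-IV connection concrete: the MDCT of 2N inputs split into quarters (a, b, c, d) equals a DCT-IV of the N "folded" values (-c reversed - d, a - b reversed). Below is a minimal pure-Python sketch, assuming the textbook unwindowed definitions of both transforms and N even; the function names are hypothetical, the loops are the slow O(N^2) direct sums, and this is not taken from the Vorbis sources.

```python
import math

def dct_iv(x):
    """Direct O(N^2) DCT-IV: X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * (k + 1/2))."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * (k + 0.5))
                for n in range(n_len))
            for k in range(n_len)]

def mdct(x):
    """Direct O(N^2) MDCT (no window): 2N inputs -> N outputs,
    X[k] = sum_n x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))."""
    n_len = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / n_len * (n + 0.5 + n_len / 2) * (k + 0.5))
                for n in range(2 * n_len))
            for k in range(n_len)]

def mdct_via_dct_iv(x):
    """Same result: fold the 2N inputs into N values, then apply a DCT-IV."""
    q = len(x) // 4  # length of each quarter a, b, c, d
    a, b, c, d = x[:q], x[q:2 * q], x[2 * q:3 * q], x[3 * q:]
    folded = ([-cr - dd for cr, dd in zip(reversed(c), d)] +
              [aa - br for aa, br in zip(a, reversed(b))])
    return dct_iv(folded)
```

Usage: for any input whose length is a multiple of 4, `mdct(x)` and `mdct_via_dct_iv(x)` agree to rounding error, which is why literature on the DCT-IV (and its fast algorithms) applies directly to the MDCT.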

> wavelets - another kind of transform, more general. Converts vs-time data
> to amplitude per frequency and "scale" with respect to some basis function
> (the "wavelet"). A big field, just search on the name to drown in
> descriptions.

The big difference is that wavelets get "smaller", time-wise, as frequency
increases, so every wavelet spans the same number of wavelengths,
regardless of the frequency it is at.
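A small sketch of the "same number of wavelengths" property, assuming a real Morlet-style wavelet (a cosine under a Gaussian envelope whose width is tied to the frequency); the wavelet family, the 6-cycle width and the 48 kHz sample rate are assumptions for illustration, not something the post specifies. By contrast, a block transform like the MDCT uses the same block length at every frequency.

```python
import math

def morlet_like(freq_hz, sample_rate, n_cycles=6.0):
    """Cosine at freq_hz under a Gaussian envelope whose width shrinks as the
    frequency grows, so the wavelet always spans roughly the same number of periods."""
    sigma_t = n_cycles / (2 * math.pi * freq_hz)   # envelope std. dev. in seconds
    half = int(round(3 * sigma_t * sample_rate))   # truncate at +/- 3 sigma
    return [math.exp(-0.5 * (n / (sample_rate * sigma_t)) ** 2)
            * math.cos(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(-half, half + 1)]

def span_in_periods(wavelet, freq_hz, sample_rate):
    """Length of the wavelet measured in periods of its own centre frequency."""
    return len(wavelet) * freq_hz / sample_rate

RATE = 48000
for f in (250, 1000, 4000):
    w = morlet_like(f, RATE)
    print(f"{f:5d} Hz: {len(w) / RATE * 1000:6.2f} ms, "
          f"{span_in_periods(w, f, RATE):.2f} periods")
```

The duration in milliseconds drops by roughly a factor of four at each step, while the span measured in periods stays essentially constant.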

> 4th dimensions - I don't really know, but I assume this refers to the
> number of coefficients used in the vector quantization scheme, and thus
> the dimension of the corresponding encoding space.

I think he means my reference to the 4-dimensional VQ codebook. So yes.
I should have posted this on the -dev list, but the original posting was on
the normal list. I'm too lazy, I guess.
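What a 4-dimensional VQ codebook does can be sketched briefly: the coefficients are taken four at a time, and each group of four is replaced by the index of the nearest codebook entry. This is a minimal brute-force sketch; the five-entry codebook and the function names are hypothetical, and it does not model how Vorbis actually stores, codes, or searches its codebooks.

```python
# Hypothetical 4-dimensional codebook: each entry is one 4-vector.
CODEBOOK = [
    (0, 0, 0, 0),
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (1, 1, 1, 1),
    (-1, -1, -1, -1),
]

def quantize(values, codebook):
    """Split values into len(codebook[0])-dimensional vectors and return, for
    each vector, the index of the nearest codeword (squared Euclidean distance)."""
    dim = len(codebook[0])
    if len(values) % dim:
        raise ValueError("input length must be a multiple of the codebook dimension")
    indices = []
    for start in range(0, len(values), dim):
        vec = values[start:start + dim]
        best = min(range(len(codebook)),
                   key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))
        indices.append(best)
    return indices

def dequantize(indices, codebook):
    """Decoder side: look each index up and concatenate the codewords."""
    return [x for i in indices for x in codebook[i]]

print(quantize([0.9, 0.1, 0.0, 0.0, 1.1, 0.8, 0.9, 1.2], CODEBOOK))  # [1, 3]
```

Only the indices need to be transmitted; the difference between the input and `dequantize(...)` is the quantization error.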

> preecho - named for what it looks like. A general feature of
> fourier-family transforms is that sharp changes in frequency data (as can
> arise during the quantization process in lossy compression) result in
> extra wiggly bits in the original time/space domain. In jpeg image
> compression this artifact is called 'ringing' and looks like ripples
> around sharp edges. In audio, it looks like an echo of strong peaks,
> blurring sharp attacks. Echoes after the attack just sound like extra
> fuzziness, but those before really muddy the sound in an unnatural way.

Actually, the phenomenon is this: sharp changes in the _time_ domain make
for broad spectra in the frequency domain, which don't get encoded well
enough (not enough bits), and when decoded, the quantization error in the
frequency domain gets spread over the whole time block. Some of this is
masked psycho-acoustically, but the masking works better for post-echoes
(more damping) than for pre-echoes, which makes pre-echoes more audible.
Switching the encoder to shorter blocks just spreads the error energy over
a shorter time frame, which helps a bit, but not _that_ much. But I'm
digressing.
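A toy illustration of the mechanism described above, under several assumptions: plain non-overlapping DCT-IV blocks with no window (a real MDCT coder overlaps and windows its blocks), every coefficient rounded to a multiple of one hypothetical step size, and a made-up decaying-cosine attack starting at sample 200 of a 256-sample signal. It measures how much quantization error lands in the silence well before the attack, first with one 256-sample block and then with eight 32-sample blocks.

```python
import math

def dct_iv(x):
    """Direct O(N^2) DCT-IV."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * (k + 0.5))
                for n in range(n_len))
            for k in range(n_len)]

def idct_iv(coeffs):
    """The DCT-IV is its own inverse up to a factor of 2/N."""
    n_len = len(coeffs)
    return [2.0 / n_len * v for v in dct_iv(coeffs)]

def code_blocks(signal, block_len, step):
    """Toy coder: transform each non-overlapping block, round every coefficient
    to a multiple of `step` (this is the quantization error), then decode."""
    decoded = []
    for start in range(0, len(signal), block_len):
        coeffs = dct_iv(signal[start:start + block_len])
        decoded.extend(idct_iv([step * round(c / step) for c in coeffs]))
    return decoded

def error_energy(signal, decoded, stop):
    """Squared reconstruction error summed over samples [0, stop)."""
    return sum((decoded[n] - signal[n]) ** 2 for n in range(stop))

ONSET = 200   # hypothetical attack position: silence before it
CHECK = 192   # samples 0..191 fill short blocks that contain only silence
signal = [0.0 if n < ONSET else math.exp(-(n - ONSET) / 20) * math.cos(0.9 * (n - ONSET))
          for n in range(256)]

long_pre = error_energy(signal, code_blocks(signal, 256, step=1.0), CHECK)
short_pre = error_energy(signal, code_blocks(signal, 32, step=1.0), CHECK)
print(f"error energy in samples 0..191, one 256-sample block:   {long_pre:.4f}")
print(f"error energy in samples 0..191, eight 32-sample blocks: {short_pre:.4f}")
```

With the long block the error spreads back through the silence before the attack; with short blocks the all-silent blocks decode to exactly zero, so the error only reaches back to the start of the block that holds the attack (here, its 8 silent samples are still affected).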

I think there are some Usenet groups which can help you (comp.dsp or
something like it); these have FAQs as well. But nothing beats reading
some books, and then reading more books, and then...

Ciao,

Segher


