[vorbis] encoder block diagram
stoffke at directbox.com
Wed Mar 12 13:58:08 PST 2003
I've made a block diagram of the encoder because I wanted to find out how it works:
http://stoffke.freeshell.65535.net/ogg/block.html
Although there are specification documents that give very
detailed information about single aspects of the encoding (or decoding),
I'm missing documentation that gives a more general overview
of how the encoder works.
(Vorbis Illuminated seems a bit outdated, as does on2.)
Here is a brief description of the encoding process (as I understood it):
WINDOWING
- Vorbis uses overlapping windows with sizes between 64 and 8192 samples (powers of two)
- one short and one long block size are used (the short block size must be smaller than or equal to the long one); each can be set to any allowed size
- the selected window size depends on the bitrate
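The size rules above, together with the window shape given in the Vorbis I specification, can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example and are not taken from libvorbis.

```python
import math

MIN_BLOCK = 64     # smallest allowed block size, in samples
MAX_BLOCK = 8192   # largest allowed block size, in samples

def is_valid_blocksize_pair(short_size, long_size):
    """Check the rules listed above: powers of two in [64, 8192], short <= long."""
    def ok(n):
        return MIN_BLOCK <= n <= MAX_BLOCK and (n & (n - 1)) == 0
    return ok(short_size) and ok(long_size) and short_size <= long_size

def vorbis_window(n):
    """Window of length n: w[i] = sin(pi/2 * sin^2(pi * (i + 0.5) / n))."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (i + 0.5) / n) ** 2)
            for i in range(n)]

if __name__ == "__main__":
    print(is_valid_blocksize_pair(256, 2048))
    w = vorbis_window(64)
    # Overlapping halves are power-complementary: w[i]^2 + w[i + n/2]^2 == 1
    print(max(abs(w[i] ** 2 + w[i + 32] ** 2 - 1.0) for i in range(32)))
```

The power-complementary property checked at the end is what lets overlapping blocks add back up to the original signal after the inverse transform.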
MDCT
- transforms audio data to frequency domain
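To make the transform step concrete, here is a direct O(N^2) MDCT/IMDCT pair with windowed overlap-add. It is a textbook sketch, not the fast algorithm an encoder would use, and it substitutes a plain sine window for the Vorbis window to keep the example short. All names are hypothetical.

```python
import math

def mdct(block):
    """Direct MDCT: 2N time samples in, N frequency coefficients out."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time samples (with aliasing) out."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]

def sine_window(length):
    """Sine window (assumption: used here instead of the Vorbis window)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]

def analyze_synthesize(signal, n):
    """Windowed MDCT -> IMDCT -> overlap-add, advancing n samples per block."""
    w = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + i] * w[i] for i in range(2 * n)]
        rec = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += rec[i] * w[i]
    return out

if __name__ == "__main__":
    n = 8
    signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(4 * n)]
    out = analyze_synthesize(signal, n)
    # Samples covered by two overlapping blocks are reconstructed
    print(max(abs(out[i] - signal[i]) for i in range(n, 3 * n)))
```

Each block of 2N samples yields only N coefficients; the aliasing this introduces cancels when neighbouring blocks are overlapped and added, which is why the first and last half-blocks (covered by only one block) are not reconstructed here.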
PSYCHOACOUSTIC MODEL
- Vorbis uses its own psychoacoustic model
- FFT for tonal analysis and MDCT for noise analysis
FLOOR
- a psychoacoustic floor is created from the data supplied by the
psychoacoustic model
- the floor is a spectral envelope and represents a low resolution
model of the audio spectrum
- floor type 0 uses LSP and floor type 1 uses a linear interpolation algorithm
to compute the floor curve
? currently only floor type 1 is used
? don't know whether the MDCT input for the psychoacoustic model comes from the MDCT
above or whether an extra MDCT is performed (would that make sense at all?)
- the floor data are then subtracted (amplitude-wise) from the MDCT data, creating a "residue"
- the residue represents the spectral fine structure of the audio signal
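The floor/residue split can be illustrated with a simplified model: a piecewise-linear envelope in dB, removed from the spectrum by division (which is a subtraction in the dB domain) and restored by multiplication. This is only a sketch of the idea. The real floor 1 works on integers and its own point layout, and every name below is hypothetical.

```python
def floor_curve_db(points, n_bins):
    """Piecewise-linear envelope in dB through (bin, level_db) points.

    Assumes points are sorted by bin, start at bin 0 and end at or after
    the last bin.
    """
    curve = []
    seg = 0
    for b in range(n_bins):
        # Advance to the segment that contains bin b
        while seg < len(points) - 2 and b > points[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        t = (b - x0) / (x1 - x0)
        curve.append(y0 + t * (y1 - y0))
    return curve

def remove_floor(spectrum, floor_db):
    """Residue = spectrum divided by the floor amplitude."""
    return [c / (10.0 ** (f / 20.0)) for c, f in zip(spectrum, floor_db)]

def apply_floor(residue, floor_db):
    """Decoder side: multiply the residue by the floor amplitude."""
    return [r * (10.0 ** (f / 20.0)) for r, f in zip(residue, floor_db)]

if __name__ == "__main__":
    points = [(0, 0.0), (4, -20.0), (8, -40.0)]
    curve = floor_curve_db(points, 9)
    spectrum = [1.0, -0.5, 0.3, 0.2, 0.1, -0.05, 0.03, 0.02, 0.01]
    residue = remove_floor(spectrum, curve)
    print(curve)
    print(residue)
```

With a floor that follows the spectrum's overall slope, the residue values stay in a narrow range, which is what makes them cheaper to code than the raw MDCT coefficients.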
CHANNEL COUPLING
- channel coupling reduces the redundancy between the left and right channels
- it works well because there's a high correlation between the floor curves of both channels
- Vorbis has different types of stereo models: dual stereo, lossless stereo (-q 6 to -q 10),
phase stereo and a mixed stereo (all the modes together)
? although Vorbis supports up to 255 channels, there's no channel coupling in streams
with more than 2 channels (yet)
? not sure about the position of channel coupling in the diagram
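For the lossless stereo case, the idea can be sketched as a reversible "magnitude/angle" mapping of a pair of quantized values. The `uncouple` step below follows the inverse-coupling rules as I understand them from the Vorbis I specification; `couple` is simply an inverse written for this example, not the libvorbis encoder code, and both names are made up.

```python
def couple(left, right):
    """Map a (left, right) integer pair to (magnitude, angle).

    Illustrative forward mapping: the value with the larger absolute size
    becomes the magnitude, the angle carries the difference.
    """
    if abs(left) > abs(right):
        m = left
        a = (left - right) if left > 0 else (right - left)
    else:
        m = right
        a = (left - right) if right > 0 else (right - left)
    return m, a

def uncouple(m, a):
    """Recover (left, right) from (magnitude, angle)."""
    if m > 0:
        if a > 0:
            return m, m - a
        return m + a, m
    if a > 0:
        return m, m + a
    return m - a, m

if __name__ == "__main__":
    print(couple(100, 98))
    print(uncouple(*couple(100, 98)))
```

When the two channels are similar, the angle value stays close to zero, so it costs few bits; the mapping itself loses nothing.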
VECTOR QUANTIZATION
- the floor data and the residues are vector quantized using
custom codebooks
- codebooks are not fixed by the format: they are "trained" in advance and sent along in the stream headers
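A generic nearest-neighbour vector quantizer shows what "quantizing with a codebook" means. The tiny codebook and the function names are invented for this example; real Vorbis codebooks and their search are more elaborate.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codebook entry closest to vector (squared error)."""
    best_index, best_err = 0, float("inf")
    for index, entry in enumerate(codebook):
        err = sum((v - e) ** 2 for v, e in zip(vector, entry))
        if err < best_err:
            best_index, best_err = index, err
    return best_index

def quantize(values, codebook):
    """Split values into vectors of the codebook's dimension and look each one up."""
    dim = len(codebook[0])
    return [nearest_codeword(values[i:i + dim], codebook)
            for i in range(0, len(values), dim)]

def dequantize(indices, codebook):
    """Decoder side: replace each index by its codebook entry."""
    out = []
    for i in indices:
        out.extend(codebook[i])
    return out

if __name__ == "__main__":
    codebook = [(0, 0), (1, 1), (-1, 1), (2, 0)]   # hypothetical 2-D codebook
    indices = quantize([0.9, 1.2, 0.1, -0.1, 1.8, 0.3], codebook)
    print(indices)
    print(dequantize(indices, codebook))
```

Only the indices need to be transmitted, as long as the decoder has the same codebook.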
HUFFMAN
- the vector codewords are then Huffman-coded to minimize redundancy
Finally, the data are packed into a bitstream.
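As an illustration of the Huffman step, the following textbook routine derives codeword lengths from symbol frequencies: frequent codebook indices get short codewords. It is a generic sketch with hypothetical names and says nothing about how libvorbis actually builds its codebooks.

```python
import heapq

def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} for a dict of symbol frequencies."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap items: (total frequency, unique tie-breaker, symbols in this subtree)
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees puts every symbol in them one level deeper
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, tie, syms1 + syms2))
        tie += 1
    return lengths

if __name__ == "__main__":
    freqs = {"a": 8, "b": 4, "c": 2, "d": 1, "e": 1}
    print(huffman_code_lengths(freqs))
```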
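The packing step can be sketched with a small bit writer. As far as I understand the Vorbis I specification, values are written least significant bit first into successive bytes; the class name and interface here are made up for the example.

```python
class BitWriter:
    """Packs unsigned values into bytes, least significant bit first."""

    def __init__(self):
        self.data = bytearray()
        self.bit_pos = 0  # number of bits already used in the last byte

    def write(self, value, nbits):
        """Append the low nbits bits of value to the stream."""
        for i in range(nbits):
            if self.bit_pos == 0:
                self.data.append(0)
            bit = (value >> i) & 1
            self.data[-1] |= bit << self.bit_pos
            self.bit_pos = (self.bit_pos + 1) % 8

if __name__ == "__main__":
    w = BitWriter()
    w.write(12, 4)      # 4-bit value
    w.write(7, 3)       # 3-bit value
    w.write(17, 7)      # 7-bit value, crosses a byte boundary
    w.write(6969, 13)   # 13-bit value, spans three bytes
    print(list(w.data))
```

Values are not aligned to byte boundaries, so a codeword of any length can follow directly after the previous one.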
Please correct or comment on the diagram and the description.
I'm not skilled in C, so I can't "read" the source code.
But I tried to get the information from the specs,
and the mailing lists were also helpful.
I need information about Vorbis for my diploma thesis.
Thanks a lot
Stoffke