[vorbis-dev] encoder block diagram

stoffke at directbox.com
Fri Mar 14 06:28:01 PST 2003



I've made a block diagram of the encoder, because I wanted to find out how it works:

http://stoffke.freeshell.65535.net/ogg/block.html

Although there are specification docs that give very
detailed information about single aspects of the encoding (or decoding),
I'm missing documentation that gives a more general overview
of how the encoder works.
(Vorbis Illuminated seems a bit outdated, as does on2.)

Here is a brief description of the encoding process (as I understood it):

WINDOWING
- Vorbis uses overlapping windows with sizes between 64 and 8192 samples (powers of two)
- two block sizes are used, one short and one long (the short block size must be smaller than or equal to the long one); each can be set to any allowed size
- the selected window sizes depend on the bitrate
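
As a rough illustration of the windowing constraints listed above, here is a minimal Python sketch. The function names are made up for this example. The window formula is the power-complementary window as I read it from the Vorbis I spec, and the sketch only covers the case where neighbouring blocks have the same size (long/short transitions use an asymmetric window shape, which is not shown).

```python
import math

def is_valid_blocksize(n):
    """True if n is a power of two in the range 64..8192."""
    return 64 <= n <= 8192 and (n & (n - 1)) == 0

def is_valid_blocksize_pair(short, long):
    """Both sizes must be allowed, and short must not exceed long."""
    return is_valid_blocksize(short) and is_valid_blocksize(long) and short <= long

def vorbis_window(n):
    """Window for a block of n samples: sin(pi/2 * sin^2(pi * (i + 0.5) / n))."""
    return [math.sin(math.pi / 2 * math.sin(math.pi * (i + 0.5) / n) ** 2)
            for i in range(n)]
```

The squared window values of two overlapping halves add up to 1, which is the property that lets the overlapped blocks be added back together without a gain change.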

MDCT
- transforms audio data to frequency domain
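
To make the transform step a bit more concrete, here is a naive O(N^2) MDCT/IMDCT pair in Python with a windowed overlap-add round trip. This is only a textbook sketch under my own assumptions (function names, the 2/N scaling of the inverse, equal block sizes); a real encoder uses a fast algorithm and its own scaling.

```python
import math

def vorbis_window(n):
    """Power-complementary window of length n (same-size neighbours only)."""
    return [math.sin(math.pi / 2 * math.sin(math.pi * (i + 0.5) / n) ** 2)
            for i in range(n)]

def mdct(block):
    """Naive MDCT: 2N time samples -> N frequency coefficients."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """Naive inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]

def roundtrip(signal, n):
    """Window, transform, inverse-transform and overlap-add blocks of 2N samples."""
    w = vorbis_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + i] * w[i] for i in range(2 * n)]
        rec = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += rec[i] * w[i]
    return out
```

Every sample that is covered by two overlapping blocks comes back unchanged (up to rounding), because the aliasing introduced by one block is cancelled by its neighbour. The first and last N samples are covered by only one block and are not reconstructed in this sketch.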

PSYCHOACOUSTIC MODEL
- Vorbis uses its own psychoacoustic model
- FFT for tonal analysis and MDCT for noise analysis

FLOOR
- a psychoacoustic floor is created from the data delivered by the
psychoacoustic model
- the floor is a spectral envelope and represents a low-resolution
model of the audio spectrum
- floor type 0 uses LSP and floor type 1 uses a linear interpolation algorithm
to compute the floor curve
? currently only floor type 1 is used
? I don't know whether the MDCT input for the psychoacoustic model comes from the MDCT
above or whether an extra MDCT is performed (would that make sense at all?)
- the floor data are then subtracted (amplitude-wise) from the MDCT data, creating a "residue"
- the residue represents the spectral fine structure of the audio signal
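
The floor/residue split described above can be sketched roughly as follows. This is a simplified illustration, not the spec's algorithm: the control points, the function names and the floating-point interpolation are my own assumptions (floor type 1 in the spec renders its line segments with integer arithmetic). I also assume here that the curve is interpolated in dB, so that subtracting it in the log domain is the same as dividing by it in the linear domain.

```python
def floor_curve_db(points, n):
    """Piecewise-linear curve through (bin, dB) control points over n bins."""
    pts = sorted(points)
    curve = [pts[0][1]] * n          # bins before the first point hold its value
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        for x in range(x0, min(x1, n)):
            curve[x] = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    for x in range(pts[-1][0], n):   # bins from the last point on hold its value
        curve[x] = pts[-1][1]
    return curve

def db_to_linear(db):
    """Convert an amplitude in dB to a linear factor."""
    return 10.0 ** (db / 20.0)

def split_floor_residue(spectrum, points):
    """Return (floor, residue) so that floor[i] * residue[i] == spectrum[i]."""
    floor = [db_to_linear(d) for d in floor_curve_db(points, len(spectrum))]
    residue = [s / f for s, f in zip(spectrum, floor)]
    return floor, residue

def reconstruct(floor, residue):
    """Decoder-side view: multiply the floor curve back onto the residue."""
    return [f * r for f, r in zip(floor, residue)]
```

For example, `split_floor_residue(spectrum, [(0, -20.0), (4, 0.0)])` yields a floor that rises linearly (in dB) over the first four bins and stays flat afterwards; multiplying floor and residue gives the original spectrum back.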

CHANNEL COUPLING
- channel coupling reduces the redundancy between the left and right channels
- it works well because there's a high correlation between the floor curves of both channels
- Vorbis has different types of stereo models: dual stereo, lossless stereo (-q 6 to -q 10),
phase stereo and a mixed stereo (all the modes together)
? although Vorbis supports up to 255 channels, there's no channel coupling in streams with
more than 2 channels (yet)
? not sure about the position of channel coupling in the diagram
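
As an illustration of the lossless stereo case, here is a small Python sketch of a magnitude/angle mapping and its inverse. The `decouple` function follows the inverse coupling step as I remember it from the Vorbis I spec; the `couple` function is my own simplified counterpart, not the encoder's actual code, and the lossy modes (phase stereo etc.) are not covered. Function names are made up for this example.

```python
def couple(a, b):
    """Map a (left, right) value pair to a (magnitude, angle) pair."""
    if abs(a) >= abs(b):
        mag = a
        ang = a - b if a > 0 else b - a
    else:
        mag = b
        ang = a - b if b > 0 else b - a
    return mag, ang

def decouple(mag, ang):
    """Inverse mapping: recover (left, right) from (magnitude, angle)."""
    if mag > 0:
        if ang > 0:
            return mag, mag - ang
        return mag + ang, mag
    if ang > 0:
        return mag, mag + ang
    return mag - ang, mag
```

When the two channels are strongly correlated, the angle value stays close to zero (for instance `couple(100, 98)` gives `(100, 2)`), which is what makes the coupled pair cheaper to code than two independent channels.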

VECTOR QUANTIZATION
- the floor data and the residues are vector quantized using
custom codebooks
- codebooks are adaptive ("trained")
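
The basic idea of vector quantization can be sketched like this: the values are cut into short vectors, and each vector is replaced by the index of the closest codebook entry. The toy codebook, the function names and the squared-error distance are assumptions for the illustration; the real codebooks are trained and much larger.

```python
def nearest_codeword(vector, codebook):
    """Index of the codebook entry with the smallest squared error."""
    best_index = 0
    best_dist = None
    for index, entry in enumerate(codebook):
        dist = sum((v - e) ** 2 for v, e in zip(vector, entry))
        if best_dist is None or dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def quantize(values, codebook):
    """Replace consecutive groups of values by codebook indices.

    Assumes len(values) is a multiple of the codebook dimension.
    """
    dim = len(codebook[0])
    return [nearest_codeword(values[i:i + dim], codebook)
            for i in range(0, len(values), dim)]

def dequantize(indices, codebook):
    """Decoder side: look the vectors up again and concatenate them."""
    out = []
    for index in indices:
        out.extend(codebook[index])
    return out

# Hypothetical 2-dimensional toy codebook with five entries.
TOY_CODEBOOK = [(0, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]
```

With the toy codebook, `quantize([0.9, 1.2, -0.1, 0.05, -1.3, 0.8], TOY_CODEBOOK)` gives `[1, 0, 3]`, so six values are represented by three small indices.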

HUFFMAN
- the vector codewords are then Huffman-coded to minimize redundancy
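
For the entropy coding step, here is a generic textbook Huffman sketch in Python: code lengths are derived from symbol counts, and prefix-free bit strings are then assigned from the lengths. The function names and the canonical assignment order are my own choices; I am not claiming this is the exact codeword ordering the Vorbis spec prescribes.

```python
import heapq

def huffman_code_lengths(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> code length in bits."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (count, tie-breaker, symbols in this subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    counter = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:          # every merge adds one bit to these symbols
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, counter, s1 + s2))
        counter += 1
    return lengths

def canonical_codes(lengths):
    """Assign prefix-free bit strings from code lengths, shortest first."""
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes
```

Frequent codebook indices end up with short bit strings and rare ones with long bit strings, which is where the redundancy reduction comes from.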

Finally, the data are packed into the bitstream.

Please correct or comment on the diagram and the description.

I'm not skilled in C, so I can't "read" the source code.
But I tried to get the information from the specs,
and the mailing lists were also helpful.

I need information about vorbis for my diploma thesis.

Thanks a lot

Stoffke



