[vorbis-dev] Understanding of Vorbis coder

Monty xiphmont at xiph.org
Sat Sep 8 13:08:54 PDT 2001



> Simpler and shorter: 'The input audio data is windowed before
> the MDCT is applied. The MDCT uses an overlap of 50%.'

Correct for same-sized blocks.  It's more complex in transitions
between block sizes.

> As I understand it, the M in MDCT implies that you use some
> kind of overlap, so if you assume the reader knows what an MDCT
> is, there's no need to explain the need for overlapping.
> But perhaps: 'MDCT stands for Modified Discrete Cosine Transform.
> It transforms blocks of audio data from the time to the 
> frequency domain. It uses an overlap between those blocks to
> be able to do this in a lossless manner.'

A good practical definition; the overlap and the window must meet
strict requirements for the MDCT to be orthogonal, and that is implied
by 'MDCT' as you say.
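
To make the 'lossless with 50% overlap' point concrete, here is a minimal
sketch for same-sized blocks only. It assumes a plain sine window, which
satisfies the power-complementary (Princen-Bradley) condition; it is a
stand-in and not necessarily the window libvorbis uses. The function
names are made up for the illustration, and the transforms are the
direct O(N^2) textbook form, not a fast implementation.

```python
import math

def sine_window(block_len):
    # Power-complementary: w[n]^2 + w[n + block_len/2]^2 == 1
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]

def mdct(block):
    # 2N windowed time samples -> N coefficients
    n_half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2.0) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct(coeffs):
    # N coefficients -> 2N time samples that still contain time-domain aliasing.
    # The 2/N scale is the one that pairs with windowing on both sides.
    n_half = len(coeffs)
    return [
        2.0 / n_half * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2.0) * (k + 0.5))
            for k in range(n_half))
        for n in range(2 * n_half)
    ]

def overlap_add_roundtrip(signal, n_half):
    # Blocks of length 2N advance by N samples (50% overlap). The window is
    # applied before the MDCT and again after the IMDCT; the aliasing of
    # neighbouring blocks cancels when the halves are added together.
    window = sine_window(2 * n_half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = [signal[start + i] * window[i] for i in range(2 * n_half)]
        rec = imdct(mdct(block))
        for i in range(2 * n_half):
            out[start + i] += rec[i] * window[i]
    return out
```

Only samples covered by two blocks come back intact; the first and last
N samples of the signal are covered by a single block and keep their
aliasing, which is why a real codec has to prime and flush the overlap.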

> The decision whether to use a long or a short block is done before
> this by 4 parallel bandpass filters that detect energy surges.

...and this will be changing just before or just after 1.0; the
parallel bandpasses do not turn out to perform better than a simple
FFT or MDCT.
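
For readers unfamiliar with the idea of an 'energy surge' decision, a
generic sketch follows. It is neither the four-bandpass detector
discussed above nor whatever replaces it; it simply compares the energy
of consecutive short segments of the time signal. The function name,
segment length and threshold are assumptions chosen for the example.

```python
def needs_short_block(samples, segment_len=64, ratio_threshold=8.0, floor=1e-9):
    # Energy of each consecutive, non-overlapping segment.
    energies = [
        sum(s * s for s in samples[i:i + segment_len])
        for i in range(0, len(samples) - segment_len + 1, segment_len)
    ]
    # Flag a surge when a segment is much louder than the one before it.
    # 'floor' keeps digital silence from making every ratio look infinite.
    for previous, current in zip(energies, energies[1:]):
        if current > ratio_threshold * (previous + floor):
            return True
    return False
```

A steady signal never trips the test, while a quiet passage followed by
a sudden attack does, which is the case where a short block limits
pre-echo.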

> In the graph, I'm not sure if the psychoacoustic model is
> in parallel with the windowing+MDCT. Since the psymodel needs
> frequency domain data I'd assume it works on the MDCT output
> too, but I'm not sure.

In series; it uses an FFT of the blocked data for tonal estimation and
an MDCT of it for noise analysis.

> <
> This block generates the spectral envelope, which is called the
> floor curve. [..] This spectral envelope
> curve is represented by LPC coefficients
> >

LSP, not LPC, and this is only true of floor0.  Floor0 has the
advantage of very good low bitrate performance, but is too unstable
for point/phase stereo coupling, so we use floor1 which works
differently.

> 
> The most important goal of the psychoacoustics is to determine
> what is audible and what is not. That's totally missing here.

...In addition to determining the most graceful way to sacrifice
subjective quality to achieve the desired bitrate :-)

> Both the floor curve coefficients and the residue are then fed
> to the VQ codebooks. They are not 'quantified and then encoded'.
> This is a single step inherent in the vector quantization.

Actually, there *is* some prequantization in rc2 and beyond for doing
multiple passes through a frame and progressively filling in detail.
But the final step does consist of quantize/code in one step.
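
As a rough illustration of 'quantize/code in one step': in plain vector
quantization the encoder picks the nearest codebook entry, and the index
of that entry is the code that gets written. The sketch below is generic
nearest-neighbour VQ with a toy codebook, not the Vorbis codebook format
or its entropy coding; the names and values are invented for the example.

```python
def vq_encode(vector, codebook):
    # Return the index of the codebook entry closest to 'vector'
    # (squared Euclidean distance). Choosing the entry quantizes the
    # vector, and the index itself is the symbol to be stored.
    def squared_distance(entry):
        return sum((a - b) ** 2 for a, b in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))

def vq_decode(index, codebook):
    # The decoder only needs the same codebook and the index.
    return codebook[index]

# Toy two-dimensional codebook, values chosen arbitrarily.
toy_codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (2.0, 0.0)]
index = vq_encode((0.9, 1.2), toy_codebook)
approximation = vq_decode(index, toy_codebook)
```

There is no separate scalar rounding stage here: the codebook lookup
does the quantizing and yields the code in the same operation.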

Monty
