[vorbis-dev] Understanding of Vorbis coder
xiphmont at xiph.org
Sat Sep 8 13:08:54 PDT 2001
> Simpler and shorter: 'The input audio data is windowed before
> the MDCT is applied. The MDCT uses an overlap of 50%.'
Correct for same-sized blocks. It's more complex in transitions
between block sizes.
> As I understand it, the M in MDCT implies that you use some
> kind of overlap, so if you assume the reader knows what an MDCT
> is, there's no need to explain the need for overlapping.
> But perhaps: 'MDCT stands for Modified Discrete Cosine Transform.
> It transforms blocks of audio data from the time to the
> frequency domain. It uses an overlap between those blocks to
> be able to do this in a lossless manner.'
A good practical definition; the exact overlapping and windowing must
meet some strict requirements for the MDCT to be orthogonal, and that
is implied by 'MDCT' as you say.
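
To make the 'windowed, 50% overlap, lossless' description concrete, here
is a minimal sketch in Python. It is not taken from libvorbis: the
function names are made up for illustration, the sine window is just one
window that satisfies the overlap condition (Vorbis uses its own window
shape), and only same-sized blocks are handled, so block-size
transitions are ignored. The transforms are the direct O(N^2) textbook
form, for readability only.

```python
import math

def sine_window(block_len):
    # Satisfies w[n]^2 + w[n + N]^2 = 1 for N = block_len / 2.
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]

def mdct(block):
    # 2N windowed time samples in, N coefficients out.
    half = len(block) // 2
    n0 = 0.5 + half / 2
    return [sum(block[n] * math.cos(math.pi / half * (n + n0) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs):
    # N coefficients in, 2N time-aliased samples out.
    # The 2/N scale is the usual choice when a window is applied twice.
    half = len(coeffs)
    n0 = 0.5 + half / 2
    return [(2.0 / half) * sum(coeffs[k] * math.cos(math.pi / half * (n + n0) * (k + 0.5))
                               for k in range(half))
            for n in range(2 * half)]

def mdct_roundtrip(signal, half):
    # Analyze and resynthesize with blocks of 2*half samples, hop of half.
    w = sine_window(2 * half)
    # Zero-pad so every real sample is covered by exactly two blocks.
    tail = (-len(signal)) % half + half
    padded = [0.0] * half + list(signal) + [0.0] * tail
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        block = [padded[start + i] * w[i] for i in range(2 * half)]
        rec = imdct(mdct(block))
        for i in range(2 * half):
            # Window again, then overlap-add; the aliasing cancels here.
            out[start + i] += rec[i] * w[i]
    return out[half:half + len(signal)]
```

Each block on its own comes back aliased; only the sum of two
overlapping blocks reproduces the input, which is why the overlap is
part of the transform rather than an optional extra. Without
quantization, the round trip is exact up to floating-point error.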
> The decision whether to use a long or a short block is done before
> this by 4 parallel bandpass filters that detect energy surges.
...and this will be changing just before or just after 1.0; the
parallel bandpasses do not turn out to perform better than a simple
FFT or MDCT.
> In the graph, I'm not sure if the psychoacoustic model is
> in parallel with the windowing+MDCT. Since the psymodel needs
> frequency domain data I'd assume it works on the MDCT output
> too, but I'm not sure.
In series; it uses an FFT of the blocked data for tonal estimation and
an MDCT of it for noise analysis.
> This block generates the spectral envelope, which is called the
> floor curve. [..] This spectral envelope
> curve is represented by LPC coefficients
LSP, not LPC, and this is only true of floor0. Floor0 has the
advantage of very good low bitrate performance, but is too unstable
for point/phase stereo coupling, so we use floor1 which works
> The most important goal of the psychoacoustics is to determine
> what is audible and what is not. That's totally missing here.
...In addition to determining the most graceful way to sacrifice
subjective quality to achieve desired bitrate :-)
> Both the floor curve coefficients and the residue are then fed
> to the VQ codebooks. They are not 'quantified and then encoded'.
> This is a single step inherent in the vector quantization.
Actually, there *is* some prequantization in rc2 and beyond for doing
multiple passes through a frame and progressively filling in detail.
But the final step does consist of quantize/code in one step.
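
As a toy illustration of 'quantize/code in one step', here is a minimal
vector quantization sketch in Python. The codebook contents and the
function names are hypothetical, and this is not the Vorbis codebook
format; it only shows that picking the nearest codebook entry both
quantizes the vector and yields the index that gets stored.

```python
def vq_encode(vec, codebook):
    # Return the index of the codebook entry closest to vec
    # (squared Euclidean distance). The index is the code.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(vec, codebook[i]))

def vq_decode(index, codebook):
    # Decoding is a plain table lookup.
    return codebook[index]

# Hypothetical 2-dimensional codebook with three entries.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
index = vq_encode((0.9, 1.2), codebook)
approx = vq_decode(index, codebook)
```

There is no separate scalar quantizer followed by an encoder here: the
choice of entry is the quantization, and the entry's index is what
would go into the bitstream.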