[vorbis-dev] Understanding of Vorbis coder
Monty
xiphmont at xiph.org
Sat Sep 8 13:08:54 PDT 2001
> Simpler and shorter: 'The input audio data is windowed before
> the MDCT is applied. The MDCT uses an overlap of 50%.'
Correct for same-sized blocks. It's more complex in transitions
between block sizes.
> As I understand it, the M in MDCT implies that you use some
> kind of overlap, so if you assume the reader knows what an MDCT
> is, there's no need to explain the need for overlapping.
> But perhaps: 'MDCT stands for Modified Discrete Cosine Transform.
> It transforms blocks of audio data from the time to the
> frequency domain. It uses an overlap between those blocks to
> be able to do this in a lossless manner.'
A good practical definition; the exact overlapping and windowing have
some strict requirements to be an orthogonal MDCT, and that is implied
by 'MDCT' as you say.
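
To make the "strict requirements" concrete for same-sized blocks, here is a minimal sketch in Python. It is not libvorbis code. The function names are made up for illustration, the transform is a slow direct sum rather than a fast algorithm, and the 2/N scaling on the inverse is one common convention. The window formula is the power-complementary one given in the Vorbis I specification; any window with w[n]^2 + w[n + N/2]^2 = 1 would do for this demonstration. Block-size transitions are not covered.

```python
import math

def vorbis_window(block_len):
    # Power-complementary window: w[n]^2 + w[n + block_len/2]^2 == 1.
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (n + 0.5) / block_len) ** 2)
            for n in range(block_len)]

def mdct(block):
    # Direct O(N^2) MDCT: 2N time samples in, N coefficients out.
    half = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs):
    # Inverse with 2/N scaling; the output still contains time-domain aliasing.
    half = len(coeffs)
    return [2.0 / half * sum(coeffs[k] * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
                             for k in range(half))
            for n in range(2 * half)]

def mdct_roundtrip(signal, block_len):
    # Assumes len(signal) is a multiple of block_len // 2 (hypothetical helper).
    hop = block_len // 2                      # 50% overlap
    window = vorbis_window(block_len)
    padded = [0.0] * hop + list(signal) + [0.0] * hop
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - block_len + 1, hop):
        frame = [padded[start + i] * window[i] for i in range(block_len)]
        rebuilt = imdct(mdct(frame))
        for i in range(block_len):
            # Window again, then overlap-add: the aliasing terms cancel.
            out[start + i] += rebuilt[i] * window[i]
    return out[hop:hop + len(signal)]
```

Feeding any short test signal through mdct_roundtrip(signal, 16) should return the input to within floating-point rounding. Changing the window to one that breaks the power-complementary condition makes the reconstruction fail, which is the point of the requirement.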
> The decision whether to use a long or a short block is done before
> this by 4 parallel bandpass filters that detect energy surges.
...and this will be changing just before or just after 1.0; the
parallel bandpasses do not turn out to perform better than a simple
FFT or MDCT.
> In the graph, I'm not sure if the psychoacoustic model is
> in parallel with the windowing+MDCT. Since the psymodel needs
> frequency domain data I'd assume it works on the MDCT output
> too, but I'm not sure.
In series; it uses an FFT of the blocked data for tonal estimation and
an MDCT of it for noise analysis.
> <
> This block generates the spectral envelope, which is called the
> floor curve. [..] This spectral envelope
> curve is represented by LPC coefficients
> >
LSP, not LPC, and this is only true of floor0. Floor0 has the
advantage of very good low bitrate performance, but is too unstable
for point/phase stereo coupling, so we use floor1 which works
differently.
>
> The most important goal of the psychoacoustics is to determine
> what is audible and what is not. That's totally missing here.
...In addition to determining the most graceful way to sacrifice
subjective quality to achieve the desired bitrate :-)
> Both the floor curve coefficients and the residue are then fed
> to the VQ codebooks. They are not 'quantified and then encoded'.
> This is a single step inherent in the vector quantization.
Actually, there *is* some prequantization in rc2 and beyond for doing
multiple passes through a frame and progressively filling in detail.
But the final step does consist of quantize/code in one step.
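
As a rough illustration of what "quantize/code in one step" means in vector quantization generally, here is a minimal Python sketch. The codebook, the squared-error distance, and the function names are all toy assumptions for illustration; this is not the Vorbis codebook format or the encoder's actual search.

```python
def vq_encode(vector, codebook):
    # Choosing the nearest codeword both quantizes the vector and yields
    # the symbol (the codeword index) that would be written to the stream.
    def squared_error(index):
        return sum((v - c) ** 2 for v, c in zip(vector, codebook[index]))
    return min(range(len(codebook)), key=squared_error)

def vq_decode(index, codebook):
    # Decoding is a table lookup; the quantization error is not recoverable.
    return codebook[index]

# Toy 2-dimensional codebook with four entries (hypothetical values).
TOY_CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

index = vq_encode((0.9, 0.2), TOY_CODEBOOK)
approximation = vq_decode(index, TOY_CODEBOOK)
```

Here index is the only thing that needs to be stored, and approximation is what the decoder gets back. There is no separate scalar quantization stage followed by a coding stage.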
Monty