[vorbis-dev] Understanding of Vorbis coder

Gian-Carlo Pascutto gcp at sjeng.org
Wed Sep 5 06:53:14 PDT 2001



At 13:44 5/09/2001 +0530, you wrote:
>Hi
>I have gone through the document available on the net regarding the
>Vorbis encoder/decoder.
>Based on that I have prepared an understanding document on the
>encoder/decoder block. I would like to
>know whether my understanding of the coder is OK. If there is any
>other additional block/information, please provide me
>with the same.

First things first: please use something other than Microsoft
Word format. It's usually possible to extract the text on a Unix
box, but that's about it. HTML would be a lot better. And you could
link to it so you don't have to dump it into the mailing list each
time you make an update.

Now, on the document:

(note, what I state below may very well be wrong at times. If so, 
please correct!)

>
'Input speech signal' 
>

Vorbis handles a lot more than just speech.

>
Instead of performing Sub Banding on the time domain data before 
MDCT in ogg Vorbis they perform Windowing of the input speech signal.
Windows are overlapped to reduce undesirable distortion that would 
occur with non-overlapping, adjacent windows. Vorbis uses windows 
of two sizes, called short and long. The sizes must be powers of two.
>

Mentioning subbanding here is not needed as it's not used anyway
and will probably only add confusion (what is subbanding?). 

Simpler and shorter: 'The input audio data is windowed before
the MDCT is applied. The MDCT uses an overlap of 50%.'
As I understand it, the M in MDCT implies that you use some
kind of overlap, so if you assume the reader knows what an MDCT
is, there's no need to explain the need for overlapping.
But perhaps: 'MDCT stands for Modified Discrete Cosine Transform.
It transforms blocks of audio data from the time to the 
frequency domain. It uses an overlap between those blocks to
be able to do this in a lossless manner.'
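
To make that a bit more concrete, here's a rough sketch in plain C
(definitely not the libvorbis code) of windowing one 2N-sample block
and taking its MDCT. Consecutive blocks advance by N samples, which
is where the 50% overlap comes from; the window slope formula below
is my assumption of the Vorbis-style window, so don't quote me on it.

/* Rough sketch, not the libvorbis implementation: window one block of
   2N samples and take its MDCT, giving N frequency coefficients.
   Consecutive blocks advance by N samples (the 50% overlap); the
   overlap-add of the inverse transforms cancels the time-domain
   aliasing, which is what keeps this stage lossless. */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 256   /* half the block size; must be a power of two */

/* Assumed Vorbis-style window slope for a 2N-sample block. */
static double window_coeff(int n)
{
    double s = sin(M_PI * (n + 0.5) / (2.0 * N));
    return sin(M_PI / 2.0 * s * s);
}

/* Direct O(N^2) MDCT of one windowed block; fine for illustration. */
static void mdct(const double *x, double *X)
{
    for (int k = 0; k < N; k++) {
        double acc = 0.0;
        for (int n = 0; n < 2 * N; n++)
            acc += window_coeff(n) * x[n] *
                   cos(M_PI / N * (n + 0.5 + N / 2.0) * (k + 0.5));
        X[k] = acc;
    }
}

int main(void)
{
    double x[2 * N], X[N];
    for (int n = 0; n < 2 * N; n++)   /* toy input: a 440 Hz tone */
        x[n] = sin(2.0 * M_PI * 440.0 * n / 44100.0);
    mdct(x, X);
    printf("X[0] = %f\n", X[0]);
    return 0;
}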

The windowing and the MDCT are really two closely related steps.
Windowing is NOT the same as splitting up into short and long blocks!

The decision whether to use a long or a short block is done before
this by 4 parallel bandpass filters that detect energy surges.
Short blocks are used to get better precision in the time domain,
if needed.
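
Just to show what 'detecting an energy surge' means, here's a toy
detector (nothing like the actual 4-bandpass-filter analysis; the
block size and threshold below are made-up numbers):

#include <stdio.h>

#define BLOCK 1024
#define SUB   8       /* sub-blocks examined per block   */
#define SURGE 8.0     /* made-up energy-ratio threshold  */

/* Flag a block as transient when the energy of any sub-block jumps
   well above that of the preceding one; such a block would then be
   coded with short windows for better time resolution. */
static int needs_short_block(const double x[BLOCK])
{
    double prev = 0.0;
    for (int s = 0; s < SUB; s++) {
        double e = 0.0;
        for (int n = 0; n < BLOCK / SUB; n++) {
            double v = x[s * (BLOCK / SUB) + n];
            e += v * v;
        }
        if (s > 0 && e > SURGE * prev + 1e-9)
            return 1;   /* energy surge: switch to short blocks */
        prev = e;
    }
    return 0;           /* steady signal: a long block is fine */
}

int main(void)
{
    double x[BLOCK] = { 0.0 };
    for (int n = BLOCK / 2; n < BLOCK; n++)   /* silence, then a click */
        x[n] = (n % 2) ? 1.0 : -1.0;
    printf("short block needed: %d\n", needs_short_block(x));
    return 0;
}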

In the graph, I'm not sure if the psychoacoustic model is
in parallel with the windowing+MDCT. Since the psymodel needs
frequency domain data I'd assume it works on the MDCT output
too, but I'm not sure.

>
This block generates the Spectral envelope and it is called as 
floor curve. [..] This spectral envelope 
curve is represented by LPC coefficients
>

The most important goal of the psychoacoustics is to determine
what is audible and what is not. That's totally missing here.
As I understand it, the psychoacoustics are used to simplify
the data to which the LPC curve is fitted. The LPC curve itself
is a coarse approximation of how the actual spectral envelope
looks after the psymodel has been applied.
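
To show what I mean by 'coarse approximation': even something as dumb
as taking the peak level per band already gives an envelope-like
floor. The real encoder fits an LPC/LSP curve to the psy-processed
spectrum instead of doing this, and the band count below is just an
arbitrary number I picked for the example:

#include <math.h>
#include <stdio.h>

#define BINS  512   /* MDCT coefficients in the block               */
#define BANDS 16    /* arbitrary resolution for the coarse envelope */

/* Illustration only: a coarse spectral envelope taken as the peak
   magnitude per band, in dB.  The real floor is an LPC/LSP fit, not
   this, but both are coarse approximations of the spectral envelope. */
static void coarse_floor(const double spec[BINS], double env_db[BANDS])
{
    const int per_band = BINS / BANDS;
    for (int b = 0; b < BANDS; b++) {
        double peak = 1e-12;
        for (int i = 0; i < per_band; i++) {
            double m = fabs(spec[b * per_band + i]);
            if (m > peak)
                peak = m;
        }
        env_db[b] = 20.0 * log10(peak);
    }
}

int main(void)
{
    double spec[BINS], env[BANDS];
    for (int k = 0; k < BINS; k++)    /* toy spectrum: 1/f-ish decay */
        spec[k] = 1.0 / (k + 1);
    coarse_floor(spec, env);
    printf("band 0: %.1f dB, band %d: %.1f dB\n",
           env[0], BANDS - 1, env[BANDS - 1]);
    return 0;
}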

>
These curve have formant like structure due to roll of property 
of the masking tone. 
>

I have no idea what this is supposed to mean...

>
The LPC coefficients are computed using the Levinson-Durbin 
algorithm
>

The actual algorithm is pretty irrelevant in the grand scheme
of things, especially since (IIRC) it's changed at least once
and right now some kind of hybrid structure is used.

>
The output of the MDCT block and the LSP block are quantified 
and then encoded using the codebook mechanism. 
>

Erm, no. The 'floor curve' generated by the LPC/LSP coefficients
(the coarse approximation of the spectral envelope) is subtracted
from the MDCT data (the actual spectral envelope, after the psymodel
has been applied) leaving behind the 'residue'.

Both the floor curve coefficients and the residue are then fed
to the VQ codebooks. They are not 'quantified and then encoded'.
This is a single step inherent in the vector quantization.
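
In code, the residue step is basically just this (a sketch; whether
it's really a plain subtraction, a subtraction in the dB domain, or
a division in the linear domain is a detail I won't swear to):

#include <stdio.h>

#define BINS 512

/* Remove the coarse floor curve from the MDCT spectrum; what is left
   over is the residue that goes to the VQ codebooks together with the
   floor coefficients. */
static void compute_residue(const double mdct_spec[BINS],
                            const double floor_curve[BINS],
                            double residue[BINS])
{
    for (int k = 0; k < BINS; k++)
        residue[k] = mdct_spec[k] - floor_curve[k];
}

int main(void)
{
    double spec[BINS], fl[BINS], res[BINS];
    for (int k = 0; k < BINS; k++) {
        spec[k] = 1.0 / (k + 1);   /* toy spectrum            */
        fl[k]   = 0.5 / (k + 1);   /* toy floor approximation */
    }
    compute_residue(spec, fl, res);
    printf("residue[0] = %f\n", res[0]);
    return 0;
}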

>
The decoder receives the frame extracts the LPC coeff. 
>

You said earlier that they are converted to LSP form
prior to encoding because that representation tolerates
quantization better. So it's of course not LPC coeffs that
are decoded, but LSP coeffs.


-- 
GCP
