[vorbis-dev] A new introduction attempt.

Lourens Veen lourens at rainbowdesert.net
Wed Sep 10 07:39:00 PDT 2003



On Wed 10 September 2003 13:08, Richard Felton wrote:
> I have been using libvorbis for the past few weeks and have been
> asked to summarise what I have discovered about the codec. There
> is an early draft of the document at
> http://www.geocities.com/gatewaystation/vorbis/vorbis.htm  -

Firstly, it may be a good idea to make it clear that what you are 
documenting is the Xiph.org Vorbis reference codec. I could in 
theory write an encoder that outputs valid Vorbis data yet works in 
a wholly different way.

Secondly, in the diagram the MDCT and FFT appear before the 
psychoacoustic stage, while in the text they are part of it. I 
think the diagram is right: transforming data into the frequency 
domain doesn't have much to do with human hearing; instead, it is 
done because it yields data that can be compressed more effectively 
by vector quantisation. So the psychoacoustics header should move 
two paragraphs down, and the text should be adapted accordingly.
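To show that the transform step is purely mechanical, here is a minimal sketch of a windowed MDCT with overlap-add resynthesis. It assumes a fixed block size (Vorbis switches between a short and a long block), uses a direct O(N^2) formula instead of a fast FFT-based one, and uses a plain sine window rather than the window Vorbis actually uses. The function names are hypothetical and are not libvorbis calls.

```python
import math


def mdct(frame):
    """Direct O(N^2) MDCT: 2N (already windowed) samples in, N coefficients out."""
    n = len(frame) // 2
    return [
        sum(
            frame[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n)
        )
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out.

    Scaled by 2/N so that windowed overlap-add with a window satisfying
    w[i]**2 + w[i + N]**2 == 1 cancels the time-domain aliasing exactly.
    """
    n = len(coeffs)
    return [
        2.0 / n * sum(
            coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)
        )
        for i in range(2 * n)
    ]


def sine_window(length):
    """Simple sine window (an assumption; Vorbis uses a different power-complementary window)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def analyse_and_resynthesise(signal, n):
    """Window, MDCT, IMDCT, window again and overlap-add frames of 2n samples with hop n."""
    window = sine_window(2 * n)
    output = [0.0] * len(signal)
    spectra = []
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = [signal[start + i] * window[i] for i in range(2 * n)]
        coeffs = mdct(frame)  # frequency-domain data that would be handed on to quantisation
        spectra.append(coeffs)
        for i, value in enumerate(imdct(coeffs)):
            output[start + i] += value * window[i]
    return spectra, output
```

Nothing in this round trip involves a model of hearing; the transform is lossless on its own. Only samples covered by two overlapping frames (for a 64-sample input and n = 8, indices 8 to 55) are reconstructed exactly, because the first and last half-frames have no neighbour to cancel their aliasing.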

In the vector quantising explanation, I would change the middle 
three paragraphs to something like the following:


---
Each point falls into a section, and we could transmit the relevant 
section number for each point. Since we are sending only a one-digit 
number rather than the entire vector, we achieve compression. 
The decoder will have a codebook, which holds a vector for each 
section, and use it to look up a vector for each section number it 
receives. Of course, since all original vectors within a section are 
eventually decoded to the same vector from the codebook, some 
information is lost.
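A minimal two-dimensional sketch of the encode/look-up cycle described above. The eight-entry codebook is made up for illustration (eight entries means each section number fits in 3 bits), and sections are assumed to be defined implicitly by "nearest codebook vector" under Euclidean distance.

```python
import math

# Hypothetical 2-D codebook: one reproduction vector per section.
# The list index is the section number that gets transmitted.
CODEBOOK = [
    (-1.5, -1.5), (-1.5, 1.5), (1.5, -1.5), (1.5, 1.5),
    (0.0, 0.0), (-3.0, 0.0), (3.0, 0.0), (0.0, 3.0),
]


def encode(vector, codebook=CODEBOOK):
    """Return the index of the nearest codebook vector (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))


def decode(index, codebook=CODEBOOK):
    """The decoder needs only the index and its own copy of the codebook."""
    return codebook[index]
```

For example, (1.2, 1.9) and (1.4, 1.1) both encode to index 3 and both decode to (1.5, 1.5), while (-2.8, 0.3) encodes to index 5 and decodes to (-3.0, 0.0). The difference between the first two inputs is the information that is lost.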

The design of a vector quantiser is a difficult task. Obviously we 
want to lose as little information as possible, so the decision 
boundaries and codebook vectors must be designed in such a way as 
to minimise the difference between the original vector and the 
decoded vector. This in turn depends upon the distribution of the 
input vectors, i.e. the input data of the encoder, and it is 
important that the codebook used works well with a wide variety of 
input data.
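One common way to measure "the difference between the original vector and the decoded vector" is mean squared error. The sketch below assumes that measure; the four-entry codebook, chosen by hand for inputs clustered near the origin, and the Gaussian test data are invented for illustration. The same codebook gives a much larger error on data with a different distribution.

```python
import math
import random


def nearest(vector, codebook):
    """Codebook vector the decoder would output for this input."""
    return min(codebook, key=lambda c: math.dist(vector, c))


def mean_squared_error(data, codebook):
    """Average squared distance between each input vector and its decoded vector."""
    return sum(math.dist(v, nearest(v, codebook)) ** 2 for v in data) / len(data)


# Hypothetical codebook chosen by hand for inputs clustered near the origin.
codebook = [(x, y) for x in (-0.5, 0.5) for y in (-0.5, 0.5)]

rng = random.Random(0)
near_origin = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(1000)]
spread_out = [(rng.gauss(0, 3.0), rng.gauss(0, 3.0)) for _ in range(1000)]

mse_matched = mean_squared_error(near_origin, codebook)
mse_mismatched = mean_squared_error(spread_out, codebook)
print(mse_matched, mse_mismatched)
```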

Vorbis extends the theory into more dimensions, but this is difficult 
to convey graphically. An algorithm for codebook design (similar to 
the one used in Vorbis) can be found on the web at 
data-compression.com [5].
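One well-known codebook-design method is the LBG (Linde-Buzo-Gray) algorithm, i.e. the generalized Lloyd algorithm grown by splitting. The sketch below is a generic version of it under stated assumptions (additive splitting, squared-error distortion, codebook size a power of two). It is not taken from [5] or from the Vorbis sources, and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dim))


def nearest_index(vector, codebook):
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))


def distortion(data, codebook):
    """Mean squared error between training vectors and their nearest codewords."""
    return sum(math.dist(v, codebook[nearest_index(v, codebook)]) ** 2 for v in data) / len(data)


def lbg(data, size, perturbation=0.01, tolerance=1e-6):
    """Design a codebook of `size` vectors (assumed a power of two) from training data."""
    codebook = [centroid(data)]
    while len(codebook) < size:
        # Split: replace each codeword by two copies nudged in opposite directions.
        codebook = [
            tuple(x + sign * perturbation for x in cw)
            for cw in codebook
            for sign in (1, -1)
        ]
        previous = math.inf
        while True:
            # Nearest-neighbour condition: partition the training set into cells.
            cells = [[] for _ in codebook]
            for v in data:
                cells[nearest_index(v, codebook)].append(v)
            # Centroid condition: move each codeword to the mean of its cell
            # (an empty cell keeps its old codeword).
            codebook = [centroid(cell) if cell else cw for cell, cw in zip(cells, codebook)]
            current = distortion(data, codebook)
            if previous - current <= tolerance * current:
                break
            previous = current
    return codebook
```

The two alternating steps each can only lower (or keep) the distortion on the training data, which is why the inner loop converges; like any local search, though, the result depends on the training data and the starting point, which is exactly why the choice of training material matters.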

The encoder achieves further compression by encoding the indices 
using Huffman codes before sending them to the decoder.
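A generic sketch of Huffman-coding a stream of section indices, so that frequent indices get shorter codewords. It only illustrates the principle; it does not reproduce how Vorbis stores or transmits its codebooks, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial_code}); the tie
    # breaker keeps heapq from ever comparing two dicts.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode_indices(indices, code):
    return "".join(code[i] for i in indices)


def decode_bits(bits, code):
    """Prefix property: emit a symbol as soon as the buffered bits match a codeword."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

For the index stream [3, 3, 3, 3, 5, 3, 0, 5, 3, 3], index 3 gets a 1-bit codeword and the whole stream takes 13 bits, compared with 30 bits at a fixed 3 bits per index for an eight-entry codebook.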
---

As for the German article, the (German) online summary only mentions 
that there were 6000 entries, of which 3300 concerned the 64 kbit/s 
compressed data. Ogg Vorbis is clearly the best at 64 kbit/s, 
while at 128 kbit/s the differences are smaller, with most people 
unable to distinguish between RealAudio, WMA, MP3Pro and MP3.

Lastly, perhaps it would be possible to generate a call graph of the 
encoder somehow? It would be nice to have a graphical 
representation of what uses what. Or perhaps a clearer link 
between the source files and the blocks in the block diagram, so 
that it's easy to see which part of the functionality is 
implemented where.

Cheers,

Lourens
-- 
GPG public key: http://home.student.utwente.nl/l.e.veen/lourens.key
