[Vorbis-dev] Ogg Vorbis questions

Sebastian Gesemann sgeseman at uni-paderborn.de
Thu Apr 20 07:12:29 PDT 2006


Hello, Hans Petter!


Hans Petter Selasky wrote:
> I'm currently working on a paper describing Ogg Vorbis. It is not finished 
> yet. Mostly the decoder is being described. If you have any comments, please 
> send them to me. See:
> 
> http://www.turbocat.net/~hselasky/math/vorbis/files/

Comments about this paper are included at the end of this email.

> Then I have a question about the function "bark_noise_hybridmp()" which is 
> used in the encoder. Can someone describe what the function does in detail? 
> From what I can see it performs various kinds of correlation, but I don't see 
> through it.

I can't answer that one. I'm not familiar with the libVorbis code.
"hybrid" may refer to how the encoder's psychoacoustic model works. You
can find statements on the net (from Monty and/or people who are
familiar with the source code) saying that the Vorbis encoder uses the
FFT for analysing tonal components and the output of the MDCT for noise
measurements.

> At last I want to point out that at low bit-rates, like 24kBit/sec, Ogg Vorbis 
> has trouble with "S" sounds. I thought it would be smarter to generate such 
> sounds using a white noise generator coupled with a FIR filter, than using 
> MDCT. Or maybe it is a problem in the encoder? What do you think?

You can do something similar in the MDCT domain; actually there's no big
difference except for efficiency. Inducing noise in the MDCT domain
easily allows shaping it frequency-wise, whereas frequency-shaping in
the time domain has to be done via a convolution. So the MDCT version
would be far more efficient. This is pretty much what "PNS" (Perceptual
Noise Substitution) is all about. This tool is part of the MPEG-4 Audio
specification. As simple as it is, it's patented.
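For illustration, the core PNS idea can be sketched in a few lines of
Python with numpy. This is my own toy sketch, not libVorbis or MPEG-4
reference code; `substitute_noise` is a hypothetical name. The point is
that only one scalar per band (the energy) needs to be transmitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def substitute_noise(mdct_coeffs, start, stop):
    """Replace one frequency band of MDCT coefficients with white noise
    of the same energy -- the core idea behind PNS (toy sketch)."""
    band = mdct_coeffs[start:stop]
    energy = np.sum(band ** 2)                     # only this scalar would be transmitted
    noise = rng.standard_normal(stop - start)
    noise *= np.sqrt(energy / np.sum(noise ** 2))  # scale noise to the band's energy
    out = mdct_coeffs.copy()
    out[start:stop] = noise
    return out

coeffs = rng.standard_normal(256)                  # stand-in for real MDCT coefficients
decoded = substitute_noise(coeffs, 100, 140)
```

Since the substitution happens per MDCT bin, the "FIR filter on white
noise" effect falls out for free: the band boundaries are the frequency
shaping.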

Those artefacts you've experienced are hard to avoid within the current
stream specification. Nothing you can do without breaking stream
compatibility.

> Also see:
> http://www.turbocat.net/~hselasky/math/image_sound/

Yeah, this is fun. :)


-----8<----------8<----------8<-----


Comments about your paper:

I'm missing "the big picture". What are you trying to tell us? After a
short introduction about bitrates of uncompressed audio streams you go
directly to the bit packing. I think you should at least mention some
basic psychoacoustic principles and how the Vorbis design exploits
them. Try to give the reader an overview of _what_ happens _why_. A
reader might ask, for example, why the MDCT is involved at all. You say
"This document will describe how the codec Ogg Vorbis achieves
[compressing audio at low rates]", but it reads more like a loose
collection of pieces that relate to audio coding, without noting how
they interact and why.

I skimmed through the paper and noticed the following things:

page 15:
The PDF is more commonly known as "probability *density* function".
Anyhow, it's _not_ a "_tool_ to find out what sample values are used most".

page 16:
"the least significant bit is leftmost" makes no sense at this point.
It's just a collection of codes made of ones and zeros. The left most
bits are to be transmitted/read first, but there's no significance
assigned. You're mixing it up with how Vorbis does the encoding into
octets. That's a different story.

page 17:
I think you are using the term "noise" as a replacement for what the
specification calls "residue" and/or "MDCT fine structure" in the
diagram. IMHO not a good idea. The rest of the diagram looks ok. Maybe
you could make just one box "iMDCT+overlap/add" instead of "iMDCT" and
"window function".

BTW: I think mentioning the MDCT formulas in a chapter dedicated to the
MDCT is a good idea, isn't it?

page 18:
Residue vectors are never multiplied with each other, and they are not
transmitted via headers. The headers only contain side information
about how residue vectors are coded.

page 19:
The set of floor-curve points is defined by their X-values (which come
from the headers) and their Y-values (which are coded in every audio
packet).

page 27:
What's a "sine-wave filter"? FIR and IIR filters are not restricted to
work on sine waves.

page 28:
Correlation and convolution are similar. According to your formula, 'a'
is the impulse response. Then:
Discrete correlation: g(t) = \sum_{n=-\infty}^{+\infty} f(t+n) a(n)
Discrete convolution: g(t) = \sum_{n=-\infty}^{+\infty} f(t-n) a(n)
Note the sign inversion in f(...). In case 'a' is symmetric it doesn't
matter.
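The sign difference is easy to check numerically. A small numpy
demonstration (the arrays are arbitrary examples, chosen asymmetric so
the difference shows):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
a = np.array([1.0, 0.5, 0.25])           # asymmetric "impulse response"

conv = np.convolve(f, a)                 # sums f(t-n) a(n)
corr = np.correlate(f, a, mode="full")   # sums f(t+n) a(n)

# correlating with 'a' is the same as convolving with the reversed 'a'
flipped = np.convolve(f, a[::-1])
```

With a symmetric 'a' (e.g. `[0.25, 0.5, 0.25]`) the two results
coincide, as noted above.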

page 29:
"FIR filters have a linear phase response"
WRONG!

"The only disadvantage about FIR filters, is that they need some CPU
power and many coefficients, ..."
If you do the convolution in the time domain that's true. But you can do
better (Like already mentioned using the MDCT or FFT).
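As a sketch of the "do better" part: linear convolution via zero-padded
FFTs costs O(N log N) instead of O(N*M) for the direct sum.
`fft_convolve` is an illustrative name of my own, not a library
function:

```python
import numpy as np

def fft_convolve(f, a):
    """Linear convolution via zero-padded FFTs (fast-convolution sketch)."""
    n = len(f) + len(a) - 1                # length of the full linear convolution
    n_fft = 1 << (n - 1).bit_length()      # round up to a power of two
    F = np.fft.rfft(f, n_fft)
    A = np.fft.rfft(a, n_fft)
    return np.fft.irfft(F * A, n_fft)[:n]  # multiply spectra, trim the padding

f = np.random.default_rng(1).standard_normal(1000)
a = np.random.default_rng(2).standard_normal(64)
```

Real codecs go further with block-wise overlap-add/overlap-save so the
whole signal never has to be in memory, but the idea is the same.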

page 35:
"Using the FFT [...] is possible. But in my opinion it will sound better
if [the sound] is passed through a so-called FIR filter, ..."
Then you must have done something wrong!

page 36:
Are you trying to compress images using Vorbis?????


-----8<----------8<----------8<-----


bye,
Sebastian

