[vorbis-dev] questions re residuevqtrain

Fri Dec 22 02:30:01 PST 2000

xiphmont at xiph.org (Monty) writes:
> Yes, it runs either a straight LBG training, or a modified LBG training that
> attempts to maintain constant probability of occurrence per training cell
> (default).

Okay, so in the latter case, is the point of maintaining constant probability
that the entries all get Huffman-coded (?) to the same length?  Is that what
you mean below when you talk about "a codebook where each entry has the same
codeword length"?  Why is this good?  Intuitively, if you're going to Huffman
code the entries anyway, it doesn't seem to matter whether the codevector
probabilities are equal or not.
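A minimal Python sketch of the point in question (the helper names are hypothetical, not from residuevqtrain) that computes Huffman codeword lengths. With 256 equally likely entries, every codeword comes out 8 bits long, exactly log2(256). Equal probabilities are the maximum-entropy case, so Huffman coding has nothing to exploit there. A skewed distribution gets variable-length codewords and a shorter expected length.

```python
import heapq
from itertools import count


def huffman_lengths(probs):
    """Return the Huffman codeword length (in bits) for each symbol."""
    if len(probs) == 1:
        return [1]  # treat a lone symbol as a 1-bit code
    tie = count()  # tie-breaker so heapq never has to compare the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside them
        # ends up one level deeper, i.e. one bit longer.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for i in syms1 + syms2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), syms1 + syms2))
    return lengths


def expected_bits(probs, lengths):
    """Average codeword length under the given probabilities."""
    return sum(p * n for p, n in zip(probs, lengths))


# 256 equally likely codebook entries vs. a skewed 4-entry example.
print(set(huffman_lengths([1 / 256] * 256)))      # {8}
print(huffman_lengths([0.5, 0.25, 0.125, 0.125]))  # [1, 2, 3, 3]
```

For the skewed example the expected length is 0.5*1 + 0.25*2 + 2*(0.125*3) = 1.75 bits, versus 2 bits for four equally likely entries.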

> However, minimum average global error is a *lousy* training metric for audio
> (because frequency peaks are 'rare', you'll end up training to model the
> noise component of the signal, and peaks will always be very poorly
> approximated).

So maybe we could modify the metric to give more of what you're looking for.
If I understand correctly, you're saying that most training vectors look (say)
like this

        (0.1, 0.2, 0.0, -0.1)

but occasionally you get something like this

        (-0.1, 0.1, 98.0, -0.1)

where the 98.0 is what you're calling a peak.  And you're saying that even
though these peaks are rare, distorting them is actually worse than lots of
minor distortion of the short (small-magnitude) vectors (which you're calling
noise, I think).

If this is roughly right, how about a metric that puts more emphasis on peaks?
For example, we could use squared or even cubed distance instead of plain
distance.  We could also try ignoring small (noise-level) distances.
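One way to try this out, sketched in Python under loud assumptions: `distortion`, `nearest`, and `peak_share` are hypothetical helpers, the per-component error is raised to a power `p` (a Minkowski-style metric), and an optional `floor` ignores any component error below it as noise. Raising the whole Euclidean distance to a power would not change which codeword is nearest, since that transform is monotone. It would only change how much each vector counts in the training objective. The sketch reports what fraction of the total error comes from peak vectors, using the two example vectors above and an assumed two-entry codebook.

```python
# Hypothetical helpers, not part of residuevqtrain: they only illustrate
# how the choice of error metric shifts weight between peaks and noise.

def distortion(x, c, p=2.0, floor=0.0):
    """Sum of |x_i - c_i| ** p over components.

    Components whose absolute error is below `floor` are skipped, on the
    (assumed) grounds that noise-level errors don't matter."""
    total = 0.0
    for xi, ci in zip(x, c):
        err = abs(xi - ci)
        if err >= floor:
            total += err ** p
    return total


def nearest(x, codebook, p=2.0, floor=0.0):
    """Index of the codeword with the smallest distortion for x."""
    return min(range(len(codebook)),
               key=lambda k: distortion(x, codebook[k], p, floor))


def peak_share(vectors, codebook, is_peak, p=2.0, floor=0.0):
    """Fraction of the total quantization error contributed by peak vectors."""
    peak_err = total_err = 0.0
    for v in vectors:
        d = distortion(v, codebook[nearest(v, codebook, p, floor)], p, floor)
        total_err += d
        if is_peak(v):
            peak_err += d
    return peak_err / total_err if total_err else 0.0


def is_peak(v):
    """Assumed peak test: any component larger than 10 in magnitude."""
    return max(abs(a) for a in v) > 10.0


noise = [(0.1, 0.2, 0.0, -0.1)] * 99   # the common, low-level vectors
peak = [(-0.1, 0.1, 98.0, -0.1)]       # one rare peak
codebook = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 90.0, 0.0)]

for p in (1, 2, 3):
    print(p, round(peak_share(noise + peak, codebook, is_peak, p), 3))
```

With this toy setup the peak's share of the total error is roughly 0.17 at p = 1, 0.92 at p = 2, and 0.998 at p = 3. With `floor=0.5` and p = 1, the noise errors drop out entirely and the share becomes 1.0.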

> The problem is that in frequency domain audio data, we fortunately only have
> to carefully replicate features that make up a small part of the data.
> Unfortunately, residue trained codebooks are being trained to represent
> global characteristics with minimum error.  Globally, the tonal peaks, what
> we need to be most careful with, make up very little of the data and thus
> are modelled poorly.

If this is the key problem, I think a different metric (and possibly some
algorithm tweaks, a la bias) could fix things.  
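For reference, here is a minimal Python sketch of straight LBG (the generalized Lloyd algorithm with codebook splitting), the loop such tweaks would modify. It is not the residuevqtrain code. `lbg`, `eps`, and `iters` are hypothetical names, and the sketch uses squared Euclidean error. The centroid (mean) update is optimal only for squared error. Swapping in a different metric would also call for a different update rule, for example the per-component median for absolute error.

```python
# A sketch of straight LBG (generalized Lloyd with splitting); names are
# hypothetical and this is not the residuevqtrain implementation.

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean; the optimal update for squared error only."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def lbg(training, size, eps=0.01, iters=20):
    """Train a codebook of `size` entries (assumed to be a power of two)."""
    codebook = [centroid(training)]
    while len(codebook) < size:
        # Split: replace each codeword by a slightly perturbed +/- pair.
        codebook = [tuple(c + s for c in cw)
                    for cw in codebook for s in (eps, -eps)]
        # A fixed number of Lloyd iterations, for simplicity.
        for _ in range(iters):
            # Assignment step: put each training vector in its nearest cell.
            cells = [[] for _ in codebook]
            for v in training:
                k = min(range(len(codebook)),
                        key=lambda j: sq_dist(v, codebook[j]))
                cells[k].append(v)
            # Update step: move each codeword to its cell's centroid.
            # Empty cells keep their old codeword (a wasted entry).
            codebook = [centroid(cell) if cell else cw
                        for cw, cell in zip(codebook, cells)]
    return codebook


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(lbg(points, 2)))
```

On these two tight clusters the two codewords settle at the cluster means, (1/3, 1/3) and (31/3, 31/3). In this sketch, cells that receive no training vectors simply keep their previous codeword. With too little training data, those entries are wasted.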

One problem for me at this point is that I don't really understand the
characteristics of these residue vectors, or which patterns in them would be
candidates for compression.  (Is the residue-probability space even very
non-random/compressible?  Maybe it would be better to just compress the
residues directly with Huffman, gzip, or whatever?)
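One cheap way to probe that question, sketched in Python under stated assumptions. First, uniformly quantize the residue values with some step size (the step and the helper names here are hypothetical). Then compare the zero-order entropy of the indices, which is a lower bound on the average length of any prefix code such as Huffman applied value by value, against what zlib (the DEFLATE coder that gzip uses) achieves. The entropy figure treats each value independently, while zlib can also exploit repeated runs. Neither captures everything a vector quantizer might exploit across components, so treat both numbers as a rough baseline.

```python
import math
import zlib
from collections import Counter


def quantize(residues, step):
    """Uniform scalar quantization to integer indices (step is assumed).

    Note: Python's round() rounds exact halves to the nearest even integer."""
    return [round(r / step) for r in residues]


def empirical_entropy_bits(symbols):
    """Zero-order entropy in bits/symbol: a lower bound on the average
    length of any prefix code (e.g. Huffman) applied symbol by symbol."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def zlib_bits_per_symbol(indices):
    """Bits/symbol after DEFLATE at maximum compression, one byte per index.

    Assumes every index fits in a signed byte (-128..127)."""
    raw = bytes(i & 0xFF for i in indices)
    return 8 * len(zlib.compress(raw, 9)) / len(indices)
```

Feed it residue values dumped from the encoder. If the zlib figure lands well below the entropy figure, the residues have sequential structure beyond their per-value statistics.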

Idiot newbie question: How bad does it sound if you drop (zero out) the
residues completely?  Has anyone ever listened to it?

> Yeah, things were already bad at this point (all the dupes).  In this case,
> the data file is probably way too small to train (not enough short blocks to
> produce a set).

What does "produce a set" mean?  A set of what?

(The residue file had ~3700 training points, I think, to train 256 entries.)
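As a rough sanity check on the data size, 3700 training points spread over 256 entries is only about 14.5 points per cell on average. The hypothetical helpers below compute that ratio and count exact duplicate codewords in a trained codebook. They are illustrative only and assume the codebook is a list of tuples.

```python
from collections import Counter


def points_per_cell(n_training, n_entries):
    """Average number of training vectors available per codebook entry."""
    return n_training / n_entries


def count_duplicate_entries(codebook):
    """Number of codewords that exactly repeat an earlier codeword."""
    counts = Counter(tuple(cw) for cw in codebook)
    return sum(c - 1 for c in counts.values())


print(points_per_cell(3700, 256))  # 14.453125
```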

Thanks for the help!
--Mike

-- 
[O]ne of the features of the Internet [...] is that small groups of people can
greatly disturb large organizations.  --Charles C. Mann

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/