[vorbis-dev] questions re residuevqtrain

Thu Dec 21 12:02:28 PST 2000

> I'm trying to understand the residuevqtrain program, and I have some questions
> for Monty, Erik, or anyone that understands how it's supposed to work.
> 
> I captured TRAIN_RES data from an encoding of a single track (about 4:43),
> producing two files, residue_0.vqd (3727 lines, = 3727 points?) and
> residue_1.vqd (huge). 

residue_0.vqd is residue data from short blocks, residue_1.vqd from
long blocks.

> I then did a run with the parameters from the usage
> message
> 
>    residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd
> 
> (with the version of residuevqtrain at the CVS head, last changed around 11/9
> or so).
> 
> 1.  I'm thinking that this program is basically supposed to solve the VQ
>     design problem, as documented on 'http://data-compression.com/vq.html'
>     (with some variations for vorbis).

Yes, it runs either a straight LBG training, or a modified LBG
training that attempts to maintain constant probability of occurrence
per training cell (default).  

>     In that problem, the goal is to choose the codevectors to minimize average
>     distortion (average distance between each training vector and its
>     associated codevector).  This measure, or something similar, is given in
>     the residuevqtrain output as 'metric error'.  In the run I did, though,
>     the value of metric error actually *increases* over the 1000 passes (see
>     output below).  Isn't this a bad thing, or am I missing something?

If you use -b, it's a straight LBG and global error will always go down.

However, minimum average global error is a *lousy* training metric for
audio (because frequency peaks are 'rare', you'll end up training to
model the noise component of the signal, and peaks will always be very
poorly approximated).
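
For readers following along, here is a minimal sketch of one straight LBG
pass, the mode that -b selects according to the reply above. It is not the
residuevqtrain code; the names lbg_pass, nearest and mean_error are
hypothetical, and plain squared Euclidean distance is an assumption. Each
pass assigns every training point to its nearest entry and moves each entry
to the centroid of its cell, which is why the mean error cannot increase in
this mode.

```python
def nearest(codebook, v):
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def lbg_pass(codebook, points):
    """One straight LBG pass: assign points to cells, move entries to centroids."""
    dim = len(codebook[0])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    for p in points:
        i = nearest(codebook, p)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += p[d]
    new_book = []
    for i, entry in enumerate(codebook):
        if counts[i]:
            new_book.append([s / counts[i] for s in sums[i]])
        else:
            new_book.append(list(entry))  # unused cell: leave the entry in place
    return new_book, counts


def mean_error(codebook, points):
    """Average squared distance from each point to its nearest entry."""
    total = 0.0
    for p in points:
        c = codebook[nearest(codebook, p)]
        total += sum((a - b) ** 2 for a, b in zip(c, p))
    return total / len(points)
```

Calling lbg_pass repeatedly and printing mean_error after each pass gives a
non-increasing sequence, unlike the 'metric error' reported in the default
(equal-probability) mode.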

> 
> 2.  The residuevqtrain algorithm actually seems to be trying to minimize a
>     slightly different measure, marked as 'dist' in the output.  If I
>     understand the idea correctly, the idea is to choose codevectors so that a
>     nearly equal number of training vectors will be associated with each one.

yes.

>     The 'dist' measure is a measure of how much this is so, smaller values
>     being better.  Why would this be better than just minimizing distortion?

It isn't, really.  What it gives you is a codebook where each entry
has the same codeword length.  The training stuff is not just for
production, it's meant for experimentation as well.
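
The link between equal cell occupancy and equal codeword length can be
checked with a few lines of arithmetic. This is only an illustrative sketch:
entropy_bits and ideal_lengths are hypothetical helpers, and nothing here is
claimed about how residuevqtrain computes 'dist'. When every one of N cells
is hit equally often, the ideal codeword length is log2(N) bits for every
entry, so a fixed-length code loses nothing.

```python
import math


def entropy_bits(counts):
    """Shannon entropy (bits per codeword) of the cell hit counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def ideal_lengths(counts):
    """Ideal codeword length in bits for each cell, -log2(probability)."""
    total = sum(counts)
    return [-math.log2(c / total) if c else float("inf") for c in counts]


uniform = [10] * 256           # every cell hit equally often
skewed = [1000] + [1] * 255    # one cell takes most of the hits

print(entropy_bits(uniform))   # 8 bits: same length for every entry
print(entropy_bits(skewed))    # well under 8 bits: lengths would differ
```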

>     Although I can imagine the two metrics being highly correlated, 

They are.

>     it looks
>     to me like the former would be better when they differ (think of what
>     happens if you have a group of training vectors clumped together).

Both, actually, are very suboptimal it turns out; perhaps there's a
better way to do it that I haven't tried (well, there almost certainly is).

The problem is that in frequency domain audio data, we fortunately
only have to carefully replicate features that make up a small part of
the data.  Unfortunately, residue trained codebooks are being trained
to represent global characteristics with minimum error.  Globally, the
tonal peaks, what we need to be most careful with, make up very little
of the data and thus are modelled poorly.

> 3.  The program does seem to reduce 'dist', but I notice that the lowest value
>     seen for it and for 'metric error' in this run was actually at pass 0.
>     Does this mean that we should just stop at pass 0, or that the metrics are
>     wrong, or is something else going on here?

Something else happened.  'dist' does not converge stably (it tends to
oscillate about the minimum), but it should not shoot off to infinity.

> 
> 4.  If I'm reading the 'quantized entries' from the output .vqi file
>     correctly, it looks as if there are a large number of duplicate entries
>     (maybe because of quantization?).  Isn't this bad?  

Yes, it's bad.

>     Or am I misreading?
>     (In my example, I'm reading the first six lines as the first codevector,
>     and so on.)

Yeah.  BTW, grouping by sixes is likely bad (the vector size isn't a
multiple of six); try fours.
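
A quick check of the arithmetic, assuming "vector size" here refers to the
128 columns per line that residuevqtrain reports for residue_0.vqd in the
output below (that reading is an assumption, and split_row is a
hypothetical helper, not part of the tool). 128 is divisible by 4 but
leaves a remainder of 2 when divided by 6, so groups of six cannot tile a
line evenly.

```python
def split_row(row, dim):
    """Split one line of training data into consecutive vectors of length dim."""
    if len(row) % dim:
        raise ValueError(
            f"{len(row)} columns do not divide evenly into groups of {dim} "
            f"({len(row) % dim} left over)")
    return [row[i:i + dim] for i in range(0, len(row), dim)]


row = [0.0] * 128               # stand-in for one 128-column line
print(len(split_row(row, 4)))   # 32 vectors of four
print(128 % 6)                  # 2 columns left over with groups of six
```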

> $ residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd 
> 128 colums per line in file residue_0.vqd
> reseeding with quantization....
> Pass #0... : dist 0.361175(305.73) metric error=1.73526 
> cells shifted this iteration: 4
> cell diameter: 4.66::10.3::36.8 (0 unused/79 dup)

Yeah, things were already bad at this point (all the dupes).  In this
case, the data file is probably way too small to train (not enough
short blocks to produce a set).
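
As an illustration of how duplicates can appear once entries are quantized,
here is a small sketch that counts entries which collapse onto the same
quantized value. The uniform grid with a fixed step is an assumption made
for the example and is not the quantizer residuevqtrain uses;
count_duplicates is a hypothetical helper.

```python
def count_duplicates(entries, step):
    """Count entries whose quantized form repeats an earlier entry.

    Each component is snapped to the nearest multiple of `step`
    (an assumed uniform quantizer, for illustration only).
    """
    seen = set()
    dups = 0
    for entry in entries:
        key = tuple(round(x / step) for x in entry)
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups


book = [[0.10, 0.20], [0.12, 0.19], [1.00, 1.00]]
print(count_duplicates(book, 0.25))   # first two entries land on the same grid point
print(count_duplicates(book, 0.01))   # a finer grid keeps them apart
```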

Monty
