[vorbis] Beta3 impressions

Segher Boessenkool segher at wanadoo.nl
Wed Nov 29 15:18:40 PST 2000



brian wrote:
> 
> On Mon, 20 Nov 2000, Segher Boessenkool wrote:
> 
> > > > In his interview to Slashdot, Monty said that wavelets will be used in
> > > > Vorbis (after version 1.0). I guess that wavelets will be able to eradicate
> > > > pre-echo more thoroughly ?
> > >
> > > Yes, wavelets are my attempt at solving the problem correctly.
> >
> > Do you already have an algorithm to decide which part to encode with
> > mdct, and which part with wavelets? I'm currently working on something
> > like this.
> 
> Perhaps it's a little early to search for algorithms until we really know
> what methods will be employed in the encoding process.

Not really. I mean an algorithm that splits the audio data into two parts
(not separate frames; more like input = mdct stuff + wavelet stuff), where
one half is more 'frequency-like' and the other half is more 'time-like'.
This lets us encode the 'frequency-like' half (steady sines) with large
mdcts, while the 'time-like' signal (i.e., short spikes in the time domain)
doesn't pollute the frequency-domain signal with broadband noise.
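
To show what I mean by an additive split, here's a toy sketch (plain C,
nothing to do with the actual Vorbis code; the moving-average 'tonal'
estimate is only a placeholder for whatever real analysis we'd end up with):

/* Toy additive split: 'tonal' is a crude moving-average low-pass,
 * 'transient' is whatever is left over.  The only property the scheme
 * above needs is that tonal[i] + transient[i] reconstructs input[i]
 * (up to float rounding). */
#include <stddef.h>

void split_pcm(const float *input, float *tonal, float *transient,
               size_t n, size_t half_window)
{
    for (size_t i = 0; i < n; i++) {
        size_t lo = (i > half_window) ? i - half_window : 0;
        size_t hi = (i + half_window < n) ? i + half_window : n - 1;
        float sum = 0.0f;
        for (size_t j = lo; j <= hi; j++)
            sum += input[j];
        tonal[i]     = sum / (float)(hi - lo + 1);  /* 'frequency-like' half */
        transient[i] = input[i] - tonal[i];         /* 'time-like' half */
    }
}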

If you _do_ encode it with separate frames, you'll have a continuity problem;
it's difficult to have two different encoding methods give the same "color" or
"room temperature" to the sound (anyone know a better word or description?).

So I think the mdct and wavelet parts should both be complete streams (either
one can be zero, of course), simply added together to give the final
reproduced pcm stream. We'll then need an algorithm to split a pcm stream
into these two halves as efficiently as possible (where "efficient" means
getting the biggest coding gain / audio quality).
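
On the decode side that really is just a sum of the two reproduced streams
(hypothetical buffer names, of course):

/* Decode-side sketch: each coder produces a complete pcm stream
 * (possibly all zeros), and the output is simply their sum. */
#include <stddef.h>

void mix_streams(const float *mdct_pcm, const float *wavelet_pcm,
                 float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = mdct_pcm[i] + wavelet_pcm[i];
}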

> I am curious as to what wavelet methods are going to be used for the
> pre-echo control.  I've recently obtained an advisor to do a thesis in

"pre-echo control" is really a horrible word; it's not the pre-echo it's
controlling, it's hiding an artifact _in_ the pre-echo phenomenon.

> wavelet audio coding.  I was thinking of two approaches, primarily because
> of mixed results in other people's work in the subject:
> 
>    (1) Perform the traditional wavelet analysis methods, look for good
>        bases in different situations, etc.
> 
>    (2) Instead of transforming time-signal data, look at the frequency
>        information spit out of the fft or preferable method.  Perform some
>        sort of wavelet packet analysis and reconstruct an approximation of
>        the frequency signal using the psychoacoustic model to direct
>        precision.  Granted, this would simply be quantization via
>        wavelet packets.  However, it would be interesting to see what
>        happens when performed.  (It would obviously not solve the
>        problem of pre-echo control, but may be an interesting thing
>        to investigate altogether).

The mdct spectrum is usually much less smooth than the time signal
itself. Will wavelets help? It's worth a try...
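
If someone wants to poke at (2) quickly, even a single Haar step over the
spectrum would show how well the coefficients compact (again only a toy
sketch, nothing Vorbis-specific):

/* One level of a Haar transform over an mdct-style spectrum:
 * pairwise averages (approx) and differences (detail), scaled so
 * energy is preserved.  Purely illustrative; n must be even. */
#include <stddef.h>

void haar_step(const float *spectrum, float *approx, float *detail, size_t n)
{
    const float s = 0.70710678f;           /* 1/sqrt(2) */
    for (size_t i = 0; i < n / 2; i++) {
        approx[i] = s * (spectrum[2*i] + spectrum[2*i + 1]);
        detail[i] = s * (spectrum[2*i] - spectrum[2*i + 1]);
    }
}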

> I was curious to the methods that Monty had in mind (as far as choosing
> bases, etc).  Comments?

I don't think he has any particular method in mind yet; all of the more
common methods are patented, and some experimentation will be needed to find
a method that's well suited to Vorbis. But he can answer your question best,
of course :-)

Dagdag,

Segher