[vorbis-dev] Impulses

Monty xiphmont at xiph.org
Fri Nov 19 07:22:01 PST 1999



> After playing with the vorbis code for a while and doing tons of hacks and
> analysis on it, I've found it to perform very poorly with impulse signals.

At the immediate moment, a commit error on my part is making it worse :-)  I
currently have the psychoacoustics set to *only* use a 2048 sample window... I
was doing distortion testing on that specific window size and forgot to put it
back to normal (where it will use a 256 or 512 sample window for impulses).

(I just turned it back on in CVS).
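
To make the window-size point concrete, here is a rough sketch of picking a
short window when a block looks impulsive. This is not the actual Vorbis
psychoacoustic code: the function name choose_window, the energy-ratio
heuristic and the threshold of 4.0 are all assumptions for illustration. Only
the 256 and 2048 sample sizes come from the text above.

```python
def choose_window(block, short=256, long=2048, ratio=4.0):
    """Return `short` if the block looks impulsive, otherwise `long`.

    Hypothetical heuristic: cut the block into `short`-sized segments and
    flag an impulse when one segment carries far more energy than average.
    """
    energies = [
        sum(x * x for x in block[i:i + short])
        for i in range(0, len(block), short)
    ]
    mean = sum(energies) / len(energies)
    peak = max(energies)
    # Silence (mean == 0) and steady signals keep the long window.
    if mean > 0.0 and peak > ratio * mean:
        return short
    return long
```

A steady tone spreads its energy evenly over the segments and keeps the long
window; a lone click concentrates it in one segment and gets the short one.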

*HOWEVER*, that just puts us back in the mp3 range of dealing with impulses.
I've been interested for a long time in doing something similar to what you
propose, but I'd backed out what I was doing in order to push ahead with more
fundamental parts of the first cut release (too many details for just me to
handle at the moment :-). There are actually bitflags in the stream right now
waiting for exactly this sort of thing to drop in.

(This also means that 'research' on the subject can proceed at a prudent pace 
without holding anything up).

> I've been looking at various ways of taking care of this, and before I
> bother implementing something I'd like to make sure that no one has gone
> down this path before:

I've gone part way down the path, so I have some additional clues to offer.
This basic tack has my approval.

> Roughly vorbis currently does:
> 
> input wave -> MDCT -> LPC -> LSP -> quant -> ------------------>output
>                                          \->delpc->error->quant -^
> 
> What do you think of this:
> 
> input wav -> DWT -> sum non-impulse factors -> iDWT -> MDCT ... (like above)
>               \
>                -> -> sum impulse factors -> iDWT -> LPC -> LSP -> quant 
> 
> i.e. use a wavelet transform to separate out impulsey signals and
> compress them in the time domain.

Yes, this is exactly the way I wanted to proceed (only I wasn't using wavelets;
wavelets are indeed worth pursuing).  The encoder/decoder were structured
exactly for the above flow (the convenience of the layout isn't accidental).
However, we need to find a better way to encode the impulses.  (More on this
later; I wanted to respond quickly to say that you're on the path that I
started, but hadn't continued).
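
As a minimal sketch of the split proposed above (not code from the Vorbis
tree), the following uses a single level of the Haar DWT and treats any
detail coefficient above a threshold as "impulse". The function names, the
one-level depth and the threshold rule are assumptions; a real scheme would
need more levels and a better classification. Because the transform is
linear, the two reconstructed tracks sum back to the input.

```python
import math

_S = math.sqrt(0.5)  # orthonormal Haar scaling factor


def haar_forward(x):
    """One Haar level on an even-length list: (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) * _S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * _S for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def split_impulses(x, threshold):
    """Split x into (smooth, impulse) tracks with smooth + impulse == x.

    Hypothetical rule: detail coefficients larger than `threshold` in
    magnitude go to the impulse track, everything else stays smooth.
    """
    approx, detail = haar_forward(x)
    big = [d if abs(d) > threshold else 0.0 for d in detail]
    small = [d - b for d, b in zip(detail, big)]
    impulse = haar_inverse([0.0] * len(approx), big)
    smooth = haar_inverse(approx, small)
    return smooth, impulse
```

The smooth track would then go down the existing MDCT path and the impulse
track down the time-domain path.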

> The decoder complexity really isn't increased much (just one more
> dequant/LPC and a sum). I think there are optimized versions of the haar
> DWT that go really fast too..

Yes; in addition, wavelets are (IIRC) linear time (O(n)) transforms as well.
The time taken by the Haar transform itself would practically be lost in the
noise :-)
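
To illustrate the linear-time claim, here is a small sketch of a full
multi-level Haar transform that also counts its pairwise sum/difference
steps. The function name haar_dwt and the operation counter are assumptions
for illustration only. Each level halves the working length, so the count is
n/2 + n/4 + ... + 1 = n - 1 for a power-of-two n.

```python
import math


def haar_dwt(x):
    """Full orthonormal Haar DWT of a power-of-two-length list.

    Returns (coefficients, pair_ops), where pair_ops counts the
    sum/difference pairs computed over all levels.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    s = math.sqrt(0.5)
    out = list(x)
    ops = 0
    length = n
    while length > 1:
        half = length // 2
        approx = [(out[2 * i] + out[2 * i + 1]) * s for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) * s for i in range(half)]
        out[:half] = approx        # coarse part, transformed again next pass
        out[half:length] = detail  # detail part, final for this level
        ops += half
        length = half
    return out, ops
```

For a 2048 sample block that is 2047 pair operations in total.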

I'll have much more to talk about after some sleep.

Monty

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/


