[vorbis-dev] Transient coding: AAC vs. Vorbis

Fri Jun 11 10:51:04 PDT 2004

On Fri, 11 Jun 2004, Segher Boessenkool wrote:

> Improved audio quality by experimenting?  Hey, if everyone just
> sits back and waits for Vorbis II, it'll never be there ;-)

Come on... It's not my intention to just wait. I have
ideas and I'm willing to share'em (this is what I'm doing
right now) ;)

As for the transform issue: why do you think the MDCT is not
the best transform ever invented? Do you want frequency-varying
time/frequency resolution? Do you think it's worth it?

I've experimented with a hybrid filterbank that allows
frequency-varying time resolution and would fit into the Vorbis world.
But it also introduces a higher encode/decode delay and higher
complexity. With these catches it's not suited to the goals of
Vorbis II, I guess. :(

If you have better ideas, share 'em. I currently believe that
any effort towards replacing or adding a transform is mostly
wasted time.
First, one has to identify the shortcomings of the MDCT.
I once thought frequency-varying time/frequency resolution
was a cool filterbank feature. But... I actually don't know
if it's any good.

> Except that we're not fixed-size-per-block -- 94 bits per block
> on average is exactly 1 bit per block bigger than 93 bits per
> block average, even when padding every packet to a multiple of
> 8 bits.

Yeah, yeah. I just wanted to remind you that this one bit is rather
small compared to the bits that are 'wasted' anyway at the end
of packets. 93 bits per packet is indeed not much, and 94 would
be about 1% more in this case. But IMHO the proposal saves much more
space than it wastes with this one bit. We're on the same side.
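
To put numbers on this, here is a minimal sketch of the arithmetic. The
93-bit average is the example figure from this thread; the helper names
padded_bits and relative_overhead are made up for illustration and are
not part of libvorbis.

```python
def padded_bits(bits):
    """Round a packet length in bits up to a whole number of bytes."""
    return -(-bits // 8) * 8  # ceiling division by 8, then back to bits


def relative_overhead(avg_bits, extra_bits=1):
    """Fraction by which the average packet grows with extra_bits more."""
    return extra_bits / avg_bits


avg = 93  # example average packet size from the thread, in bits

# One extra bit on a 93-bit average packet.
print(f"overhead of 1 bit: {relative_overhead(avg):.2%}")  # 1.08%

# Bits lost to byte padding at the end of a single 93-bit packet.
print(f"padding waste: {padded_bits(avg) - avg} bits")     # 3 bits
```

For a single 93-bit packet, the padding alone costs more than the one
extra flag bit would.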

> > second, the packets are still independently decodable.
>
> Not if they share the floor curve etc.!

They don't!
Packets don't share floor curves in this proposal.
Multiple short windows are stored IN ONE packet, and those
windows share a floor curve and/or residue codebook class codes.
There won't be any dependence across packets,
no more than before.

It's just a generalisation:
one packet - one floor - one set of residue codebook class codes,
but with either 128 samples per channel, 256 samples per channel,
... 896 s/c or 1024 s/c (the latter in two variants: one long
transform or 8 short transforms).
One packet still represents a locally static part of the signal.

(This assumes blocksize0 is 256 and blocksize1 is 2048.)
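
The list of packet sizes above can be enumerated with a short sketch.
It assumes, as the numbers above imply, that each short transform
contributes blocksize0/2 new samples per channel and a long transform
contributes blocksize1/2. The function packet_layouts is hypothetical
and only illustrates the proposal; it is not an existing Vorbis API.

```python
def packet_layouts(blocksize0=256, blocksize1=2048):
    """List (samples_per_channel, transform_kind, transform_count)
    for every packet variant in the proposed scheme."""
    short = blocksize0 // 2       # new samples per short transform
    long_ = blocksize1 // 2       # new samples per long transform
    max_short = long_ // short    # short transforms that fill one long block

    # 1..max_short short transforms sharing one floor in a single packet.
    layouts = [(n * short, "short", n) for n in range(1, max_short + 1)]
    # The classic case: a single long transform.
    layouts.append((long_, "long", 1))
    return layouts


for samples, kind, count in packet_layouts():
    print(f"{samples:5d} s/c: {count} x {kind} transform")
```

With the default block sizes this yields nine variants: 128 to 1024 s/c
in steps of 128 using short transforms, plus 1024 s/c as one long
transform.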

This way, many bits can be saved right before a transient attack.
And one bit will be 'lost', true.

Ghis!
Sebastian

--
PGP-Key-ID (long): 572B1778A4CA0707