[vorbis-dev] Transient coding: AAC vs. Vorbis

Sebastian Gesemann sgeseman
Tue Aug 3 06:24:10 PDT 2004


On Mon, 19 Jul 2004, Monty wrote:

> On Fri, Jun 11, 2004 at 07:51:04PM +0200, Sebastian Gesemann wrote:
>
> > As for the transform issue: Why do you think the MDCT is not
> > the best transform ever invented? Do you want frequency-varying
> > time/frequency resolutions? Do you think it's worth it?
>
> yes and yes.  rather, I've always wanted a hybrid transform pair, one
> that focuses on frequency/pitch and another that focuses on time.  The
> reason being that the ear hears and processes these separately, and
> the MDCT is only well-suited to the former.

So, what options do we have ?

(1) the traditional hybrid filterbank approach:
    a) roughly split the signal into subbands (DWT or (P)QMF)
    b) split each subband further, with band-dependent resolution (MDCT)
(2) the "other" hybrid filterbank approach I've tinkered with:
    a) do the usual MDCT (as in Vorbis now)
    b) increase the temporal resolution by applying a second MDCT
       to some regions of the first transform's spectrum, which
       partially reverses the first stage for those regions
(3) MDCT+TNS:
    a) just a single MDCT, as already done in Vorbis I
    b) linear predictive coding of the MDCT samples, where the
       LPC synthesis filter models the temporal shape
PROs:

(1) - frequency-varying time resolutions are possible
(2) - frequency-varying time resolutions are possible
    - completely MDCT-based, and perfect reconstruction is possible
(3) - temporal shape can be accurately modeled by the LPC filter
    - easy to implement

CONs:

(1) - you have to design QMF / DWT filters
    - complicated to implement
    - need for (spectral) alias reduction transforms after
      the 2nd stage
(2) - a bit more time-consuming to calculate
    - need for (temporal) alias reduction
    - alias reduction implies a higher encode/decode delay


I'd go for (3).
It's nearly as powerful as (1) and (2) and IMHO much more
efficient/simpler to implement.
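
To make option (3) concrete, here is a minimal sketch of the TNS idea:
an LPC predictor is fitted along the frequency axis of one block of
MDCT coefficients, the encoder keeps the prediction residual, and the
decoder runs the all-pole synthesis filter to get the coefficients
back. The function names, the filter order, and the open-loop (no
quantizer in the loop) structure are assumptions made for illustration.
This is neither the AAC bitstream tool nor anything from the Vorbis
code.

```python
import math

def lpc_coefficients(x, order):
    """Autocorrelation method + Levinson-Durbin.
    Returns a[0..order-1] such that x[k] ~ sum(a[i] * x[k-1-i])."""
    r = [sum(x[k] * x[k - lag] for k in range(lag, len(x)))
         for lag in range(order + 1)]
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:               # silent or fully predicted block
            break
        refl = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        nxt = a[:]
        nxt[i] = refl
        for j in range(1, i):
            nxt[j] = a[j] - refl * a[i - j]
        a = nxt
        err *= 1.0 - refl * refl
    return a[1:]

def tns_analysis(spec, a):
    """Encoder side: prediction residual along the frequency axis."""
    return [spec[k] - sum(a[i] * spec[k - 1 - i]
                          for i in range(len(a)) if k - 1 - i >= 0)
            for k in range(len(spec))]

def tns_synthesis(res, a):
    """Decoder side: all-pole filter that restores the coefficients."""
    out = []
    for k, e in enumerate(res):
        out.append(e + sum(a[i] * out[k - 1 - i]
                           for i in range(len(a)) if k - 1 - i >= 0))
    return out

# Toy "transient" spectrum: a cosine across frequency, which is roughly
# what a single click in the time domain looks like after an MDCT.
spec = [math.cos(0.3 * (k + 0.5)) for k in range(64)]
a = lpc_coefficients(spec, 2)
res = tns_analysis(spec, a)
print(sum(e * e for e in res) / sum(v * v for v in spec))
```

In a real coder the residual would be quantized, and the quantization
noise would then follow the temporal envelope described by the filter.
That step is left out here.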


And IMHO the reason why heavily quantized HF noise sounds so
metallic at high spectral resolution is the use of simple
scalar quantization without dithering.
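
A small sketch of the effect, under assumptions of my own: a plain
rounding quantizer is compared with a subtractively dithered one, where
encoder and decoder regenerate the same pseudo-random offsets from a
shared seed. The function names and the shared-seed scheme are
hypothetical and only serve the illustration. Nothing like this exists
in the Vorbis I format.

```python
import random

def quantize_plain(x, step):
    """Plain uniform scalar quantizer: round to the nearest level."""
    return [round(v / step) for v in x]

def dequantize_plain(q, step):
    return [i * step for i in q]

def quantize_dithered(x, step, seed):
    """Add a uniform offset in [-0.5, 0.5] steps before rounding."""
    rng = random.Random(seed)
    return [round(v / step + rng.uniform(-0.5, 0.5)) for v in x]

def dequantize_dithered(q, step, seed):
    """Regenerate the same offsets and subtract them again."""
    rng = random.Random(seed)
    return [(i - rng.uniform(-0.5, 0.5)) * step for i in q]

# Coefficients well below half a step: the plain quantizer zeroes them
# all, while the dithered one keeps some energy in the band.
step = 1.0
x = [0.3] * 16
print(quantize_plain(x, step))
print(quantize_dithered(x, step, seed=1))
```

With the plain quantizer, small HF coefficients either vanish or
survive as a few isolated spectral lines. With dithering the error is
still bounded by half a step per coefficient but no longer depends
deterministically on the input.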

I vote for a "Trellis Coded Quantizer" for Vorbis II. It
implicitly does some sort of dithering, does a great
job in terms of rate/distortion at low rates and is VERY
EASY to decode.

http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=806615
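
To show why decoding is so cheap, here is a toy TCQ sketch. The
reconstruction levels are the integer multiples of a step size, split
round-robin into four subsets D0..D3 (level index mod 4). A 4-state
trellis allows only two subsets per state, the encoder runs a Viterbi
search for the cheapest path, and the decoder just walks the trellis.
The trellis table, the unbounded codebook, and the (bit, index) symbol
format are assumptions chosen for illustration, not the scheme from the
paper above.

```python
# TRELLIS[state][bit] = (subset, next_state); illustrative 4-state table.
TRELLIS = {
    0: ((0, 0), (2, 1)),
    1: ((1, 2), (3, 3)),
    2: ((2, 0), (0, 1)),
    3: ((3, 2), (1, 3)),
}

def tcq_encode(x, step):
    """Viterbi search; returns one (branch_bit, index) pair per sample."""
    inf = float("inf")
    cost = [0.0, inf, inf, inf]          # always start in state 0
    paths = [[], None, None, None]       # survivor path per state
    for v in x:
        u = v / step
        new_cost = [inf] * 4
        new_paths = [None] * 4
        for s in range(4):
            if paths[s] is None:         # state not reachable yet
                continue
            for bit, (subset, nxt) in enumerate(TRELLIS[s]):
                # nearest level of the form subset + 4*m
                m = round((u - subset) / 4.0)
                c = cost[s] + (u - (subset + 4 * m)) ** 2
                if c < new_cost[nxt]:
                    new_cost[nxt] = c
                    new_paths[nxt] = paths[s] + [(bit, m)]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=lambda s: cost[s])
    return paths[best]

def tcq_decode(symbols, step):
    """Decoder: one table lookup and one multiply per sample."""
    state = 0
    out = []
    for bit, m in symbols:
        subset, state = TRELLIS[state][bit]
        out.append((subset + 4 * m) * step)
    return out

x = [0.13 * k - 2.0 for k in range(12)]
symbols = tcq_encode(x, 0.5)
print(tcq_decode(symbols, 0.5))
```

All the work (the Viterbi search) sits in the encoder. Keeping a full
survivor path per state is wasteful but keeps the sketch short.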


[packets sharing floor curves and/or codebook class codes]

> They probably will in the future (meaning V-II) due to the large
> coding overhead at very low bitrates.

Do you really mean sharing across packets? (which would create
some kind of intra- and predictive-coded packets)
Or just coupling several small (i.e. blocksize0) chunks
into one packet that share the curve? (still independently
decodable packets)

I originally didn't want to go that far and create I-/P-packets
as is already done in video coding. But maybe it's worth a
try. I guess the floor curve overhead could be reduced by using
temporal and inter-channel prediction.
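
A rough sketch of what temporal prediction of the floor curve could
look like: every intra_period-th packet carries the quantized floor
values directly ("I"), the others carry only the difference to the
previous packet's floor ("P"). The packet tuples, the names, and the
fixed intra period are all hypothetical. This is not a proposal for an
actual bitstream layout, and inter-channel prediction is left out.

```python
def encode_floors(floors, intra_period):
    """floors: one list of quantized floor values (ints) per packet."""
    packets = []
    prev = None
    for n, cur in enumerate(floors):
        if prev is None or n % intra_period == 0:
            packets.append(("I", list(cur)))            # self-contained
        else:
            residual = [c - p for c, p in zip(cur, prev)]
            packets.append(("P", residual))             # needs previous
        prev = cur
    return packets

def decode_floors(packets):
    """Rebuild the floors; a P-packet is useless without its predecessor."""
    out = []
    prev = None
    for kind, data in packets:
        if kind == "I":
            cur = list(data)
        else:
            cur = [p + r for p, r in zip(prev, data)]
        out.append(cur)
        prev = cur
    return out

floors = [[10, 12, 9], [11, 12, 8], [11, 13, 8], [20, 5, 7]]
packets = encode_floors(floors, intra_period=3)
print(packets)
```

When the floor changes slowly, the residuals are small and cheap to
entropy-code. The price is that P-packets are no longer independently
decodable, which is exactly the trade-off in question.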


> Monty

Sebastian

--
PGP-Key-ID (long): 572B1778A4CA0707


