[vorbis-dev] Transient coding: AAC vs. Vorbis
Sebastian Gesemann
sgeseman at upb.de
Wed Jun 2 10:45:53 PDT 2004
Thread-split from the vorbis-mailing list
("Vorbis determined to be as good as MPC at 128 kbps!")
On Sun, 30 May 2004, Segher Boessenkool wrote:
[Steven So]
SS>> If iTunes AAC can encode castanets with much less pre-echo at
SS>> ABR 128 kbps, then hopefully there will be an imaginative
SS>> (and non-patented) way of doing this in Vorbis without the
SS>> bitrate inflation of GTune and QKTune.
[Segher Boessenkool]
SB> Use some different transform? MDCT isn't the best audio transform
SB> ever invented, esp. not for non-steady waveforms.
Steven is talking about Vorbis, Segher.
Vorbis makes use of the MDCT.
Let's see... Vorbis I versus AAC in transient coding...
(simplified ASCII art follows)
audio wave ('-'=low volume, <!>=transient )
--------------------------<!>--------------------
AAC
+---------------+---------------+---------------+
|       1       |       2       |       3       | frame no.
+---------------+-+-+-+-+-+-+-+-+---------------+
|       L       |S|S|S|S|S|S|S|S|       L       | transform
+---------------+-+-+-+-+-+-+-+-+---------------+
|       A       |    B    |C| D |       E       | scalefactor sets
+---------------+---------+-+---+---------------+
Vorbis I
+---------------+-+-+-+-+-+-+---------------+----
|       1       |2|3|4|5|6|7|       8       | packet no.
+---------------+-+-+-+-+-+-+---------------+----
|       L       |S|S|S|S|S|S|       L       | transform
+---------------+-+-+-+-+-+-+---------------+----
|       F       |G|H|I|J|K|L|       M       | floor curves
+---------------+-+-+-+-+-+-+---------------+----
Vorbis II (proposal, see below)
+---------------+---------+-+---------------+----
|       1       |    2    |3|       4       | packet no.
+---------------+-+-+-+-+-+-+---------------+----
|       L       |S|S|S|S|S|S|       L       | transform
+---------------+-+-+-+-+-+-+---------------+----
|       N       |    O    |P|       Q       | floor curves
+---------------+---------+-+---------------+----
L = long transform
S = short transform
A-E = sets of scalefactors (AAC)
F-M = floor curves (Vorbis I)
N-Q = floor curves (Vorbis II)
Obviously Vorbis I is wasting space in this example by
coding 5 floor curves (G-K) that are very similar.
AAC *shares* the scalefactor set B among the corresponding
5 windows, thus saving space.
Vorbis II could allow multiple 'short' MDCT spectra
(at most blocksize1/blocksize0 of them) to be stored
in one packet, all sharing ONE floor curve.
It may also be worth the effort to encode each channel's
residue vectors as one big vector (per channel) by
interleaving. I think this would also improve coding
efficiency a bit. As a side effect, more residue
configurations would be needed, since the size of the residue
vector could be 1*128, 2*128, 3*128, ..., 7*128 or 8*128=1024.
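
A minimal sketch of the interleaving idea, not a worked-out bitstream
proposal. It assumes "interleaving" means taking coefficient 0 of every
short block, then coefficient 1 of every short block, and so on, and that
a short block yields 128 coefficients (blocksize0 = 256). The function
names are hypothetical and are not part of libvorbis.

```python
SHORT_LEN = 128  # assumption: blocksize0 = 256, i.e. 128 MDCT coefficients


def interleave_residues(short_residues):
    """Merge k equal-length residue vectors into one vector.

    Output order: coefficient 0 of every block, then coefficient 1 of
    every block, and so on.
    """
    if not short_residues:
        return []
    n = len(short_residues[0])
    if any(len(r) != n for r in short_residues):
        raise ValueError("all short-block residues must have the same length")
    return [r[i] for i in range(n) for r in short_residues]


def deinterleave_residues(vector, k):
    """Inverse of interleave_residues for k short blocks."""
    if k <= 0 or len(vector) % k != 0:
        raise ValueError("vector length must be a positive multiple of k")
    return [vector[j::k] for j in range(k)]


def possible_residue_sizes(short_len=SHORT_LEN, max_blocks=8):
    """Sizes a merged residue vector can take: 1*128, 2*128, ..., 8*128."""
    return [k * short_len for k in range(1, max_blocks + 1)]
```

The decoder would need to know k (the number of short blocks in the
packet) to undo the interleaving, which is why each size needs its own
residue configuration.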
Back to Vorbis I:
What can be done to minimize pre-echo without increasing
the bitrate that much? How about temporal noise shaping?
"Impossible!", you may say. Well, TNS is not a built-in
Vorbis feature like it is in AAC. But it doesn't HAVE to be.
TNS can be achieved by coding the MDCT spectrum in one of two ways:
1) an LPC filter plus a quantized LPC residual
OR
2) an NSQ (noise shaping quantizer)
The AAC format supports method 1. But method 2 could be used
with both formats (Vorbis and AAC) without breaking compatibility,
since it only changes how the encoder quantizes.
In fact, method 2 is used by MPC in the time domain to
shape the quantization noise within a subband to better
match the masking threshold.
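
To make method 1 concrete, here is a rough sketch of linear prediction run
along the frequency axis. It is only an illustration under simplifying
assumptions: a first-order predictor, no quantization of the residual, and
hypothetical function names. Real AAC TNS uses higher filter orders and its
own coefficient coding.

```python
def lpc1_coefficient(spectrum):
    """First-order predictor coefficient from lag-0 and lag-1 autocorrelation."""
    r0 = sum(x * x for x in spectrum)
    r1 = sum(a * b for a, b in zip(spectrum[1:], spectrum[:-1]))
    return r1 / r0 if r0 else 0.0


def analysis_filter(spectrum, a):
    """Encoder side: residual[k] = X[k] - a * X[k-1]."""
    prev = 0.0
    residual = []
    for x in spectrum:
        residual.append(x - a * prev)
        prev = x
    return residual


def synthesis_filter(residual, a):
    """Decoder side: X[k] = residual[k] + a * X[k-1]."""
    prev = 0.0
    spectrum = []
    for r in residual:
        prev = r + a * prev
        spectrum.append(prev)
    return spectrum
```

The encoder would quantize the residual instead of the spectrum itself.
The decoder's synthesis filter then filters the quantization noise along
the frequency axis too, which is what shapes it in time.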
An NSQ applied in the time domain can spectrally shape the
q-noise. What about an NSQ applied in the frequency domain?
What does it do? Well, because of the time/frequency duality
it will TEMPORALLY SHAPE the q-noise. Et voilà!
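
The duality argument can be checked numerically with a toy experiment.
The sketch below is an assumption-laden illustration, not Vorbis code: it
uses a plain DFT instead of the MDCT, a first-order error-feedback
quantizer with feedback coefficient h = 1, and hypothetical function
names. With these choices the time-domain noise is roughly scaled by
|1 - exp(j*2*pi*t/N)|, so it should be weak near t = 0 and strong near
t = N/2.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) DFT; good enough for a small demonstration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def quantize(v, step):
    """Uniform quantizer applied to real and imaginary parts separately."""
    return complex(round(v.real / step) * step, round(v.imag / step) * step)


def nsq(coeffs, step, h=1.0):
    """First-order error-feedback quantizer run along the coefficient index.

    Returns (quantized, errors), where quantized[k] - coeffs[k] equals
    errors[k] - h * errors[k-1] (with the error before index 0 taken as 0).
    """
    quantized, errors = [], []
    prev = 0j
    for c in coeffs:
        v = c - h * prev       # subtract the filtered previous error
        q = quantize(v, step)
        prev = q - v           # error made by the quantizer itself
        quantized.append(q)
        errors.append(prev)
    return quantized, errors


def noise_energy_by_region(n=64, step=1.0, seed=0):
    """Return (noise energy near t=0, noise energy around t=n/2)."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    shaped, _ = nsq(dft(signal), step)
    decoded = dft(shaped, inverse=True)
    energy = [abs(d - s) ** 2 for d, s in zip(decoded, signal)]
    edge = sum(energy[: n // 8]) + sum(energy[-(n // 8):])
    middle = sum(energy[3 * n // 8: 5 * n // 8])
    return edge, middle
```

Calling noise_energy_by_region() compares the noise energy in the 16
samples around t = 0 with the 16 samples around t = N/2. A real encoder
would pick the feedback filter so that the noise minimum lands in the
quiet part just before the transient.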
That's the theory. I don't know how well this can be applied
in practice to Vorbis (that has to be investigated).
Ghis!
Sebastian
--
PGP-Key-ID (long): 572B1778A4CA0707