[vorbis-dev] long message on absolute threshold of hearing (ATH)

Mark Taylor mt at sulaco.org
Mon Sep 6 20:11:10 PDT 1999



Robert recently coded up (in LAME) the formula for the ATH.  I would
like to switch the LAME psycho-acoustics (gpsycho) over to using
formulas for all quantities like this instead of the ISO MP3 tables.
This will make it much easier to use gpsycho with any sampling rate,
any size FFT or any number of critical bands, so it can be used in
other encoders (like Vorbis).  Also, a couple of people have suggested
that at higher bitrates there should be less noise shaping.  At
128kbps, you need all the help you can get from the psycho-acoustics,
but at 256kbps just using the ATH would probably be quite good.
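
For readers who have not seen a formula-based ATH, here is a minimal
sketch.  The post does not say which formula Robert coded, so this
assumes the widely quoted Terhardt-style approximation (threshold in
dB SPL as a function of frequency); the function names and the 10 Hz
clamp are illustrative choices, not taken from LAME.  The point is
that a formula can be evaluated for any sampling rate and FFT size.

```python
import math

def ath_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL.

    ASSUMPTION: Terhardt-style fit, not necessarily the formula in LAME.
    """
    # Work in kHz; clamp low frequencies to avoid dividing by zero at DC.
    f = max(freq_hz, 10.0) / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def ath_per_bin(sample_rate, fft_size):
    """ATH at the center frequency of each FFT bin 0..fft_size/2."""
    return [ath_db(k * sample_rate / fft_size)
            for k in range(fft_size // 2 + 1)]

if __name__ == "__main__":
    # Same formula, two different analysis setups.
    for rate, size in ((44100, 1024), (32000, 512)):
        bins = ath_per_bin(rate, size)
        k_min = min(range(len(bins)), key=bins.__getitem__)
        print(rate, size, "most sensitive near", k_min * rate / size, "Hz")
```

With this fit the minimum lands a little above 3.3kHz, consistent with
the ear being most sensitive in that region.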

Anyway, I was comparing Robert's values to what the ISO uses, and
I was not able to get them to match up.  I think that the ISO formula
is basically broken.  Here's my take on the situation - if anyone
knows why I'm wrong or has other useful comments, please post!

The ISO formula goes through a complicated procedure of first
computing a threshold in partition bands, adding the ATH, then summing
the values into scalefactor bands, and finally computing a ratio
(masking/energy).  Then, in loop.c, when computing the allowed
distortion, this ratio is multiplied by the average energy (as computed by the
MDCT) within each scalefactor band.
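
The data flow described above can be sketched roughly as follows.
This is a simplification under stated assumptions: all values are
linear energies, "adding the ATH" is taken literally, each scalefactor
band is a plain list of partition indices, and the function names are
hypothetical (they are not the ISO or LAME routines).

```python
def scalefactor_band_ratio(thr_partition, ath_partition,
                           energy_partition, sfb_partitions):
    """Return masking/energy for each scalefactor band.

    ASSUMPTION: sfb_partitions[i] lists the partition indices that are
    summed into scalefactor band i; the real mapping is more involved.
    """
    ratios = []
    for parts in sfb_partitions:
        # Threshold plus ATH, summed over the band's partitions.
        masking = sum(thr_partition[p] + ath_partition[p] for p in parts)
        energy = sum(energy_partition[p] for p in parts)
        ratios.append(masking / energy if energy > 0.0 else 0.0)
    return ratios

def allowed_distortion(ratios, mdct_avg_energy):
    """Scale each band's ratio by the average MDCT energy in that band."""
    return [r * e for r, e in zip(ratios, mdct_avg_energy)]

if __name__ == "__main__":
    ratios = scalefactor_band_ratio(
        thr_partition=[1.0, 2.0, 3.0, 4.0],
        ath_partition=[0.5, 0.5, 0.5, 0.5],
        energy_partition=[10.0, 10.0, 20.0, 20.0],
        sfb_partitions=[[0, 1], [2, 3]],
    )
    print(allowed_distortion(ratios, [5.0, 10.0]))
```

Note that the ATH ends up scaled by two different energy estimates
(FFT-based in the ratio, MDCT-based in the final product), which makes
its absolute level hard to reason about.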

To test this, I first measured the strength of a 3.3kHz sine
wave with amplitude 32767 (as large as possible on a 16bit CD).  
This is the frequency for which the ear is most sensitive.
The energy of this wave shows up in scalefactor band (sfb) = 12,
with an energy of -10db.  The dynamic range of a CD is 96db,
meaning that the energy range in sfb=12 is:  -106 -> -10 (db).  
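
The arithmetic behind that range, as a small sketch.  The 96db figure
is the usual 20*log10(2^16) for 16-bit samples; the -10db peak is the
measured value quoted above, and the helper names are made up for this
example.

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return 20.0 * math.log10(2 ** bits)

def energy_range_db(peak_db, bits=16):
    """(lowest, highest) energy for a band whose full-scale peak is peak_db."""
    return peak_db - dynamic_range_db(bits), peak_db

if __name__ == "__main__":
    low, high = energy_range_db(-10.0)
    print("dynamic range: %.2f db" % dynamic_range_db(16))
    print("sfb 12 range:  %.2f -> %.2f db" % (low, high))
```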

Next, I disabled all the masking from l3psy(), except the ATH,
then computed the actual l3_xmin (allowed distortion in loop.c).
Using the ISO ATH formula, this number hovers around -150db,
roughly 45db below the lowest possible energy (-106db)!!  Thus it is
*never* used.

Here are the results from a random frame.  'ISO masking' is the ISO
ATH value (since all other masking was turned off), 'ath' is the value
computed from Robert's code (normalized at 3.3kHz),
and 'ave_ener' is the average amount of energy in the scalefactor band:

 0 ISO masking=    -111.86  ath=    -75.46    ave_ener=    -28.77  (db) 
 1 ISO masking=    -133.42  ath=    -85.59    ave_ener=    -35.10  (db) 
 2 ISO masking=    -152.47  ath=    -88.77    ave_ener=    -35.88  (db) 
 3 ISO masking=    -149.16  ath=    -90.40    ave_ener=    -43.64  (db) 
 4 ISO masking=    -150.66  ath=    -91.43    ave_ener=    -54.46  (db) 
 5 ISO masking=    -144.71  ath=    -92.16    ave_ener=    -61.28  (db) 
 6 ISO masking=    -149.53  ath=    -93.02    ave_ener=    -56.81  (db) 
 7 ISO masking=    -153.00  ath=    -93.76    ave_ener=    -54.95  (db) 
 8 ISO masking=    -143.65  ath=    -94.81    ave_ener=    -47.43  (db) 
 9 ISO masking=    -157.70  ath=    -96.04    ave_ener=    -40.64  (db) 
10 ISO masking=    -150.85  ath=    -97.84    ave_ener=    -50.10  (db) 
11 ISO masking=    -140.94  ath=    -99.92    ave_ener=    -55.77  (db) 
12 ISO masking=    -151.46  ath=   -100.98    ave_ener=    -61.99  (db) 
13 ISO masking=    -165.33  ath=   -100.92    ave_ener=    -62.04  (db) 
14 ISO masking=    -151.18  ath=    -98.48    ave_ener=    -49.57  (db) 
15 ISO masking=    -149.10  ath=    -95.20    ave_ener=    -48.68  (db) 
16 ISO masking=    -151.12  ath=    -93.72    ave_ener=    -58.48  (db) 
17 ISO masking=    -151.89  ath=    -92.10    ave_ener=    -59.46  (db) 
18 ISO masking=    -141.80  ath=    -88.49    ave_ener=    -53.07  (db) 
19 ISO masking=    -133.54  ath=    -80.69    ave_ener=    -60.38  (db) 
20 ISO masking=    -128.13  ath=    -66.16    ave_ener=    -74.77  (db) 
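
A quick check of the table against the -106db floor, as a sketch.  The
numbers are copied from the rows above; applying the floor measured
for sfb=12 to every band is a simplifying assumption made only for
this comparison.

```python
# 'ISO masking' and 'ath' columns from the table above, sfb 0..20 (db).
ISO_MASKING = [-111.86, -133.42, -152.47, -149.16, -150.66, -144.71,
               -149.53, -153.00, -143.65, -157.70, -150.85, -140.94,
               -151.46, -165.33, -151.18, -149.10, -151.12, -151.89,
               -141.80, -133.54, -128.13]
ATH = [-75.46, -85.59, -88.77, -90.40, -91.43, -92.16, -93.02,
       -93.76, -94.81, -96.04, -97.84, -99.92, -100.98, -100.92,
       -98.48, -95.20, -93.72, -92.10, -88.49, -80.69, -66.16]

# ASSUMPTION: the sfb=12 floor (-10db peak minus 96db) applies to all bands.
FLOOR_DB = -106.0

def bands_below_floor(values_db, floor_db=FLOOR_DB):
    """Indices of bands whose threshold is below the lowest possible energy."""
    return [sfb for sfb, v in enumerate(values_db) if v < floor_db]

def gaps_db(iso_db, ath_db):
    """How far the ISO value sits below the formula ATH, per band."""
    return [a - i for i, a in zip(iso_db, ath_db)]

if __name__ == "__main__":
    print("ISO below floor:", len(bands_below_floor(ISO_MASKING)), "of 21")
    print("ath below floor:", len(bands_below_floor(ATH)), "of 21")
    gaps = gaps_db(ISO_MASKING, ATH)
    print("gap range: %.2f .. %.2f db" % (min(gaps), max(gaps)))
```

Every ISO value in this frame is below the floor, and none of the
formula values are.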

I haven't yet run any listening tests with the new ath, but hopefully
will tomorrow.  My feeling is that, unlike the other psycho-acoustics,
the ATH should be close to perfect.  That is, quantization noise < ATH
really should not be audible.  I am hoping the ATH will make a nice analog silence
detection: any time the energy < ath, we can just zero out all the
coefficients.  
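
The silence-detection idea can be sketched like this.  It is only an
illustration: the function names are hypothetical, the comparison is
done per scalefactor band in db, and the 'ath' and 'ave_ener' columns
from the table above are reused as test data.

```python
# 'ath' and 'ave_ener' columns from the table above, sfb 0..20 (db).
ATH = [-75.46, -85.59, -88.77, -90.40, -91.43, -92.16, -93.02,
       -93.76, -94.81, -96.04, -97.84, -99.92, -100.98, -100.92,
       -98.48, -95.20, -93.72, -92.10, -88.49, -80.69, -66.16]
AVE_ENER = [-28.77, -35.10, -35.88, -43.64, -54.46, -61.28, -56.81,
            -54.95, -47.43, -40.64, -50.10, -55.77, -61.99, -62.04,
            -49.57, -48.68, -58.48, -59.46, -53.07, -60.38, -74.77]

def silent_bands(energy_db, ath_db):
    """Indices of bands whose energy is below the ATH."""
    return [sfb for sfb, (e, a) in enumerate(zip(energy_db, ath_db))
            if e < a]

def zero_silent_bands(coeffs_by_sfb, energy_db, ath_db):
    """Return a copy of the coefficients with inaudible bands zeroed."""
    silent = set(silent_bands(energy_db, ath_db))
    return [[0.0] * len(c) if sfb in silent else list(c)
            for sfb, c in enumerate(coeffs_by_sfb)]

if __name__ == "__main__":
    print("bands that could be zeroed:", silent_bands(AVE_ENER, ATH))
```

For the frame in the table, only sfb 20 has energy below the ath, so
only that band would be zeroed.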

Mark

  



