[vorbis-dev] Low bitrate high-band coding...

Jean-Marc Valin valj01 at gel.usherb.ca
Mon Dec 4 14:43:13 PST 2000



>> Now, I don't know what the normal bit-rate allocated for this band is, but I
>> expect it is greater than that. Am I right? (Can anyone give me numbers for
>> this?)
> 
> Depends. It varies from zero to a few kilobits depending on what the
> psychoacoustics model says.

A few kilobits, meaning what? In my example, can you say how many bits Vorbis
puts in the 11-22 kHz band?

>> The technique I use to do this is inspired by an article I published recently
>> (http://panoramix.dyndns.org/jm/scw2000.pdf) and is based on the fact that at
>> these frequencies, the ear is totally insensitive to the spectral fine
>> structure.
> 
> Correct; however, the ear is extremely sensitive to pre-echo and
> time-localization of high-frequency energy. You don't hear the pitch
> in the high frequencies; you hear the fact that a sharp edge was
> smeared (which is what aggressive quantization in the high end will cause).

The process I used is not subject to pre-echo. I extend the residue by simply
upsampling the LP residue, which causes spectral folding (unlike in my article,
where I used a non-linear function). The time-localization is thus preserved.
For voice, I have even obtained very good results when starting the extension
at 3.5 kHz.
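A minimal Python sketch of why upsampling produces spectral folding, assuming plain zero-insertion by a factor of 2 with no interpolation filter (the post does not say which upsampler is used; the helper names are hypothetical). Inserting a zero after every sample gives Y[k] = X[k mod N], where N is the original frame length, so a component at bin f below the old Nyquist frequency also appears mirrored at bin N - f, in the new upper half-band:

```python
import cmath
import math

def upsample_zero_insert(x, factor=2):
    """Upsample by inserting (factor - 1) zeros after each sample (no filtering)."""
    y = []
    for s in x:
        y.append(s)
        y.extend([0.0] * (factor - 1))
    return y

def dft_magnitudes(x):
    """Naive O(N^2) DFT magnitude spectrum; fine for a short demo."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

N = 16                                                     # original frame length
x = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]  # tone at bin 3
y = upsample_zero_insert(x)                                # 32 samples, twice the rate
mag = dft_magnitudes(y)

# Positive-frequency bins 0..16 of the 32-point DFT; the old Nyquist is bin 8.
peaks = [k for k in range(N + 1) if mag[k] > 1.0]
print(peaks)  # [3, 13]: the original tone plus its mirror image about bin 8
```

Applied to a low-band residue sampled at 22.05 kHz, the same mirroring would copy 0-11 kHz onto 11-22 kHz at 44.1 kHz. Presumably the high-band spectral envelope is then imposed on that folded residue, but that step is an assumption here.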

>> I have tested it with some files (including harpsichord, which is supposed to be
>> hard to code) and the difference from the original (CD rip) is hard to hear. You
>> can find demo files of this at:
>> ftp://freespeech.sourceforge.net/pub/freespeech/
> 
> Harpsichord (like voice) is well suited to this technique because of its
> regular harmonics. Try it on violin, cymbals, and nonmusical sources.

I have added a violin file in the same directory
(ftp://freespeech.sourceforge.net/pub/freespeech/) with the "vi4-" prefix. I
think it works a bit better than the harpsichord. I don't have any files with
cymbals, but if you have some, please send them to me. As I said earlier, the
ear is totally insensitive to the spectral fine structure at these frequencies;
it cannot even tell noise from harmonics. The only reason I didn't just use
noise is that upsampling preserves the time localization within a frame.
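To make the time-localization argument concrete, here is a small sketch under the same zero-insertion assumption (helper names are hypothetical). A transient at sample n of the residue lands at sample 2n after upsampling, so the energy envelope across equal sub-blocks of the frame is unchanged. Noise of the same total energy would instead spread evenly over the whole frame:

```python
def upsample_zero_insert(x, factor=2):
    """Upsample by inserting (factor - 1) zeros after each sample (no filtering)."""
    y = []
    for s in x:
        y.append(s)
        y.extend([0.0] * (factor - 1))
    return y

def subblock_energies(x, n_blocks):
    """Energy in each of n_blocks equal-length sub-blocks of a frame."""
    size = len(x) // n_blocks
    return [sum(v * v for v in x[b * size:(b + 1) * size]) for b in range(n_blocks)]

# Residue frame with a sharp "attack" in its second quarter.
x = [0.0] * 16
x[5], x[6] = 1.0, -0.5
y = upsample_zero_insert(x)  # attack moves to samples 10 and 12 of 32

print(subblock_energies(x, 4))  # [0.0, 1.25, 0.0, 0.0]
print(subblock_energies(y, 4))  # [0.0, 1.25, 0.0, 0.0]
```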

> I hear a brief, glassy pre-echo... What block size were you using for
> your experiment? I'm guessing very short... The results might be
> better if it were not used in situations where Ogg/LAME would be using short
> blocks, and were used over overlapped 2048-sample blocks like Ogg's.

I'm using 1024-sample frames, and my LPC filter is calculated on a 2048-sample
window. Anyway, the whole point of this was very-low-bitrate modes, where you
cannot afford many bits for the high band but could still afford 500 bps. I
think I could go as low as that using vector quantization and prediction.
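As a rough check of what 500 bps means per frame, assuming a 44.1 kHz sample rate (consistent with an 11-22 kHz band from a CD rip) and a hop of 1024 samples per frame. If frames advanced by only 512 samples, the per-frame budget would halve. The function name is hypothetical:

```python
def bits_per_frame(bitrate_bps, sample_rate_hz, hop_samples):
    """Bits available per frame for a given bitrate and frame hop."""
    frames_per_second = sample_rate_hz / hop_samples
    return bitrate_bps / frames_per_second

b = bits_per_frame(500, 44100, 1024)
print(f"{b:.1f} bits per frame")  # 11.6 bits per frame
print(2 ** int(b))                # 2048
```

That is roughly one 11-bit index per frame, for example a single 2048-entry vector-quantizer codebook for the high-band envelope, before any gain from prediction across frames.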

Right now, the system is not optimal; I still need to play with the window
size and the LPC regularization parameters (noise floor, pre-emphasis, bandwidth
expansion/lag windowing). What I'd like to know is whether you think this could
potentially be interesting.
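For reference, here is one conventional way the LPC regularization knobs listed above fit together. It is a sketch, not the author's actual code: pre-emphasis, a Hann analysis window (the window shape is an assumption), a noise floor applied as white-noise correction on r[0], a Gaussian lag window on r[k], the Levinson-Durbin recursion, and bandwidth expansion by scaling a_i by gamma^i. All default values (alpha = 0.9, noise floor 1e-4, 60 Hz lag window, gamma = 0.98, order 10) are illustrative guesses:

```python
import math

def pre_emphasis(x, alpha=0.9):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorrelation(x, order):
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def regularize(r, noise_floor=1e-4, lag_bw_hz=60.0, fs=44100.0):
    """Noise floor (white-noise correction) on r[0]; Gaussian lag window on r[k]."""
    out = [r[0] * (1.0 + noise_floor)]
    for k in range(1, len(r)):
        w = math.exp(-0.5 * (2 * math.pi * lag_bw_hz * k / fs) ** 2)
        out.append(r[k] * w)
    return out

def levinson(r, order):
    """Levinson-Durbin: returns A(z) = 1 + a1 z^-1 + ... + ap z^-p and the error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0.0:  # silent frame: return the trivial (flat) filter
        return a, 0.0
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def bandwidth_expand(a, gamma=0.98):
    """Scale a_i by gamma**i, pulling poles toward the origin (wider peaks)."""
    return [c * gamma ** i for i, c in enumerate(a)]

def lpc(frame, order=10):
    x = pre_emphasis(frame)
    xw = [s * w for s, w in zip(x, hann(len(x)))]
    r = regularize(autocorrelation(xw, order))
    a, _ = levinson(r, order)
    return bandwidth_expand(a)

if __name__ == "__main__":
    fs = 44100.0
    frame = [math.sin(2 * math.pi * 1000 * n / fs) + 0.1 * math.sin(2 * math.pi * 5000 * n / fs)
             for n in range(2048)]  # one 2048-sample analysis window
    print(lpc(frame))
```

Increasing the noise floor or the lag-window bandwidth, or lowering gamma, makes the envelope smoother and the filter more robust to quantization, at the cost of flattening spectral peaks.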

        Jean-Marc

P.S. Please also reply directly to me, as my subscription to vorbis-dev doesn't
seem to work.


-- 
Jean-Marc Valin
Universite de Sherbrooke - Genie Electrique
valj01 at gel.usherb.ca
