[Vorbis-dev] Re: Vorbis bitstream definition / separation from ogg

Thu Apr 28 06:12:07 PDT 2005

Michael Smith wrote:

 > There are many disadvantages to using only a single block size (number
 > of samples per frame). You are incorrect in thinking that things like
 > mp3, aac, etc. have constant frame sizes - like vorbis, they use two
 > frame sizes (at least mp3 does, the others definitely use more than
 > one, but I'm not certain that it's two).

Let me rephrase what I think you were saying:
blocksize = framesize = samples per frame (or per 'packet' in Ogg/Vorbis
terminology, right?)

Now, according to this definition MP3 and AAC *DO* have a constant
framesize.
MPEG 1     Layer 3 -> Always 1152 samples/frame
MPEG 2/2.5 Layer 3 -> Always  576 samples/frame
AAC                -> Always 1024 samples/frame

What you probably meant was the 'transform block size'. MP3 and AAC
use (just like Vorbis) two 'transform block sizes'. It's just that
several small transform blocks are stored together, like one big
transform block, in a single frame/packet, so the overall amount of
samples per frame/packet stays constant.

MPEG 1     Layer 3 -> 2 x 576 spectral lines or 6 x 192 spectral lines
MPEG 2/2.5 Layer 3 -> 1 x 576 spectral lines or 3 x 192 spectral lines
AAC                -> 1 x 1024 spectral lines or 8 x 128 spectral lines
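
A quick sanity check of the two tables above, as a minimal Python
sketch. The dictionary and function names are made up for illustration;
the numbers are the ones listed in this mail. It only confirms that
either transform block layout adds up to the same constant frame size.

```python
# Samples per frame (per channel), as listed in the first table.
FRAME_SAMPLES = {
    "MPEG 1 Layer 3": 1152,
    "MPEG 2/2.5 Layer 3": 576,
    "AAC": 1024,
}

# (number of transform blocks, spectral lines per block) for the
# long-block and short-block layouts, as listed in the second table.
TRANSFORM_LAYOUTS = {
    "MPEG 1 Layer 3": [(2, 576), (6, 192)],
    "MPEG 2/2.5 Layer 3": [(1, 576), (3, 192)],
    "AAC": [(1, 1024), (8, 128)],
}


def lines_per_frame(codec):
    """Total spectral lines per frame for each layout of the given codec."""
    return [count * lines for count, lines in TRANSFORM_LAYOUTS[codec]]


for codec, samples in FRAME_SAMPLES.items():
    totals = lines_per_frame(codec)
    # Every layout must fill exactly one frame.
    assert all(total == samples for total in totals)
    print(codec, "->", totals, "lines per frame")
```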

Here lies the difference between MP3/AAC and Vorbis: Vorbis always
stores exactly one transform block in each packet. So a Vorbis packet
may contain either 0.5*blocksize0 samples or 0.5*blocksize1 samples
(per channel). blocksize0/1 refers to what's encoded in the decoder
configuration headers (the 'nominal' MDCT window lengths).
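
To make the contrast concrete, here is a minimal Python sketch of the
0.5*blocksize figure above. The function name is hypothetical, and the
blocksize values 256 and 2048 are assumed example values only; a real
stream declares its own blocksizes in its headers.

```python
def values_per_packet(blocksize):
    """Per-channel values in one Vorbis packet, using the
    0.5 * blocksize figure from the text above."""
    return blocksize // 2


# Assumed example values, not taken from any particular stream.
BLOCKSIZE0 = 256    # 'short' window
BLOCKSIZE1 = 2048   # 'long' window

short_packet = values_per_packet(BLOCKSIZE0)
long_packet = values_per_packet(BLOCKSIZE1)

# Unlike an MP3 or AAC frame, the amount per packet is not constant:
# it depends on which of the two blocksizes the packet uses.
print("short packet:", short_packet, "per channel")
print("long packet: ", long_packet, "per channel")
```

With these example values, one long packet covers as much as eight
short ones, and each of those short packets carries its own side info.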

Unfortunately I have to say that the AAC approach has an advantage here:
several small transform blocks can share their scalefactors and codebook
selection side info (the equivalent of floor curve / residue codebook
side info), which reduces overhead, whereas Vorbis has to code the floor
curves / codebook side info for each 'short' packet separately (even if
they are quite similar).

So your "There are many disadvantages to using only a single block size"
doesn't hold unless you switch your definition of 'blocksize' again.

regards,
Sebastian