[ogg-dev] OggPCM proposal feedback

Wed Nov 9 22:57:20 PST 2005

On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:
> 
>   a) There is no marker to distinguish little endian data
>      from big endian data.

The original reason for this is that Ogg makes the matter moot, since 
the bitpacker in libogg2 handles endianness.  However, if a "chunk" 
packer is made available (similar to memcpy), this becomes important, 
since we'll want to copy the data in whichever endianness it already has.
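
To make the difference concrete, here is a rough sketch of what such a 
"chunk" copy might look like once an endian marker exists.  The enum, 
its values, and the function names are my own placeholders, not 
anything from libogg2 or the OggPCM draft:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker values; the OggPCM draft does not define these. */
enum pcm_endian { PCM_LITTLE_ENDIAN = 0, PCM_BIG_ENDIAN = 1 };

/* Detect the byte order of the machine we are running on. */
static enum pcm_endian host_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1 ? PCM_LITTLE_ENDIAN : PCM_BIG_ENDIAN;
}

/* Copy `count` samples of `width` bytes each from src to dst.
 * When both sides agree on byte order this is a plain memcpy;
 * otherwise the bytes of every sample are reversed on the way. */
static void pcm_copy_samples(unsigned char *dst, const unsigned char *src,
                             size_t count, size_t width,
                             enum pcm_endian src_endian,
                             enum pcm_endian dst_endian)
{
    assert(width > 0);

    if (src_endian == dst_endian || width == 1) {
        memcpy(dst, src, count * width);
        return;
    }
    for (size_t i = 0; i < count; i++) {
        for (size_t b = 0; b < width; b++) {
            dst[i * width + b] = src[i * width + (width - 1 - b)];
        }
    }
}
```

A reader would pass the marker from the header as src_endian and 
host_endian() as dst_endian, so the common case stays a straight copy.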

Does endianness vary widely among raw audio codecs, or would it be 
reasonable to settle on one standard and expect any codec which doesn't 
comply with the "norm" to convert to the correct endianness?  If most 
hardware supports one endianness or the other, I say we should stick to 
that, since that's what the codec plugins would export anyway.

>   b) There is no mention of audio data being held in double
>      precision (64 bit) floating point. Currently this is
>      supported in libsndfile by WAV, AIFF, AU, IRCAM and the
>      two different Matlab/Octave file formats (I may also
>      have overlooked some).

The bits per sample field covers this.  Set this to "64" and set the 
data type to "float" and it "should just work"...
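
As a sketch of how a reader might act on those two fields (the enum 
names and the function are hypothetical, and this assumes samples in 
host byte order with IEEE 754 float and double):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical values for the data type field. */
enum pcm_data_type { PCM_SIGNED_INT, PCM_UNSIGNED_INT, PCM_FLOAT };

/* Read one floating point sample, stored in host byte order, into *out.
 * Returns 0 on success, or -1 if the type/width pair is not handled. */
static int pcm_read_float_sample(const unsigned char *bytes,
                                 enum pcm_data_type type,
                                 unsigned bits_per_sample,
                                 double *out)
{
    /* Assumption: 32-bit float and 64-bit double on this platform. */
    assert(sizeof(float) == 4 && sizeof(double) == 8);

    if (type != PCM_FLOAT)
        return -1;

    if (bits_per_sample == 32) {
        float f;
        memcpy(&f, bytes, sizeof f);
        *out = f;
        return 0;
    }
    if (bits_per_sample == 64) {
        double d;
        memcpy(&d, bytes, sizeof d);
        *out = d;
        return 0;
    }
    return -1;
}
```

Double precision is just one more (type, width) pair here; nothing in 
the header layout has to change to carry it.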

>   c) I think having separate fields for things like signed/
>      unsigned/float and bit width is a mistake. I would suggest
>      instead a single field that encodes all this information
>      in an enumeration. Ie:
> 
>          OGG_PCM_U8          /* Unsigned 8 bit */
>          OGG_PCM_S8         /* Signed 8 bit. */
>          OGG_PCM_S16
>          OGG_PCM_S24
>          OGG_PCM_S32
>          OGG_PCM_FLOAT32
>          OGG_PCM_FLOAT64
> 
>      and so on. This scheme makes it very difficult to get 
>      signed/unsigned and bit width messed up.
>   d) Don't bother implementing unsigned PCM for bit widths
>      greater than 8 bits. No other common file format uses 
>      it and those unsigned formats are a pain to work with.

The problem with this is inflexibility.  See, not every application must 
support every possible combination of formats - in fact, many will 
accept only a very small set of parameters going in, ie, "it must be 
float of 16, 24, 32, or 64 bit" or "it must be 16 or 24 bit signed".  
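
With separate fields, that kind of restriction is a small table lookup 
on the application's side.  A minimal sketch, assuming hypothetical 
field and enum names (none of this is from the OggPCM draft):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values for the data type field. */
enum pcm_data_type { PCM_SIGNED_INT, PCM_UNSIGNED_INT, PCM_FLOAT };

/* The two independent header fields under discussion. */
struct pcm_format {
    enum pcm_data_type type;
    unsigned bits_per_sample;
};

/* What this example application accepts: "16 or 24 bit signed". */
static const struct pcm_format accepted[] = {
    { PCM_SIGNED_INT, 16 },
    { PCM_SIGNED_INT, 24 },
};

/* Return true if `offered` matches one of the `n` entries in `list`. */
static bool pcm_format_accepted(const struct pcm_format *offered,
                                const struct pcm_format *list, size_t n)
{
    assert(offered != NULL && (list != NULL || n == 0));

    for (size_t i = 0; i < n; i++) {
        if (list[i].type == offered->type &&
            list[i].bits_per_sample == offered->bits_per_sample)
            return true;
    }
    return false;
}
```

Anything not in the table gets refused, or handed to a conversion 
plugin, without the application knowing about every format in 
existence.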

Implementors will very likely never implement 32-bit unsigned int, and 
that is not an issue.  If some fool does, his data will simply not be 
accessible to any other codec or application unless he writes a 
conversion plugin, which in essence treats the two sides (from 
OggStream's perspective) as two entirely different codecs, even if both 
are in OggPCM format.

The flexibility of this does, though, encourage stuff like 96bit audio.  
Anyone implementing a codec which uses this, and imports/exports it, 
should also write the appropriate OggStream conversion plugin, which 
will allow applications which only support, say, 16bit audio to work 
with it.

I guess you could chalk this up to an inherent difference in philosophy 
and purpose between OggPCM and RIFF/WAVE (.wav).  Theirs is as much an 
interchange format as a storage codec, whereas OggPCM isn't really 
intended for storage.  FLAC (Free Lossless Audio Codec) limits itself to 
a certain number of formats, all decoders can decode these formats, and 
it's well suited for storage as a /compressed/ lossless codec.

Since OggPCM is primarily an interchange codec, if your new codec 
imports/exports some rare or new format, you had better also make sure 
it can itself support more common formats (ie, 44100/16/2) or that you 
include a conversion plugin which does that for your users.

>   f) Encoding of channel information. In a two channel file,
>      is the audio data a stereo image or two distinct mono
>      channels? For a file with N (> 2) channels, are there 
>      pairs of channels which should be considered as a stereo
>      pairs or do you want to place these stereo pairs as 
>      separate streams within a single ogg container? What
>      about multi channel surround sound (there are a number
>      of different formats like 5.1 and 7.1) or quadraphonic? 
>      How are you going to specify which channel is which. 
>      Being able to encode this stuff easily is **vital**.

I agree - this is something that wasn't on my radar until this morning 
when MikeS was asking about the channel layout in Vorbis/FLAC.  How 
would you suggest this data be included in the binary header?  I 
honestly have no experience with anything other than mono and stereo.

It should all be in the same stream.  

>   g) With things like surround sound, are you going to allow
>      24 bit audio for the main stereo pair and 16 bits for
>      the side channels? This might best be achieved using
>      separate stream, but that would make channel information 
>      all that more important. Is it useful to have PCM for the
>      main stereo pair and say vorbis encoding for the side
>      channels?

Do people really do such things as encode different channels with 
different sample sizes (and, I assume, samplerates)?

I'd really prefer to keep a fixed samplesize/samplerate for all 
channels.  I really doubt any Ogg audio codec is going to get that 
complicated anytime soon, and if it's really needed, a codec plugin 
/could/ be fed, or provide, packets from multiple OggPCM bitstreams, 
just like how a+v codecs (ie, DV) would import/export OggPCM+OggYUV.

Is there anything else you've thought of that we've missed?

-- 

The recognition of individual possibility,
 to allow each to be what she and he can be,
  rests inherently upon the availability of knowledge;
 The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
 by Eben Moglen, General Counsel of the Free Software Foundation