[ogg-dev] OggPCM proposal feedback

Wed Nov 9 15:13:19 PST 2005

Hi all,

Silvia contacted me about this OggPCM proposal and asked me
to join in. For those who don't know me, I am the main author
and maintainer of libsndfile and therefore know quite a bit
about how uncompressed audio is stored in sound files. However,
even I would not consider myself an expert; there are areas
to do with channel assignments that I know I am ignorant of.
I am also quite ignorant of the Ogg container format.

I have now read:

    http://wiki.xiph.org/OggPCM

and find that it has a number of shortcomings.

  a) There is no marker to distinguish little endian data
     from big endian data.
  b) There is no mention of audio data being held in double
     precision (64 bit) floating point. Currently this is
     supported in libsndfile by WAV, AIFF, AU, IRCAM and the
     two different Matlab/Octave file formats (I may also
     have overlooked some).
  c) I think having separate fields for things like signed/
     unsigned/float and bit width is a mistake. I would suggest
     instead a single field that encodes all this information
     in an enumeration, i.e.:

         OGG_PCM_U8          /* Unsigned 8 bit */
         OGG_PCM_S8          /* Signed 8 bit */
         OGG_PCM_S16
         OGG_PCM_S24
         OGG_PCM_S32
         OGG_PCM_FLOAT32
         OGG_PCM_FLOAT64

     and so on. This scheme makes it very difficult to get
     signed/unsigned and bit width messed up.
  d) Don't bother implementing unsigned PCM for bit widths
     greater than 8 bits. No other common file format uses 
     it and those unsigned formats are a pain to work with.
  e) Consider whether the endianness should also be encoded
     in the enumeration above. I would recommend that it is,
     resulting in:

         OGG_PCM_U8          /* Unsigned 8 bit */
         OGG_PCM_S8          /* Signed 8 bit */
         OGG_PCM_LE_S16
         OGG_PCM_BE_S16
         OGG_PCM_LE_S24
         OGG_PCM_BE_S24
         ...
         OGG_PCM_LE_FLOAT32
         OGG_PCM_BE_FLOAT32
         ...

  f) Encoding of channel information. In a two channel file,
     is the audio data a stereo image or two distinct mono
     channels? For a file with N (> 2) channels, are there
     pairs of channels which should be considered as stereo
     pairs, or do you want to place these stereo pairs as
     separate streams within a single Ogg container? What
     about multichannel surround sound (there are a number
     of different formats like 5.1 and 7.1) or quadraphonic?
     How are you going to specify which channel is which?
     Being able to encode this stuff easily is **vital**.
  g) With things like surround sound, are you going to allow
     24 bit audio for the main stereo pair and 16 bits for
     the side channels? This might best be achieved using
     separate streams, but that would make channel information
     all the more important. Is it useful to have PCM for the
     main stereo pair and, say, Vorbis encoding for the side
     channels?
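
To make points c) and e) a bit more concrete, here is a rough
sketch in C of what a single format field could look like. This
is only an illustration, not a proposed header layout. The type
name ogg_pcm_format, the three helper functions, the numeric
values and the LE/BE variants of S32 and FLOAT64 are my own
assumptions and appear nowhere in the wiki page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enumeration: names follow the lists above, but the
 * numeric values and the S32/FLOAT64 LE/BE entries are assumptions. */
typedef enum {
    OGG_PCM_U8 = 0,         /* Unsigned 8 bit */
    OGG_PCM_S8,             /* Signed 8 bit */
    OGG_PCM_LE_S16,
    OGG_PCM_BE_S16,
    OGG_PCM_LE_S24,
    OGG_PCM_BE_S24,
    OGG_PCM_LE_S32,
    OGG_PCM_BE_S32,
    OGG_PCM_LE_FLOAT32,
    OGG_PCM_BE_FLOAT32,
    OGG_PCM_LE_FLOAT64,
    OGG_PCM_BE_FLOAT64
} ogg_pcm_format;

/* Bytes occupied by one sample, or 0 for an unknown format. */
static int ogg_pcm_bytes_per_sample(ogg_pcm_format fmt)
{
    switch (fmt) {
    case OGG_PCM_U8:         case OGG_PCM_S8:         return 1;
    case OGG_PCM_LE_S16:     case OGG_PCM_BE_S16:     return 2;
    case OGG_PCM_LE_S24:     case OGG_PCM_BE_S24:     return 3;
    case OGG_PCM_LE_S32:     case OGG_PCM_BE_S32:
    case OGG_PCM_LE_FLOAT32: case OGG_PCM_BE_FLOAT32: return 4;
    case OGG_PCM_LE_FLOAT64: case OGG_PCM_BE_FLOAT64: return 8;
    }
    return 0;
}

/* 1 if the format stores multi-byte samples big endian, else 0.
 * Single-byte formats have no byte order and report 0. */
static int ogg_pcm_is_big_endian(ogg_pcm_format fmt)
{
    switch (fmt) {
    case OGG_PCM_BE_S16:
    case OGG_PCM_BE_S24:
    case OGG_PCM_BE_S32:
    case OGG_PCM_BE_FLOAT32:
    case OGG_PCM_BE_FLOAT64:
        return 1;
    default:
        return 0;
    }
}

/* Read one signed 16 bit sample from two raw bytes, independent of
 * the byte order of the host machine. */
static int16_t ogg_pcm_read_s16(const unsigned char *p, ogg_pcm_format fmt)
{
    unsigned int u;

    assert(fmt == OGG_PCM_LE_S16 || fmt == OGG_PCM_BE_S16);

    if (fmt == OGG_PCM_BE_S16)
        u = ((unsigned int)p[0] << 8) | p[1];
    else
        u = ((unsigned int)p[1] << 8) | p[0];

    /* Map 0x8000..0xFFFF onto the negative range explicitly. */
    return (int16_t)(u >= 0x8000u ? (int)u - 0x10000 : (int)u);
}
```

With a single field like this, a reader does one switch on the
format and can never end up with a combination such as
"unsigned float" or a 16 bit sample with no stated byte order.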

Please realize that this is all just off the top of my head.
There may be a bunch of other stuff I have overlooked.

Is it OK if I get some other people who know more about
this stuff involved?

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"I'm not proud   .... We really haven't done everything we could
to protect our customers ... Our products just aren't engineered
for security." -- Brian Valentine, Senior Vice President of
Microsoft's Windows development team