[ogg-dev] OggPCM proposal feedback

Erik de Castro Lopo mle+xiph at mega-nerd.com
Thu Nov 10 00:03:43 PST 2005

Arc wrote:

> Does endian vary widely for raw audio codecs,

Well there are really only two endian-nesses, big and little.

WAV is usually little endian but there is also a (very rare) big endian 
version. AIFF is usually big endian but also supports little endian 
encoding. CAF, AU, IRCAM and a number of others support both endian-nesses.
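
To make the two byte orders concrete, here is a minimal sketch of reading
one signed 16 bit sample from a byte buffer in either order. The function
names are made up for illustration and are not part of any OggPCM draft.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Little endian: the least significant byte comes first. */
static int16_t read_s16_le (const unsigned char *p)
{   return (int16_t) (uint16_t) (p [0] | (p [1] << 8)) ;
}

/* Big endian: the most significant byte comes first. */
static int16_t read_s16_be (const unsigned char *p)
{   return (int16_t) (uint16_t) ((p [0] << 8) | p [1]) ;
}
```

The same two bytes give different sample values depending on which
reader is used, which is why the byte order has to be stated in the
header rather than guessed.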

> or would it be reasonable 
> to settle on one standard and expect all codecs to convert to the 
> correct endian which don't comply with the "norm"?

Not reasonable.

> The bits per sample field covers this.  Set this to "64" and set the 
> data type to "float" and it "should just work"...

See my comment on the wiki:


Most importantly:

    "Please don't make determination of the data format depend on 
     multiple fields. Instead use an enumeration so that something 
     like little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 
     and big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64. 
     This scheme is far more transparent and self documenting. If the 
     format field is 8 bits, this scheme supports 256 formats; if it's 16 
     bits it will support 65536 formats."
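
A rough sketch of what such an enumeration could look like follows. Only
OGG_PCM_LE_PCM_16 and OGG_PCM_BE_FLOAT_64 are names from the comment
above; the other names, the numbering and the helper function are
assumptions made up for this example.

```c
/* Illustrative only: the numbering and the exact set of formats
** are assumptions, not something any spec defines. */
typedef enum
{   OGG_PCM_U8 = 0,
    OGG_PCM_S8,
    OGG_PCM_LE_PCM_16,
    OGG_PCM_BE_PCM_16,
    OGG_PCM_LE_FLOAT_32,
    OGG_PCM_BE_FLOAT_32,
    OGG_PCM_LE_FLOAT_64,
    OGG_PCM_BE_FLOAT_64
} ogg_pcm_format ;

/* Bytes per sample for a known format, 0 for an unknown one.
** Hypothetical helper. */
static int ogg_pcm_bytes_per_sample (int format)
{   switch (format)
    {   case OGG_PCM_U8 :
        case OGG_PCM_S8 :
                return 1 ;
        case OGG_PCM_LE_PCM_16 :
        case OGG_PCM_BE_PCM_16 :
                return 2 ;
        case OGG_PCM_LE_FLOAT_32 :
        case OGG_PCM_BE_FLOAT_32 :
                return 4 ;
        case OGG_PCM_LE_FLOAT_64 :
        case OGG_PCM_BE_FLOAT_64 :
                return 8 ;
        default :
                break ;
        } ;

    return 0 ;
}
```

Everything a decoder needs (width, signedness, float versus integer and
byte order) follows from the single value, so there is no combination of
fields that can contradict itself.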

> >   c) I think having separate fields for things like signed/
> >      unsigned/float and bit width is a mistake. I would suggest
> >      instead a single field that encodes all this information
> >      in an enumeration, i.e.:
> > 
> >          OGG_PCM_U8          /* Unsigned 8 bit. */
> >          OGG_PCM_S8          /* Signed 8 bit. */
> >          OGG_PCM_S16
> >          OGG_PCM_S24
> >          OGG_PCM_S32
> >          OGG_PCM_FLOAT32
> >          OGG_PCM_FLOAT64
> > 
> >      and so on. This scheme makes it very difficult to get 
> >      signed/unsigned and bit width messed up.

You didn't address this issue. Do you think it is unimportant?

> >   d) Don't bother implementing unsigned PCM for bit widths
> >      greater than 8 bits. No other common file format uses 
> >      it and those unsigned formats are a pain to work with.
> Problem with this is inflexibility.  See, not every application must 
> support every possible combination of formatting -

Exactly, a codec could support OGG_PCM_S16, OGG_PCM_FLOAT32 and that's
it. If the decoder in the codec wants to figure out if it supports the 
current file it can do:

    if (format != OGG_PCM_S16 && format != OGG_PCM_FLOAT32)
       ooops_we_dont_handle_this ("some error message");

This is far less error prone than:

    if (! ((bitwidth == 16 && is_signed && data_format == OGG_PCM_PCM)
           || (bitwidth == 32 && data_format == OGG_PCM_FLOAT)))
       ooops_we_dont_handle_this ("some error message");

> in fact, many will require a very small set of parameters going in,

My proposal has a small number of parameters: one. I don't think it's
practical to have zero parameters. How about this:

    switch (format)
    {   case OGG_PCM_S8 :
        case OGG_PCM_FLOAT32 :
        case OGG_PCM_FLOAT64 :
                /* All OK. */
                break ;
        default :
                ooops_we_dont_handle_this ("some error message");
                break ;
        } ;

It's hard to get this wrong and it's obvious when it is wrong.

> ie, "it must be float 
> of 16, 24, 32, or 64 bit"

There is no such thing as 16 and 24 bit float. 

> Implementors will never, very likely, implement 32-bit unsigned int, 

My point exactly. So why even make it possible? If that changes at some
point in the future, add the enumeration then.

> >   f) Encoding of channel information. In a two channel file,
> >      is the audio data a stereo image or two distinct mono
> >      channels? For a file with N (> 2) channels, are there 
> >      pairs of channels which should be considered as a stereo
> >      pairs or do you want to place these stereo pairs as 
> >      separate streams within a single ogg container? What
> >      about multi channel surround sound (there are a number
> >      of different formats like 5.1 and 7.1) or quadraphonic? 
> >      How are you going to specify which channel is which. 
> >      Being able to encode this stuff easily is **vital**.
> I agree - this is something that wasn't on my radar until this morning 
> when MikeS was asking about the channel layout in Vorbis/FLAC.  How 
> would you suggest this data be included in the binary header?  I 
> honestly have no experience with anything other than mono and stereo.

I have little more experience than you. I sent invitations for people
to join this discussion to the music-dsp mailing list. I hope somebody
knowledgeable will show up.

> >   g) With things like surround sound, are you going to allow
> >      24 bit audio for the main stereo pair and 16 bits for
> >      the side channels? This might best be achieved using
> >      separate stream, but that would make channel information 
> >      all that more important. Is it useful to have PCM for the
> >      main stereo pair and say vorbis encoding for the side
> >      channels?
> Do people really do such things as encode different channels with 
> different sample sizes (and, I assume, samplerates)?

Different bitwidth makes sense. You need high dynamic range
on your main stereo signal, but probably not on the side channels.

Different sample rates also make sense. If the main stereo pair
is sampled at 96kHz it makes sense to have the sub bass signal
(ie all the low frequencies) sampled at a much lower rate. For a
sub-bass signal 8kHz might be appropriate.
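
As a rough illustration of the saving, the raw PCM data rate is just
sample rate times bytes per sample times channels. The helper below is
a hypothetical sketch, and the 24 bit and 16 bit widths are example
figures, not something anyone has proposed.

```c
/* Raw PCM data rate in bytes per second. Hypothetical helper that
** assumes bits_per_sample is a multiple of 8. */
static unsigned long pcm_bytes_per_second (unsigned long samplerate,
                        unsigned bits_per_sample, unsigned channels)
{   return samplerate * (bits_per_sample / 8) * channels ;
}
```

With these figures, a 24 bit stereo pair at 96kHz comes to 576000 bytes
per second. A mono sub-bass channel in the same format would add another
288000, but at 16 bits and 8kHz it needs only 16000. An 8kHz sample
rate can still represent frequencies up to 4kHz, which is well above
anything a sub-bass channel carries.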

> I'd really like to prefer keeping a fixed samplesize/samplerate for all 
> channels.  I really doubt any Ogg audio codec is going to get that 
> complicated anytime soon,

Really? What about a high quality Ogg video stream multiplexed with 
a 5.1 audio stream?

> Is there anything else you've thought of that we've missed?

Not yet, but we haven't heard from anyone else yet. I would like
to see input (or at least an OK) from a large number of people in
the audio field.

  Erik de Castro Lopo
'Unix beats Windows' - says Microsoft! 
