[ogg-dev] OggPCM proposal feedback

Thu Nov 10 13:35:47 PST 2005

On Thu, Nov 10, 2005 at 07:03:43PM +1100, Erik de Castro Lopo wrote:
> 
> WAV is usually little endian but there is also a (very rare) big endian 
> version. AIFF is usually big endian but also supports little endian 
> encoding. CAF, AU, IRCAM and a number of others support both endian-nesses
> equally.

This doesn't seem to be a large issue - a single bit in the header could 
specify it, 0=MSB, 1=LSB, or vice versa.
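
To illustrate how little a reader has to do with such a bit, here is a 
minimal sketch in Python. The flag values (0 = most significant byte 
first, 1 = least significant byte first) follow the suggestion above; the 
function name and the restriction to 16-bit signed samples are assumptions 
made for the example, not part of any proposal.

```python
import struct

# Assumed flag values, following "0=MSB, 1=LSB" above.
MSB_FIRST = 0  # big endian
LSB_FIRST = 1  # little endian


def decode_s16(data, endian_flag):
    """Unpack raw bytes as signed 16-bit samples in the flagged byte order."""
    order = "<" if endian_flag == LSB_FIRST else ">"
    count = len(data) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack("%s%dh" % (order, count), data[:count * 2]))
```

The same two bytes decode differently depending only on the flag, so the 
reader needs one branch, not two code paths.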

VorbisFile will export either endianness, so this seems to settle this 
part of the debate.

>     "Please don't make determination of the data format depend on 
>      multiple fields. Instead use an enumeration so that something 
>      like little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 
>      and big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64. 
>      This scheme is far more transparent and self documenting. If the 
>      format field is 8 bits, this scheme supports 256 formats; if it's 16 
>      bits it will support 65536 formats."

You're still working with the philosophy of the FourCC world, where 
whether a plugin or application recognizes a 32-bit identifier tells you 
that it has either full support or no support.

We aren't working by that philosophy.  We do not need to maintain a 
table of predefined formats, extended each time someone wants to use a 
new format, since no application is required to support every 
combination of encoding parameters.

Honestly, as far as I'm concerned unsigned samples can go away... almost 
nothing uses 8-bit samples anymore, and unsigned 8-bit even less so.

However, support for (e.g.) 48-bit float should not have to be added 
specially; the values for how many bits to use and whether it's int or 
float should be separate, as should the number of channels, etc.
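
As a rough sketch of what "separate" means here, the format can be 
described by independent fields instead of one enumerated value. The 
class and field names below are hypothetical and only illustrate the 
idea; they are not taken from any draft of the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcmFormat:
    """Hypothetical header description using independent fields."""
    bits_per_sample: int  # any multiple of 8, not a table index
    is_float: bool        # False = integer samples
    big_endian: bool
    channels: int

    def bytes_per_frame(self):
        """Size of one sample frame (one sample for every channel)."""
        return (self.bits_per_sample // 8) * self.channels


# What an enumeration would call OGG_PCM_LE_PCM_16, in stereo:
le_pcm_16 = PcmFormat(bits_per_sample=16, is_float=False,
                      big_endian=False, channels=2)

# A combination no predefined table anticipated needs no new entry:
float_48 = PcmFormat(bits_per_sample=48, is_float=True,
                     big_endian=True, channels=6)
```

Nothing has to be registered to describe the second format; whether any 
given application can play it is a separate question.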

On Thu, Nov 10, 2005 at 03:44:53PM +0800, illiminable wrote:
>
> I think this is the wrong approach, flac and other codecs operate on a
> tighter subset, because they have to perform complex transformations on the
> data, and supporting too many types increase complexity. A raw format
> essentially needs no processing, it just needs copying into a buffer that
> supports that type of data.

Added flexibility doesn't increase the complexity, and it completely 
eliminates the issue you ran into with FLAC: FLAC was designed to 
losslessly support every common audio format, and yet you find its 
subset of formats too tight.

Don't you see the inherent issue here?  It comes back to someone deciding 
which formats are valid and which ones aren't, and enforcing that by 
using an index into a table of supported formats, vs. leaving it 
freeform for future implementors to use.

Changing the spec a bit, so that the sample size must be a multiple of 
8 bits and may not exceed 128 bits (a 4-bit field), seems worthwhile to 
eliminate the padding issue.
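
The arithmetic works out because there are exactly 16 multiples of 8 
from 8 to 128, which is what a 4-bit field can hold. A sketch of one 
possible mapping follows; storing "bytes minus one" in the field is an 
assumption made for the example, and the function names are hypothetical.

```python
def encode_sample_size(bits):
    """Map a sample size in bits (8, 16, ..., 128) to a 4-bit field value."""
    if bits % 8 != 0 or not 8 <= bits <= 128:
        raise ValueError("sample size must be a multiple of 8 in 8..128")
    return bits // 8 - 1  # assumed encoding: 0 means 8 bits, 15 means 128


def decode_sample_size(field):
    """Map a 4-bit field value (0..15) back to a sample size in bits."""
    if not 0 <= field <= 15:
        raise ValueError("field must fit in 4 bits")
    return (field + 1) * 8
```

Sizes such as 20 bits are rejected outright, so every sample starts on a 
byte boundary and no padding rules are needed.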

But between float and int, why /not/ allow someone to do something 
insane like 96-bit audio?  20 years ago, we thought that 16 bit, or 
perhaps 24 bit, was the maximum we could do.  Why would anyone want more 
than 24 bit?  And yet, the issue was raised that 64-bit audio samples 
are necessary.  In another 20 years, will people be arguing that 128-bit 
samples are necessary?  Or that 48 bit is a good tradeoff between 32 bit 
and 64 bit?  

No - it does not increase complexity, nor does it impose any 
requirements on implementations, since instead of a 32-bit identifier we 
use the entire first packet of the stream to check for compatibility.  

No, your media player does -NOT- have to support 256 channel audio, nor 
must it support ambisonics, or 64-bit audio, etc.  There's no reason, 
however, to force everything into artificial, arbitrary limitations 
based on what we believe is reasonable for today.

If a media player only supports a subset of what the codec supports, 
that's completely fine and expected.
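
To make the point concrete, here is a sketch of a player testing the 
parsed header fields against its own capabilities. The field names and 
the dictionary representation are hypothetical; the only idea taken from 
the discussion is that each player compares the stream's parameters with 
the subset it chooses to implement.

```python
def can_play(header, supported):
    """Return True if every capability the player lists is satisfied.

    header:    parsed fields from the first packet, e.g. {"bits": 16, ...}
    supported: for each field name, the set of values this player accepts
    """
    return all(header.get(key) in allowed
               for key, allowed in supported.items())


# A simple player that only handles 16-bit signed int, mono or stereo.
simple_player = {
    "bits": {16},
    "is_float": {False},
    "channels": {1, 2},
}

cd_like = {"bits": 16, "is_float": False, "channels": 2}
exotic = {"bits": 96, "is_float": True, "channels": 256}
```

Here `can_play(cd_like, simple_player)` is true and 
`can_play(exotic, simple_player)` is false; the exotic stream is still a 
valid stream, this player just declines it.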

> I have little more experience than you. I sent invitations for people
> to join this discussion to the music-dsp mailing list. I hope somebody
> knowledgeable will show up.

Experience is one thing; a difference of design philosophy is another.  
This isn't an issue of right or wrong, but of two different styles of 
designing codecs.

Raw FourCC codecs are each set up for a different format, or small set of 
formats.  RIFF/WAVE uses a subset of formats, expecting all applications 
which support its FourCC to understand all those formats.  Again, this 
is done under the concept that a codec should either be fully supported 
or unsupported.

In practice, though, not all audio codecs are going to support even the 
subset that you provided (64-bit float, for example).  Nor are all 
applications which use Ogg going to support anything but 16-bit signed 
int, nor should they be expected to.

I think it's reasonable to do away with unsigned because modern codecs 
just aren't going to use it, but I'm not going to try to predict whether 
someone will want to use 48-bit audio, or 128-bit audio, and whether 
they'll use int or float.  

> Different bitwidth makes sense. You need to high dynamic range
> on your main stereo signal, but probably not on the side channels.
> 
> Different sample rates also makes sense. If the main stereo pair
> is sampled at 96kHz it makes sense to have the sub bass signal
> (ie all the low frequencies) sampled at a much lower rate. For a
> sub-bass signal 8kHz might be appropriate.

I think, for these, given Ogg's use of granulepos and the syncing 
complexity that allowing different channels to have different rates and 
sizes would introduce, this is something best left to muxed raw 
channels, with any codec which supports this drawing from the different 
raw channels. 

> Not yet, but we haven't heard from anyone else yet. I would like
> to see input (or at least an OK) from a large number of people in
> the audio field.

I think this is good to emphasize - it's ok to support some combinations 
of formats which are not used, since they'll simply be ignored if 
they're unfavorable to implement, but missing something necessary is 
something we need to make sure not to do.

I've put a reduced config set on the wiki.

-- 

The recognition of individual possibility,
 to allow each to be what she and he can be,
  rests inherently upon the availability of knowledge;
 The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
 by Eben Moglen, General Counsel of the Free Software Foundation