[ogg-dev] OggPCM proposal feedback
arc at Xiph.org
Wed Nov 9 22:57:20 PST 2005
On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:
> a) There is no marker to distinguish little endian data
> from big endian data.
The original reason for this is that Ogg makes the matter moot, since
the bitpacker in libogg2 handles endianness. However, if a "chunk"
packer is made available (similar to memcpy), this becomes important,
since we'll want to copy the data in whichever endianness it already has.
Does endianness vary widely for raw audio codecs, or would it be
reasonable to settle on one standard and expect any codec which doesn't
comply with the "norm" to convert to the correct endianness? If most
hardware supports one endianness or the other, I say we should stick to
that, since that's what the codec plugins would export anyway.
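To make the trade-off concrete, here is a rough sketch of what a
chunk-style copy would have to do if the header did carry an endian
marker. None of these names exist in libogg2: `pcm_endian`,
`host_endian` and `pcm_copy_s16` are invented for illustration, and only
16-bit samples are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical endian marker; not part of any OggPCM draft. */
enum pcm_endian { PCM_LITTLE_ENDIAN = 0, PCM_BIG_ENDIAN = 1 };

/* Detect the byte order of the machine we are running on. */
enum pcm_endian host_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1 ? PCM_LITTLE_ENDIAN : PCM_BIG_ENDIAN;
}

/* Copy `count` 16-bit samples from a packet body into host-order samples.
 * If the stream already matches the host this is a plain memcpy;
 * otherwise each sample is rebuilt from its two bytes. */
void pcm_copy_s16(int16_t *dst, const uint8_t *src, size_t count,
                  enum pcm_endian stream_endian)
{
    assert(dst != NULL && src != NULL);

    if (stream_endian == host_endian()) {
        memcpy(dst, src, count * sizeof(int16_t));
        return;
    }

    for (size_t i = 0; i < count; i++) {
        uint16_t lo, hi, v;
        if (stream_endian == PCM_LITTLE_ENDIAN) {
            lo = src[2 * i];
            hi = src[2 * i + 1];
        } else {
            hi = src[2 * i];
            lo = src[2 * i + 1];
        }
        v = (uint16_t)((hi << 8) | lo);
        memcpy(&dst[i], &v, sizeof v); /* store the value in host order */
    }
}
```

With a single mandated byte order the marker and the slow path both
disappear on matching hardware, at the cost of a swap on everything else.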
> b) There is no mention of audio data being held in double
> precision (64 bit) floating point. Currently this is
> supported in libsndfile by WAV, AIFF, AU, IRCAM and the
> two different Matlab/Octave file formats (I may also
> have overlooked some).
The bits per sample field covers this. Set this to "64" and set the
data type to "float" and it "should just work"...
> c) I think having separate fields for things like signed/
> unsigned/float and bit width is a mistake. I would suggest
> instead a single field that encodes all this information
> in an enumeration. Ie:
> OGG_PCM_U8 /* Unsigned 8 bit */
> OGG_PCM_S8 /* Signed 8 bit. */
> and so on. This scheme makes it very difficult to get
> signed/unsigned and bit width messed up.
> d) Don't bother implementing unsigned PCM for bit widths
> greater than 8 bits. No other common file format uses
> it and those unsigned formats are a pain to work with.
The problem with this is inflexibility. Not every application must
support every possible combination of formatting. In fact, many will
accept only a very small set of parameters, e.g., "it must be float
of 16, 24, 32, or 64 bit" or "it must be 16 or 24 bit signed".
Implementors will very likely never implement 32-bit unsigned int, and
that is not an issue. If some fool does, his data will simply not be
accessible to any other codec or application unless he writes a
conversion plugin, which in essence treats the two sides (from
OggStream's perspective) as two entirely different codecs, even if both
are in OggPCM format.
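As a rough illustration of an application accepting only a small set of
parameters, something like the following would do. The enum, struct and
function names are invented for this sketch; they are not taken from the
OggPCM draft or from any Ogg library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data type codes; actual header values are undecided. */
enum pcm_data_type { PCM_TYPE_SIGNED, PCM_TYPE_UNSIGNED, PCM_TYPE_FLOAT };

/* Hypothetical view of the two separate header fields. */
struct pcm_format {
    enum pcm_data_type data_type;
    uint32_t bits_per_sample;
};

/* An application that says "it must be 16 or 24 bit signed". */
bool app_accepts_signed_16_24(const struct pcm_format *fmt)
{
    assert(fmt != NULL);
    return fmt->data_type == PCM_TYPE_SIGNED &&
           (fmt->bits_per_sample == 16 || fmt->bits_per_sample == 24);
}

/* An application that says "it must be float of 16, 24, 32, or 64 bit". */
bool app_accepts_float(const struct pcm_format *fmt)
{
    assert(fmt != NULL);
    if (fmt->data_type != PCM_TYPE_FLOAT)
        return false;
    switch (fmt->bits_per_sample) {
    case 16:
    case 24:
    case 32:
    case 64:
        return true;
    default:
        return false;
    }
}
```

A 32-bit unsigned stream is rejected by both checks, which is exactly the
case where a conversion plugin would have to sit in between.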
The flexibility of this does, though, encourage stuff like 96-bit audio.
Anyone implementing a codec which uses this, and imports/exports it, will
also write the appropriate conversion OggStream plugin which will allow
applications which only support, say, 16-bit audio, to work with it.
I guess you could chalk this up to an inherent difference in philosophy
and purpose between OggPCM and RIFF/WAVE (.wav). Theirs is as much an
interchange format as a storage codec, whereas OggPCM isn't really
intended for storage. FLAC (Free Lossless Audio Codec) limits itself to
a certain number of formats, all decoders can decode these formats,
and it's well suited for storage as a /compressed/ lossless codec.
Since OggPCM is primarily an interchange codec, if you have some rare or
new format being imported/exported from your new codec, you had better
also make sure it can itself support more common formats (e.g.,
44100/16/2) or that you include a conversion plugin which does that for
your users.
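The core of such a conversion plugin need not be much code. A minimal
sketch, assuming finite 32-bit float samples nominally in the range
[-1.0, 1.0] going to 16-bit signed, with clipping and no dithering. The
function name is made up and this is not an OggStream API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert `count` float samples to 16-bit signed samples.
 * Out-of-range input is clipped rather than wrapped.
 * Assumes every input sample is finite (no NaN or infinity). */
void convert_f32_to_s16(int16_t *dst, const float *src, size_t count)
{
    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < count; i++) {
        float scaled = src[i] * 32767.0f;

        if (scaled > 32767.0f)
            scaled = 32767.0f;
        if (scaled < -32768.0f)
            scaled = -32768.0f;

        /* Round to nearest: push 0.5 away from zero, then truncate. */
        dst[i] = (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    }
}
```

A real plugin would also have to deal with channel count and sample rate,
which this sketch leaves alone.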
> f) Encoding of channel information. In a two channel file,
> is the audio data a stereo image or two distinct mono
> channels? For a file with N (> 2) channels, are there
> pairs of channels which should be considered as stereo
> pairs or do you want to place these stereo pairs as
> separate streams within a single ogg container? What
> about multi channel surround sound (there are a number
> of different formats like 5.1 and 7.1) or quadraphonic?
> How are you going to specify which channel is which?
> Being able to encode this stuff easily is **vital**.
I agree - this is something that wasn't on my radar until this morning
when MikeS was asking about the channel layout in Vorbis/FLAC. How
would you suggest this data be included in the binary header? I
honestly have no experience with anything other than mono and stereo.
It should all be in the same stream.
> g) With things like surround sound, are you going to allow
> 24 bit audio for the main stereo pair and 16 bits for
> the side channels? This might best be achieved using
> separate streams, but that would make channel information
> all the more important. Is it useful to have PCM for the
> main stereo pair and say vorbis encoding for the side
Do people really encode different channels with different sample sizes
(and, I assume, sample rates)?
I'd really prefer to keep a fixed sample size/sample rate for all
channels. I really doubt any Ogg audio codec is going to get that
complicated anytime soon, and if it's really needed, a codec plugin
/could/ be fed, or provide, packets from multiple OggPCM bitstreams, just
like how a+v codecs (e.g., DV) would import/export OggPCM+OggYUV.
Is there anything else you've thought of that we've missed?
The recognition of individual possibility,
to allow each to be what she and he can be,
rests inherently upon the availability of knowledge;
The perpetuation of ignorance is the beginning of slavery.
from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
by Eben Moglen, General Counsel of the Free Software Foundation