[ogg-dev] OggPCM format description, rev 3

Sun Nov 13 18:18:47 PST 2005

> Unfortunately the ALSA API defines a number of formats which are
> in practice extremely rare. In particular, any unsigned int format
> larger than 8 bits. For instance, the only unsigned int type that
> libsndfile supports is unsigned 8 bit.

I expected this; it just seemed like a good starting point for getting more
than 7 formats on the table. Specifically, I wanted to get the logarithmic
coding formats in there to make it clear this wasn't just for the integer
ones.
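As an illustration of what a "logarithmic coding format" means here, the sketch below decodes one G.711 mu-law byte to a 16-bit linear sample. It is an assumption that mu-law would be among the logarithmic formats on the list; the function name `ulaw_to_s16` is hypothetical. Each 3-bit exponent (segment) doubles the quantization step, which is what makes the coding logarithmic.

```c
#include <stdint.h>

/* Hypothetical helper: decode one G.711 mu-law byte to 16-bit linear PCM. */
static int16_t ulaw_to_s16(uint8_t code)
{
    uint8_t u = (uint8_t)~code;          /* mu-law bytes are stored bit-inverted */
    int sign = u & 0x80;                 /* top bit: sign */
    int exponent = (u >> 4) & 0x07;      /* next 3 bits: segment number */
    int mantissa = u & 0x0F;             /* low 4 bits: step within the segment */
    /* Step size doubles with each segment; 0x84 (132) is the mu-law bias. */
    int magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (int16_t)(sign ? -magnitude : magnitude);
}
```

With this scaling, code `0xFF` decodes to 0 and the extremes `0x80` and `0x00` decode to +32124 and -32124.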

> I would also strongly advise against supporting **any** ADPCM
> format. These things are a PITA to support and some cannot be
> supported without extending the header. For instance, Microsoft's
> ADPCM requires that a set of 8 coefficients required for decoding
> be sent in the header. Most of the other ADPCM formats have block sizes
> that need to be sent. All in all this is a huge PITA. In comparison
> to FLAC, Speex and Vorbis, ADPCM formats have little to offer.

No objection here. I'd like to see someone other than myself go through
and cull the list of formats into whatever a practical subset is. As long
as it does 16 bit signed little endian interleaved, I'll be happy.
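For reference, the 16 bit signed little-endian interleaved layout is simple to address: frame `f` holds one 2-byte sample per channel, low byte first. A minimal sketch, where the function name and parameters are hypothetical and not taken from any OggPCM draft:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: read sample (frame, ch) from an interleaved
   16-bit signed little-endian buffer with `channels` samples per frame. */
static int16_t s16le_interleaved_get(const uint8_t *buf, size_t channels,
                                     size_t frame, size_t ch)
{
    const uint8_t *p = buf + 2 * (frame * channels + ch);
    int v = p[0] | (p[1] << 8);   /* low byte first, then high byte */
    if (v >= 32768)
        v -= 65536;               /* reinterpret the 16-bit pattern as signed */
    return (int16_t)v;
}
```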

> I still think that assigning meaning to bits within the format field
> is a mistake. Specifying bits like this could only be useful if
> you expect the decoder to generate code on the fly when it gets
> asked to decode say 16 bit, unsigned, little endian. Auto generated
> code that automagically supports all of these formats is significantly
> harder to write and debug than the equivalent set of single purpose
> decoders so I would suggest that this auto-magic stuff is a bad idea.
> Ergo, assigning meaning to the bits is a bad idea as well.

I'm fine with a straight enumeration. I put the extra fields in there
more as a discussion point, saying "if you want to have some meaning here,
this is how I'd break it out." As I tried to make clear, I can't really
think of a good use for it. The only thing I can think of would be code
that extracts the 4 bytes for the sample and then calls some other function
based on the coding type to convert it, but calling a function on every
sample is always a bad idea.
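A straight enumeration pairs naturally with a set of single-purpose decoders: the format is checked once per buffer, and each case is a tight loop with no per-sample function call. A minimal sketch; the enum names and values, and `decode_to_s16`, are hypothetical and not proposed IDs:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format IDs: a plain enumeration, no meaning in the bits. */
typedef enum {
    PCM_S8,
    PCM_U8,
    PCM_S16_LE
} pcm_format;

/* Convert n samples to int16. The switch runs once per buffer; each case
   is a single-purpose loop, so nothing is dispatched per sample. */
static int decode_to_s16(pcm_format fmt, const uint8_t *in, int16_t *out, size_t n)
{
    size_t i;
    switch (fmt) {
    case PCM_S8:
        for (i = 0; i < n; i++) {
            int v = in[i] < 128 ? in[i] : in[i] - 256;  /* byte as signed */
            out[i] = (int16_t)(v * 256);
        }
        break;
    case PCM_U8:
        for (i = 0; i < n; i++)
            out[i] = (int16_t)((in[i] - 128) * 256);    /* remove the offset */
        break;
    case PCM_S16_LE:
        for (i = 0; i < n; i++) {
            int v = in[2 * i] | (in[2 * i + 1] << 8);   /* low byte first */
            out[i] = (int16_t)(v >= 32768 ? v - 65536 : v);
        }
        break;
    default:
        return -1;  /* unknown format ID */
    }
    return 0;
}
```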

> Again, I strongly recommend against allowing non-interleaved data.
> It simply complicates everything far more than it needs to be.

This is probably the only point we may disagree on. Having the data
chunked (one contiguous block per channel) opens the door for a whole host
of SIMD-optimized filters, and it definitely could be a useful internal
representation along a filter chain. As long as you're only dealing with
byte-aligned data, I don't think the storage and retrieval are that
difficult. I agree it's probably not very useful in the general case, but
there are some cases where it is, so it may be worth defining it.

I'm imagining the case of writing a command line filter chain, for instance:
 $ snd_capture | deinterleave | denoise | normalize | interleave | compress
(yes, we'll ignore the fact that you can't normalize in one pass...)
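The deinterleave and interleave stages in that chain amount to a transpose between frame order and per-channel planes. A minimal sketch for 16-bit samples; the function names are hypothetical and the tools in the pipeline above are imaginary:

```c
#include <stddef.h>
#include <stdint.h>

/* Frames (L R L R ...) -> one contiguous plane per channel (L L ... R R ...). */
static void deinterleave_s16(const int16_t *in, int16_t *out,
                             size_t frames, size_t channels)
{
    for (size_t f = 0; f < frames; f++)
        for (size_t c = 0; c < channels; c++)
            out[c * frames + f] = in[f * channels + c];
}

/* Inverse: take one sample from each plane to rebuild each frame. */
static void interleave_s16(const int16_t *in, int16_t *out,
                           size_t frames, size_t channels)
{
    for (size_t f = 0; f < frames; f++)
        for (size_t c = 0; c < channels; c++)
            out[f * channels + c] = in[c * frames + f];
}
```

Between the two stages, a filter such as denoise sees each channel as a plain contiguous array, which is the property that makes vectorized per-channel loops straightforward.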

> Reserved for what? I can't possibly think what it could be used for.

These were the 6 bits of the format id I reserved for the storage size and
endianness, so that's what they could be used for. As for what an
application would use them for, I agree, I'm at a loss.

> This is a file header. Even under the most bloated scheme we could
> think of, it's unlikely to be more than 100 bytes and it will be
> followed by hundreds of kilobytes at least of audio data. So why
> are we trying to conserve a couple of bits in the header?

Agreed. I'm an embedded/DSP guy by trade, so these are the things I think
of. The point I was trying to make is that reserving 30% of a word for
"this can't be described by these fields" is ugly, ugly, ugly.

John