[ogg-dev] OggPCM2: channel map
decoy at iki.fi
Thu Nov 17 04:54:52 PST 2005
On 2005-11-17, Erik de Castro Lopo wrote:
> I did flesh out the wiki a **little** more. Is the intent clearer now?
Yes. The channel map type tells us the primary interpretation of the
stored signals. Channel definitions tell which stored channel
corresponds to which abstract channel in the type. Channel conversions
define downmixes to secondary formats, as they do in MLP, and, unlike the
channel map, might end up being ignored.
In theory the channel conversion header suffices for compatibility
coding, but in practice I'm not quite sure that the primary target of
such codings -- legacy players -- will implement the feature. In that
case the compatibility might prove illusory.
I'm also not entirely sure that the coding chosen for the channel
definitions is the best one. We'd expect each type of channel map to
contain all of, and nothing but, the channel definitions typically used
with that map type, in some order: for example L, C, R, Ls, Rs and LFE
for 5.1. If so, all we're really trying to encode is the interleaving
order. We then have to ask two questions. First, is that option really
necessary? Fixed channel orders are a real possibility, especially since
we're not encapsulating an existing format but defining a new streamable
one, which will necessitate some copying around in any case; some
unnecessary options were already dropped for simplicity; and of course
the channel conversion headers enable channel permutations as well.
Second, is this the best encoding for it? Permutations can be coded with
less redundancy and less room for error. If the idea is to enable
subsetting (e.g. 5.1 with a missing LFE equals 5.0), then something like
WAVE's channel mask seems a better alternative.
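To make the two alternatives above concrete, here is a minimal sketch (not part of any OggPCM2 draft). The first part ranks an interleaving order as a permutation using the standard Lehmer code, so every integer in [0, n!) decodes to a valid order and duplicates simply cannot be expressed. The second part reads a WAVE-style channel mask, using the first six `dwChannelMask` bit values from WAVEFORMATEXTENSIBLE; the short speaker names are illustrative.

```python
from math import factorial

# Alternative 1: code the interleaving order as a permutation rank (Lehmer code).
def perm_to_rank(perm):
    """Map a permutation of 0..n-1 to a unique integer in [0, n!)."""
    remaining = list(range(len(perm)))
    rank = 0
    for i, p in enumerate(perm):
        idx = remaining.index(p)  # ValueError on duplicates or out-of-range entries
        rank += idx * factorial(len(perm) - 1 - i)
        remaining.pop(idx)
    return rank


def rank_to_perm(rank, n):
    """Inverse of perm_to_rank: every rank in [0, n!) decodes to a valid permutation."""
    if not 0 <= rank < factorial(n):
        raise ValueError("rank out of range")
    remaining = list(range(n))
    perm = []
    for i in range(n):
        idx, rank = divmod(rank, factorial(n - 1 - i))
        perm.append(remaining.pop(idx))
    return perm


# Alternative 2: WAVE-style channel mask (first six WAVEFORMATEXTENSIBLE dwChannelMask bits).
# Present channels are stored in ascending bit order, so subsets come for free.
SPEAKER_BITS = [("FL", 0x1), ("FR", 0x2), ("FC", 0x4),
                ("LFE", 0x8), ("BL", 0x10), ("BR", 0x20)]


def channels_in_mask(mask):
    """List the speaker names whose bits are set, in storage order."""
    return [name for name, bit in SPEAKER_BITS if mask & bit]
```

For six channels there are 6! = 720 orders, which fit in 10 bits, compared with 48 bits for a byte-per-channel array that can also encode invalid orders. Clearing the LFE bit turns the 5.1 mask 0x3F into the 5.0 mask 0x37 without any other change.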
The format also doesn't stop us from defining two left channels for
stereo, even though it does seem to be trying to limit the possibilities
for error by defining the channel types separately for each map (e.g. no
OGG_CHANNEL_AMBISONIC_W inside a stereo channel map). Unfortunately, in
the process it could end up with a combinatorial explosion in the channel
type enumerations (i.e. we might end up redefining L, R, C, etc. for
each multichannel map type, of which there are a lot).
So, how about a slight change in emphasis? Currently we have two types
of channel semantics headers: one for the primary interpretation of the
stream and one for downmixing to secondary formats. Why not redefine
them so that both are bona fide channel maps and many such maps are
allowed (say, in descending order of preference)? Only one map type would
come with a conversion matrix (can't handle linear algebra? just skip to
the next map; matrices can also implement arbitrary channel selections
and permutations, so in this case a separate channel map is not needed),
and each map would carry an assignment array with exactly the number of
channels the map expects (6 for 5.1, 2 for stereo, etc.), used to refer
to the physical channels by their order number. In pseudocode:
    (n_chn:=3;                  // three channels are stored
     (map_type:=simple;         // no matrix, the most preferred choice
      channel_type:=MAP_UHJ;    // implies that the map has N_UHJ==4 entries
      map[UHJ_SIGMA]:=1;        // SIGMA is stored in the first physical channel
      ...
      map[UHJ_Q]:=0;),          // the fourth channel is not physically present
     (map_type:=complex;        // matrixing needed to go from m/s to l/r
      matrix:=                  // dimensions come from n_chn and N_STEREO
       (1, 1,                   // run of the mill sum/diff matrix
       ...
      channel_type:=MAP_MONO;   // mono fallback; support could be mandatory
      map[MONO_FRONT]:=1;)      // seems stupid but comes in handy if the
                                // single mono compatible channel happens
                                // to not be the first one stored
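The "complex" map entry amounts to a matrix multiply per sample frame. The sketch below applies an out-by-in matrix to one frame. The pseudocode only shows the matrix starting with "(1, 1,", so the full 2x3 sum/diff matrix here is an assumption: rows are the L and R outputs, columns are the three stored channels (SIGMA, DELTA, T), and T goes unused. A 0/1 matrix also illustrates the remark that matrices can express pure selections and permutations.

```python
from typing import List, Sequence


def apply_matrix(matrix: Sequence[Sequence[float]], frame: Sequence[float]) -> List[float]:
    """One output sample per matrix row: out[r] = sum_c matrix[r][c] * frame[c]."""
    for row in matrix:
        if len(row) != len(frame):
            raise ValueError("each row needs one coefficient per stored channel")
    return [sum(c * x for c, x in zip(row, frame)) for row in matrix]


# Assumed completion of the "(1, 1, ..." matrix: rows are L and R, columns are the
# three stored channels (SIGMA, DELTA, T); T is not used for the stereo fallback.
MS_TO_LR = [[1.0, 1.0, 0.0],
            [1.0, -1.0, 0.0]]

# A 0/1 matrix performs pure selection/permutation: here, output = (DELTA, SIGMA).
SWAP_FIRST_TWO = [[0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0]]
```

With a stored frame of (0.5, 0.25, 0.9), `MS_TO_LR` yields L = 0.75 and R = 0.25. A real encoder would probably fold a gain such as 0.5 into the coefficients to avoid clipping; that choice is not specified in the post.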
Decoding such a structure is trivial: just skip to the first map you
understand. Simple decoders need to know nothing about matrices, yet
compatibility encodings will still work. Some stupid assignments can
still be made, but not as easily. If we want to be even stricter, we can
drop the channel map from simple encodings and require a fixed channel
order in that case; this would ease implementation (cf. your comments
on generating code on the fly). No functionality is lost, and it can
be argued that the structure is conceptually simpler. Unknown primary
interpretations (say, channel maps with sources specified by angle and
elevation) can be added without compromising compatibility: they will
simply be skipped, whereas in the current format they would cause
naïve decoders to reject the file. How does that sound?
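The "skip to the first map you understand" rule can be sketched as below. `ChannelMap`, `choose_map` and `route` are hypothetical names for illustration only. The three maps mirror the pseudocode, with two assumptions: DELTA and T are taken to sit in physical channels 2 and 3, and the stereo matrix is filled out as a plain sum/diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class ChannelMap:
    # Hypothetical in-memory form of one map from the proposal; names are illustrative.
    channel_type: str                                     # e.g. "MAP_UHJ", "MAP_STEREO", "MAP_MONO"
    assignment: List[int] = field(default_factory=list)   # 1-based physical channel per slot, 0 = absent
    matrix: Optional[List[List[float]]] = None            # set only for "complex" (matrixed) maps


def choose_map(maps: Sequence[ChannelMap], supported: set, can_matrix: bool) -> Optional[ChannelMap]:
    """Return the first (most preferred) map this decoder understands, else None."""
    for m in maps:
        if m.matrix is not None and not can_matrix:
            continue  # can't handle linear algebra: skip to the next map
        if m.channel_type in supported:
            return m
    return None  # nothing understood: only now would the stream have to be rejected


def route(frame: Sequence[float], m: ChannelMap) -> List[float]:
    """Pick the physical samples a simple map refers to; absent channels (0) become silence."""
    return [frame[i - 1] if i > 0 else 0.0 for i in m.assignment]


# The three maps from the pseudocode, in preference order (assumed DELTA=2, T=3).
MAPS = [
    ChannelMap("MAP_UHJ", assignment=[1, 2, 3, 0]),
    ChannelMap("MAP_STEREO", matrix=[[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]),  # assumed matrix
    ChannelMap("MAP_MONO", assignment=[1]),
]
```

A mono-only decoder with no matrix support lands on the last map and plays physical channel 1. A stereo decoder that can matrix picks the second map. A UHJ-aware decoder stops at the first. An unknown `channel_type` costs nothing but a skipped entry.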
>> (Beware of the pet peeve...)
> What is that pet peeve?
Umm... Roughly file formats which prove rigid in practice even after
they've been declared extensible.
> I haven't enumerated them all, but we should be able to without too
> much trouble,
Want me to start a list in the Wiki?
> Do you have any more info about THX? I've searched the web and found
> little of any worth.
I used to have some, but I may have misplaced the specs. The main idea
is that in THX the surround channel is supposed to be spatially diffuse.
It is not recreated with directional sources at the back, but by using
dipole speakers, room reflections, multiple sources or even explicit
allpass decorrelators. What I'm not quite sure about is what the overall
spatial distribution of the surround field is supposed to be. Hopefully
I can find something concrete on it.
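As an illustration of what an "explicit allpass decorrelator" can look like, here is a generic Schroeder allpass section, y[n] = -g*x[n] + x[n-D] + g*y[n-D]. It is a textbook filter and not anything taken from the THX specs. Its magnitude response is flat, so it leaves the spectrum alone while smearing phase. Feeding one surround signal through two such filters with different (arbitrarily chosen) delays gives partially decorrelated copies for separate speakers.

```python
def schroeder_allpass(x, delay, g):
    """Schroeder allpass: y[n] = -g*x[n] + x[n-D] + g*y[n-D] (flat magnitude, smeared phase)."""
    if delay < 1 or not -1.0 < g < 1.0:
        raise ValueError("need delay >= 1 and |g| < 1 for stability")
    y = []
    for n, xn in enumerate(x):
        xd = x[n - delay] if n >= delay else 0.0  # delayed input
        yd = y[n - delay] if n >= delay else 0.0  # delayed output (feedback)
        y.append(-g * xn + xd + g * yd)
    return y


# Two decorrelated feeds from one surround signal; delays and gain are arbitrary examples.
def decorrelate(surround, g=0.5):
    return schroeder_allpass(surround, 113, g), schroeder_allpass(surround, 149, g)
```

Because the filter is allpass, the energy of its impulse response sums to 1 (for g = 0.5 the taps are -0.5, 0.75, 0.375, ...). Practical decorrelators usually cascade several such sections.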
Sampo Syreeni, aka decoy - mailto:decoy at iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2