[ogg-dev] OggPCM2: channel map

Thu Nov 17 04:54:52 PST 2005

On 2005-11-17, Erik de Castro Lopo wrote:

> I did flesh out the wiki a **little** more. Is the intent clearer now?

Yes. The channel map type tells us the primary interpretation of the 
stored signals. The channel definitions tell which stored channel 
corresponds to which abstract channel in the type. The channel 
conversions define downmixes to secondary formats, as they do in MLP, 
and, unlike the channel map, might end up being ignored.

In theory the channel conversion header suffices for compatibility 
coding, but in practice I'm not quite sure that the primary target of 
such codings -- legacy players -- will implement the feature. In that 
case the compatibility might prove illusory.

I'm also not entirely sure that the coding chosen for the channel 
definitions is the best one. Typically we'd expect each type of channel 
map to contain all of, and nothing but, the channel definitions normally 
used with that map type, in some order; for example L, C, R, Ls, Rs and 
LFE for 5.1. If so, all we're really trying to encode is the 
interleaving order. Then we have to ask two things. First, is that 
option really necessary? Fixed channel orders are a real possibility, 
especially since we're not encapsulating an existing format but defining 
a new streamable one, which will necessitate some copying around in any 
case; some unnecessary options were already dropped for simplicity; and 
of course the channel conversion headers enable channel permutations as 
well. Second, is this the best encoding for it? Permutations can be 
coded with less redundancy and less room for error. If the idea is to 
enable subsetting (e.g. 5.1 with a missing LFE equals 5.0), then 
something like WAVE's channel mask seems a better alternative. The 
format also doesn't stop us from defining two left channels for stereo, 
even though it does seem to be trying to limit the possibilities for 
error by defining the channel types separately for each map (e.g. no 
OGG_CHANNEL_AMBISONIC_W inside a stereo channel map). Unfortunately, in 
the process it could end up with a combinatorial explosion of channel 
type enumerations (i.e. we might end up redefining L, R, C, etc. for 
each multichannel map type, of which there are many).

So, how about a slight change in emphasis? Currently we have two types 
of channel semantics headers, one for the primary interpretation of the 
stream and one for downmixing to secondary formats. Why not redefine 
them so that both are bona fide channel maps, and allow many such maps 
(say, in descending order of preference)? Only one type of map would 
come with a conversion matrix (can't handle linear algebra? just skip to 
the next map; matrices can also implement arbitrary channel selections 
and permutations, so in this case a separate channel map is not needed), 
and each simple map would carry an assignment array with precisely the 
number of channels the map expects (6 for 5.1, 2 for stereo, etc.), used 
to refer to the physical channels by order number. In pseudocode,

#def MAP_MONO:=1;
#def N_MONO:=1;
#def MONO_FRONT:=1;

#def MAP_STEREO:=2;
#def N_STEREO:=2;
#def STEREO_L:=1;
#def STEREO_R:=2;

#def MAP_UHJ:=3;
#def N_UHJ:=4;
#def UHJ_SIGMA:=1;
#def UHJ_DELTA:=2;
#def UHJ_T:=3;
#def UHJ_Q:=4;

header:=
  (n_chn:=3;                    // three channels are stored
   maps:=
    (
     (map_type:=simple;         // no matrix, the most preferred choice
      channel_type:=MAP_UHJ;    // implies that the map has N_UHJ==4 entries
      map[UHJ_SIGMA]:=1;        // SIGMA is stored in the first physical channel
      map[UHJ_DELTA]:=2;
      map[UHJ_T]:=3;
      map[UHJ_Q]:=0;),          // the fourth channel is not physically present
     (map_type:=complex;        // matrixing needed to go from m/s to l/r
      channel_type:=MAP_STEREO;
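       matrix:=                  // dimensions come from n_chn and N_STEREO:
                                 // one row per stored channel, one column
                                 // per stereo channel (L, R)
        (1,  1,                  // run-of-the-mill sum/diff matrix:
         1, -1,                  // L = SIGMA+DELTA, R = SIGMA-DELTA,
         0,  0)),                // T does not contribute to stereo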
     (map_type:=simple;
      channel_type:=MAP_MONO;   // mono fallback; support could be mandatory
      map[MONO_FRONT]:=1;)      // seems stupid but comes in handy if the
                                // single mono compatible channel happens
                                // to not be the first one stored
    )
  )
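A minimal Python model of the example header might look like the following. The class and function names (ChannelMap, apply_map) and the choice of one matrix row per stored channel are assumptions for illustration only; gains are left unnormalized, exactly as in the pseudocode.

```python
from dataclasses import dataclass
from typing import List, Optional

# Map type codes and channel counts from the pseudocode.
MAP_MONO, MAP_STEREO, MAP_UHJ = 1, 2, 3
N_CHANNELS = {MAP_MONO: 1, MAP_STEREO: 2, MAP_UHJ: 4}


@dataclass
class ChannelMap:
    channel_type: int
    # Simple map: assignment[k] is the 1-based stored channel feeding abstract
    # channel k, or 0 if that abstract channel is not physically present.
    assignment: Optional[List[int]] = None
    # Complex map: n_chn rows (stored channels) by N columns (map channels).
    matrix: Optional[List[List[float]]] = None


def apply_map(m: ChannelMap, frame: List[float]) -> List[float]:
    """Turn one frame of stored samples into the map's abstract channels."""
    n_out = N_CHANNELS[m.channel_type]
    if m.matrix is not None:
        assert len(m.matrix) == len(frame)
        assert all(len(row) == n_out for row in m.matrix)
        # out[j] = sum over stored channels i of frame[i] * matrix[i][j]
        return [sum(frame[i] * m.matrix[i][j] for i in range(len(frame)))
                for j in range(n_out)]
    assert m.assignment is not None and len(m.assignment) == n_out
    return [frame[s - 1] if s else 0.0 for s in m.assignment]


# The example header: three stored channels (SIGMA, DELTA, T).
n_chn = 3
maps = [
    ChannelMap(MAP_UHJ, assignment=[1, 2, 3, 0]),            # Q absent
    ChannelMap(MAP_STEREO, matrix=[[1, 1], [1, -1], [0, 0]]),
    ChannelMap(MAP_MONO, assignment=[1]),
]

frame = [0.5, 0.25, 0.1]          # SIGMA, DELTA, T at one sample instant
assert len(frame) == n_chn
for m in maps:
    print(m.channel_type, apply_map(m, frame))
```

Note how the simple maps never touch arithmetic: a decoder that only handles them can ignore the matrix field entirely.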

Decoding such a structure is trivial: just skip to the first map you 
understand. Simple decoders need to know nothing about matrices, but 
compatibility encodings will still work. Some stupid assignments can 
still be made, but not as easily. If we want to be even stricter, we can 
drop the channel map from simple encodings and require a fixed channel 
order in this case; this would ease implementation (cf. your comments 
on generating code on the fly). No functionality is lost, and it can be 
argued that the structure is conceptually simpler. Unknown primary 
interpretations (say, channel maps with sources specified by angle and 
elevation) can be added without compromising compatibility: they will 
simply be skipped, whereas in the current format they would cause 
naïve decoders to reject the file. How does that sound?

>> (Beware of the pet peeve...)
>
> What is that pet peeve?

Umm... Roughly file formats which prove rigid in practice even after 
they've been declared extensible.

> I haven't enumerated them all, but we should be able to without too
> much trouble,

Want me to start a list in the Wiki?

> Do you have any more info about THX? I've searched the web and found 
> little of any worth.

I used to, but I may have misplaced the specs. The main idea is that in 
THX the surround channel is supposed to be spatially diffuse. It is 
recreated not with directional sources at the back but by using dipole 
speakers, room reflections, multiple sources or even explicit allpass 
decorrelators. But I'm not quite sure what the overall spatial 
distribution of the surround field is supposed to be. Hopefully I can 
find something concrete on it.
-- 
Sampo Syreeni, aka decoy - mailto:decoy at iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2