[opus] Channel Mapping Family for Ambisonics

Tue Apr 19 17:19:12 UTC 2016

Michael Graczyk wrote:
> Based on my reading of the libopus code and the IETF spec, it seems one
> reasonable option would be to add a new "Channel Mapping Family" for
> ambisonic audio [2]. The mapping family would indicate to the decoder
> that the audio is ambisonics and the channel mapping array would
> indicate which ambisonic channel (W, X, Y, etc) corresponds to which
> coded stream. This representation is analogous to Opus headers for
> surround sound.

Yes, this all sounds good.
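
For concreteness, here is a rough sketch of how such a header could look, using the OpusHead layout from RFC 7845 (magic, version, channel count, pre-skip, input rate, output gain, mapping family, then stream count, coupled count, and one mapping byte per output channel). The family number 254 below is a placeholder, not an assigned value. The W, X, Y, Z labels and their order are assumptions made only for this illustration, as are the helper names.

```python
import struct

# Placeholder only: this is NOT an assigned mapping family number.
HYPOTHETICAL_AMBISONIC_FAMILY = 254

# Assumed labels and ordering for first-order ambisonics (illustration only).
AMBISONIC_LABELS = ["W", "X", "Y", "Z"]


def build_opus_head(channels, family, streams, coupled, mapping,
                    pre_skip=0, rate=48000, gain=0):
    """Serialize an Ogg Opus ID header using the RFC 7845 field layout."""
    head = b"OpusHead" + struct.pack(
        "<BBHIhB", 1, channels, pre_skip, rate, gain, family)
    if family != 0:
        # Families other than 0 carry an explicit channel mapping table.
        head += bytes([streams, coupled]) + bytes(mapping)
    return head


def parse_opus_head(data):
    """Parse an ID header and return its channel mapping information."""
    if data[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    version, channels, _pre_skip, _rate, _gain, family = struct.unpack_from(
        "<BBHIhB", data, 8)
    if version >> 4 != 0:
        # Upper four bits are the major version; an unknown one is rejected.
        raise ValueError("unsupported major version")
    if family == 0:
        # Family 0 has no table: one stream, mono or stereo.
        streams, coupled = 1, channels - 1
        mapping = list(range(channels))
    else:
        streams, coupled = data[19], data[20]
        mapping = list(data[21:21 + channels])
        if len(mapping) != channels:
            raise ValueError("truncated channel mapping table")
    return {"version": version, "channels": channels, "family": family,
            "streams": streams, "coupled": coupled, "mapping": mapping}


# Four ambisonic channels carried as four uncoupled (mono) streams.
head = build_opus_head(4, HYPOTHETICAL_AMBISONIC_FAMILY, 4, 0, [0, 1, 2, 3])
info = parse_opus_head(head)
for label, stream_index in zip(AMBISONIC_LABELS, info["mapping"]):
    print(label, "-> coded stream", stream_index)
```

Note that the version byte stays at 1 here; only the mapping family value distinguishes this header from a surround one.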

> There are a few caveats though. Although we believe we can achieve good
> compression at first without changing the bitstream or the decoder, we
> would like the flexibility to potentially modify both if potential
> improvements are compelling enough to impress you (specifically, we have
> a pre/post transform that would require sending compressible side
> information). Would changing either the bitstream or decoder require
> adding yet another channel mapping? Would it require a new Opus version
> number?

I'm assuming that the pre/post transform would be something like a 
decorrelating transform on the channels, and the actual core parts of 
Opus would not change. For that, I think an extra channel mapping is a 
reasonable approach, without requiring a new version number.
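
To illustrate what I mean by a decorrelating transform, here is a minimal sketch that mixes four channels with an orthonormal matrix before encoding and undoes the mix after decoding. This is not the transform from the proposal, which I have not seen. The normalized Hadamard matrix and all names below are assumptions chosen only because the inverse is easy to verify. A fixed matrix like this needs no side information; presumably an adaptive one is what would.

```python
# Normalized 4x4 Hadamard matrix: orthonormal, so its inverse is its transpose.
# Chosen purely for illustration; it is not the transform under discussion.
HADAMARD_4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]
MIX = [[0.5 * value for value in row] for row in HADAMARD_4]


def transpose(matrix):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def apply_matrix(matrix, samples):
    """Multiply one multichannel sample (one value per channel) by a matrix."""
    return [sum(m * x for m, x in zip(row, samples)) for row in matrix]


def pre_transform(samples):
    """Mix the input channels before handing them to the core encoder."""
    return apply_matrix(MIX, samples)


def post_transform(samples):
    """Undo the mix on the decoder output."""
    return apply_matrix(transpose(MIX), samples)


# One time instant of a hypothetical four-channel signal.
original = [1.0, 0.25, -0.5, 0.0]
mixed = pre_transform(original)
restored = post_transform(mixed)
print(mixed)
print(restored)
```

The core codec sits between `pre_transform` and `post_transform` and is untouched, which is why a new mapping family, rather than a new version number, seems sufficient to signal it.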

In the past, we've talked about using the Opus padding as a place to 
store extra side information. I haven't thought deeply about the 
trade-offs between that and other approaches, like using an invalid TOC 
sequence to add additional packet data, as we did with the MPEG TS 
embedding (<https://wiki.xiph.org/OpusTS>).

The difficulty with padding is that it prevents transparent 
repacketization, because the padding occurs once per Opus packet, while 
side information is typically required once per Opus frame (of which 
there can be several in an Opus packet). The invalid TOC sequence 
approach would put the data in its own Opus packet entirely, which might 
have other drawbacks.
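
The packet/frame mismatch is visible in the packet header itself. As a small sketch based on the framing in RFC 6716, section 3.2: the low two bits of the TOC byte give the frame count code, and only code 3 packets have a frame count byte, which holds a single padding flag for the whole packet. The function name is made up for this example, and it inspects only the first two bytes rather than validating the packet.

```python
def opus_packet_layout(packet):
    """Return (frame_count, has_padding) from an Opus packet's header bytes."""
    if not packet:
        raise ValueError("empty packet")
    code = packet[0] & 0x03  # frame count code "c" in the TOC byte
    if code == 0:
        return 1, False      # one frame, no padding possible
    if code in (1, 2):
        return 2, False      # two frames, no padding possible
    # Code 3: an arbitrary number of frames, described by a second byte.
    if len(packet) < 2:
        raise ValueError("code 3 packet is missing its frame count byte")
    count_byte = packet[1]
    frames = count_byte & 0x3F           # low six bits: number of frames
    if frames == 0:
        raise ValueError("code 3 packet must contain at least one frame")
    has_padding = bool(count_byte & 0x40)  # single flag for the whole packet
    return frames, has_padding


# Header bytes of a code 3 packet: three frames, padding flag set.
frames, has_padding = opus_packet_layout(bytes([0xFB, 0x43, 0x02]))
print(frames, has_padding)
```

With three frames and one padding area, per-frame side information stored in the padding would have to be split or merged whenever a repacketizer regroups the frames.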

> To summarize, should we add a new channel mapping for ambisonics? If
> not, what should we do?

That seems like the best approach. See also 
<https://www.iana.org/assignments/opus-channel-mapping-families/opus-channel-mapping-families.xhtml>. 
As long as there is a specification somewhere (this does not have to be 
an IETF specification, though it could be), we can add it to this list.