[Vorbis] Proposal for Ambisonics format in vorbis comment.

Thu Nov 26 12:14:14 PST 2015

Greetings,

I apologize if I posted this to the wrong list. I wasn't sure where to post it, but seeing as the tags are called "vorbis comments", I thought vorbis, rather than ogg-dev, would be the right choice. (Actually, I'm not even a developer anyway.)

What I'd like to propose is a simple way to describe ambisonic files in vorbis comments using a simple tag. By this I don't mean a single change to the format itself or the codec, but a simple "official" tag so that hopefully, in the future, we'll have decoders complying with it. Nobody ever manages to get ambisonic storage off the ground in a *universal* fashion because there's no standard for encoding the *channel ordering*, which *channels are present*, and the *normalization*, and people can't seem to agree on one convention (perhaps out of stubbornness).

My proposal is different because it addresses all of these issues. It allows purely pantophonic (planar/2D) signals if you wish, as most music, and most people's playback systems, will probably not involve height at all. At the same time you can specify a full 3D sphere encoding, or anything in between. The planar case is especially important because it needs far fewer channels and thus far less space and bandwidth, so the ambisonic order or the quality of the audio itself can be increased instead.

Note that this proposal is extensible to an arbitrary "ambisonic order", *and* it can specify the normalization. I haven't decided on the default normalization scheme. I'd like it to be N3D (none is objectively superior, but we have to agree on *something* for a standard), but it doesn't really matter, as it can be specified explicitly.

Basically, it uses the ACN channel ordering described here: http://ambisonics.ch/standards/channels/ (it is mathematically defined by the relationship ACN = l*(l+1) + m, where l is the mathematical degree and m is the mathematical order, with m running from -l to +l). Note that in ambisonics jargon, the 'order' of ambisonics actually refers to the mathematical degree.
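
To make the ACN relationship concrete, here is a minimal Python sketch. The function names are made up for illustration and are not taken from any spec or library; the inverse mapping relies on the fact that degree l occupies ACN l*l through l*l + 2*l.

```python
import math

def acn(l: int, m: int) -> int:
    """ACN index for degree l and order m, using ACN = l*(l+1) + m."""
    if l < 0 or abs(m) > l:
        raise ValueError("need l >= 0 and -l <= m <= l")
    return l * (l + 1) + m

def degree_and_order(index: int) -> tuple[int, int]:
    """Inverse mapping: recover (l, m) from an ACN index."""
    if index < 0:
        raise ValueError("ACN index must be non-negative")
    l = math.isqrt(index)      # degree l covers indices l*l .. l*l + 2*l
    m = index - l * (l + 1)
    return l, m

print(acn(1, -1), acn(1, 1))   # 1 3
print(degree_and_order(15))    # (3, 3)
```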

The file types, in turn, are described here: http://ambisonics.ch/standards/filetypes/

(Please note I have no affiliation with that site, I just found it and it is the best way to describe ambisonics material)

This allows us to *uniquely* identify the channels used without wasting space on empty channels at all, because you specify the "degree" of the pantophony and the "degree" of the height individually. The value (3,0) would thus mean "third order ambisonics pantophony", with channels 0,1,3,4,8,9,15 present and no height component at all, because the height degree is 0. That is a 2D/planar signal requiring *only* 7 channels instead of 16! Of course, if you wanted a full-sphere 3D field, you'd use (3,3) and get all 16 channels in the file. Lowering the second degree simply lowers the "order" or "resolution" of the height component.

The important thing to remember is that from just these two values, the decoder knows *exactly* which channels are present and in what order, because both follow directly from the pair (there are only 7 channels in the 2D case above). No empty channels waste space and bandwidth, and there is nothing for the decoder to guess.

The "way" to calculate which channels are present is easy enough if you look at the first link which describes the full channel orderings (ACNs). For a 2D planar case, you simply take, for each degree, the 'extremities' where m is -l and +l, and only use those channels. For example, "third order" (3rd degree) planar has the channels with:

Degree 0: m=0 -> ACN 0
Degree 1: m=-1 and +1 -> ACN 1,3  (refer to the table which is built from that math relationship)
Degree 2: m=-2 and +2 -> ACN 4,8
Degree 3: m=-3 and +3 -> ACN 9,15

Combining all of them, we have 0,1,3,4,8,9,15: our 7 channels! This is what is actually present in the file itself (the 7 channels, stored in that order), and the decoder knows where and how to map and decode them from just that. You can extend this to arbitrary orders and degrees.
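
The planar rule above (keep only m = -l and m = +l for each degree) can be sketched in a few lines of Python. The function name is hypothetical and only meant as an illustration of the rule, not a reference implementation.

```python
def planar_channels(order: int) -> list[int]:
    """ACN indices kept for a 2D/planar signal of the given ambisonic order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    channels = []
    for l in range(order + 1):
        # Only the 'extremities' m = -l and m = +l survive; for l = 0 they coincide.
        for m in sorted({-l, l}):
            channels.append(l * (l + 1) + m)
    return channels

print(planar_channels(3))  # [0, 1, 3, 4, 8, 9, 15]
```

A planar file of order N therefore always carries 2*N + 1 channels.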

If you increase the second (height) degree, you simply add all the channels for that degree. A (3,1) for instance will take all the missing channels from degrees 0 and 1. Since we didn't skip any channels from degree 0, and we only skipped one channel from degree 1 (where m=0; we only took m=-1 and m=1), then we just add that channel where l=1 and m=0 -> ACN 2. Thus for (3,1) we get 0,1,2,3,4,8,9,15  (8 channels in the file), and it *uniquely* identifies the channel ordering like this, zero ambiguity.

For (3,2) we'd also add the channels in degree 2 that we skipped (everything except m=-2 and m=2, which are already present), i.e. the channels corresponding to l=2 and m=-1,0,1, which are ACN 5,6,7. Thus (3,2) has the channels 0,1,2,3,4,5,6,7,8,9,15 (11 channels in the file).

If you do (3,3) you end up with all channels for every degree from 0 to 3, so all 16 channels, ACN 0...15. I hope you get the idea; it's easy enough to understand and leaves no ambiguity whatsoever.
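
Putting the rules for the two degrees together, a rough Python sketch of the channel selection might look like this. The function name is hypothetical, and I am assuming the height degree never exceeds the horizontal degree, which is how all the examples above work.

```python
def ambisonic_channels(horizontal: int, vertical: int) -> list[int]:
    """ACN indices present in a file described by the pair (horizontal, vertical)."""
    # Assumption: 0 <= vertical <= horizontal, as in every example above.
    if not 0 <= vertical <= horizontal:
        raise ValueError("need 0 <= vertical <= horizontal")
    channels = []
    for l in range(horizontal + 1):
        for m in range(-l, l + 1):
            # Degrees up to the height degree are kept whole;
            # above it, only the planar extremities m = -l and m = +l remain.
            if l <= vertical or abs(m) == l:
                channels.append(l * (l + 1) + m)
    return channels

for pair in [(3, 0), (3, 1), (3, 2), (3, 3)]:
    print(pair, ambisonic_channels(*pair))
# (3, 0) [0, 1, 3, 4, 8, 9, 15]
# (3, 1) [0, 1, 2, 3, 4, 8, 9, 15]
# (3, 2) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 15]
# (3, 3) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

Under that assumption the channel count works out to (V+1)^2 + 2*(H-V) for a pair (H,V), which gives 7, 8, 11 and 16 for the four cases above.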

The last thing to add is the normalization, which I think can simply be appended after a colon. So, finally, my proposal would be to add a tag like this as a vorbis comment:

AMBISONIC=(3,0):N3D

The above tag defines a 2D planar file with "third order ambisonics" and no height at all, using the N3D normalization scheme. Thus, when a decoder sees this, it knows this file has 7 channels and they are ACN 0,1,3,4,8,9,15. The following tag:

AMBISONIC=(3,3):SN3D

defines a full-sphere 3D field using the SN3D normalization scheme. When the decoder sees it, it knows the file has 16 channels, namely ACN 0...15. (Of course the decoder can refuse to decode if it cannot handle that many channels, but that's beside the point!)
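
For illustration, here is how a decoder might read such a comment in Python. This is only a sketch under my own assumptions: the function name is made up, the exact syntax (no whitespace inside the value, an alphanumeric normalization name) is not settled by this proposal, and the only thing borrowed from the existing vorbis comment rules is that field names are case-insensitive.

```python
import re

# Hypothetical grammar for the value part: "(H,V):NORM", e.g. "(3,0):N3D".
_VALUE = re.compile(r"^\((\d+),(\d+)\):([A-Za-z0-9]+)$")

def parse_ambisonic_comment(comment: str) -> tuple[int, int, str, int]:
    """Return (horizontal, vertical, normalization, channel_count)."""
    name, sep, value = comment.partition("=")
    if not sep or name.upper() != "AMBISONIC":
        raise ValueError("not an AMBISONIC comment")
    match = _VALUE.match(value)
    if match is None:
        raise ValueError("malformed AMBISONIC value: " + value)
    horizontal, vertical = int(match.group(1)), int(match.group(2))
    if vertical > horizontal:
        # Assumption: the height degree cannot exceed the horizontal degree.
        raise ValueError("height degree exceeds horizontal degree")
    # Full degrees 0..V contribute (V+1)^2 channels, each higher degree adds 2.
    count = (vertical + 1) ** 2 + 2 * (horizontal - vertical)
    return horizontal, vertical, match.group(3).upper(), count

print(parse_ambisonic_comment("AMBISONIC=(3,0):N3D"))   # (3, 0, 'N3D', 7)
print(parse_ambisonic_comment("AMBISONIC=(3,3):SN3D"))  # (3, 3, 'SN3D', 16)
```

A decoder could compare the returned channel count against the stream's actual channel count and refuse the file if they disagree.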

Would it be much trouble to acknowledge such a tag as an official format and add it to the spec?

I simply want an *official* way to send this very simple information requiring no more than just two values and the normalization scheme and store it in a file. I already use this tag format on my things right now (unreleased because I need to know it is the best way) because I really want to take Ambisonics off the ground (even for music which is what I do). I want it officially because then decoders will hopefully be made to comply with it. Alone, I have no power to influence that, sadly, so I turned to you.

I need your help here. This can work in FLAC too with vorbiscomments. Maybe other formats will follow if they see this take off. And if possible it should work on any other format that can specify tags, like Opus, I just need the official recognition. There is zero change in the codec itself or the format, it's just an officially recognized tag in a way declared in the spec, so decoders can know how to comply. Please if you do take this to heart, and decide to implement it, feel free to describe it in much better detail or technical terms as needed. I just wanted to explain it in an easy to understand manner.

If you have an alternative way to do this officially or a superior method (but this one proposed has *zero* shortcomings as far as storing ambisonic material is concerned that I'm aware of), please tell me so I will use it instead! Even if rejected, I will continue to use it just because I want to see it off the ground. I truly hope you'll consider it as an official tag format (I will encourage its use if so).

Thank you for your time and once again I am sorry if I mailed this in the wrong section, as this isn't necessarily about the codec, but I did not know where to put it (because I'm not a developer).