From octarone at yahoo.com Thu Nov 26 12:14:14 2015
From: octarone at yahoo.com (Gabriel I.)
Date: Thu, 26 Nov 2015 20:14:14 +0000 (UTC)
Subject: [Vorbis] Proposal for Ambisonics format in vorbis comment.
References: <672491866.11375961.1448568854714.JavaMail.yahoo.ref@mail.yahoo.com>
Message-ID: <672491866.11375961.1448568854714.JavaMail.yahoo@mail.yahoo.com>

Greetings,

I apologize if I posted this in the wrong list, I wasn't sure where to post it, but seeing as the tags are called "vorbis comments" I thought vorbis, rather than ogg-dev, would be the right choice. (actually, I'm not even a developer anyway)

What I'd like to propose is a simple way to encode ambisonic files in vorbis comments as simple tags. By this I don't mean a single change to the format itself or the codec, but a simple "official" tag so that hopefully, in the future, we'll have decoders complying with it. Nobody ever wants to take ambisonic storage off the ground in an *universal* fashion because there's no standard in encoding the *channel orderings*, what *channels are present*, and the *normalization*, and people don't agree on one thing for some reason. (perhaps being stubborn)

My proposal is different because it solves all issues: it allows only Pantophonic (or planar/2D) signals if you wish, as probably most music and people will not even have a 3D system which includes height... at the same time you can specify a full 3D sphere encoding, or somewhere in between. The former is especially important because it needs far less number of channels and thus consumes far less amount of space and bandwidth, so instead the order of ambisonic or quality of the audio itself can be increased.

Note that this proposal is *infinitely* extensible to an arbitrary "ambisonic order", *and* it can specify the normalization. I haven't decided on the default normalization scheme, I'd like it to be N3D (why? well, just because?
none is objectively superior but we have to agree on *something* for a standard) but it doesn't really matter as it can be specified.

Basically, it uses the ACN channel ordering described here: http://ambisonics.ch/standards/channels/ (it is mathematically defined by the relationship l*(l+1) + m; where l is the mathematical degree, and m is the mathematical order). (note that in ambisonics jargon, the 'order' of ambisonics actually refers to the mathematical degree)

However the filetypes are described here: http://ambisonics.ch/standards/filetypes/

(Please note I have no affiliation with that site, I just found it and it is the best way to describe ambisonics material)

This allows us to *uniquely* identify the channels used without wasting space on empty channels at all. Because you specify both the "degree" of the Pantophony and the "degree" of the height individually. The value (3,0) would thus mean "third order ambisonics pantophony" having channels 0,1,3,4,8,9,15 present with no height component at all because it is degree 0 for height, which means a 2D/Planar signal requiring *only* 7 channels instead of 16! Of course if you wanted a full-sphere 3D field, then you'd use (3,3) and get all 16 channels in the file. Lowering the second degree simply lowers the "order" or "resolution" of the height component.

The important thing to remember is that by just these two values, the decoder knows *exactly* which channels are present and in what order, because they are defined precisely from it. No empty channels that waste space and bandwidth. Plus, the decoder is not confused as it knows exactly how and which channels and in what order they are present (there are only 7 in the 2D case).

The "way" to calculate which channels are present is easy enough if you look at the first link which describes the full channel orderings (ACNs). For a 2D planar case, you simply take, for each degree, the 'extremities' where m is -l and +l, and only use those channels.
For example, "third order" (3rd degree) planar has the channels with:

Degree 0: m=0 -> ACN 0
Degree 1: m=-1 and +1 -> ACN 1,3 (refer to the table which is built from that math relationship)
Degree 2: m=-2 and +2 -> ACN 4,8
Degree 3: m=-3 and +3 -> ACN 9,15

Thus combining all of them up we have 0,1,3,4,8,9,15 our 7 channels! This is what is actually present in the file itself (the 7 channels), but the decoder knows where and how to decode and map them from just that.

You can extend this to arbitrary orders and degrees. If you increase the second (height) degree, you simply add all the channels for that degree. A (3,1) for instance will take all the missing channels from degrees 0 and 1. Since we didn't skip any channels from degree 0, and we only skipped one channel from degree 1 (where m=0; we only took m=-1 and m=1), then we just add that channel where l=1 and m=0 -> ACN 2. Thus for (3,1) we get 0,1,2,3,4,8,9,15 (8 channels in the file), and it *uniquely* identifies the channel ordering like this, zero ambiguity.

For (3,2) we'd add the channels in degree 2 that we missed (except for m=-2 and m=2), thus we add channels corresponding to l=2 and m=-1,0,1, thus ACN 5,6,7. Thus (3,2) has the 0,1,2,3,4,5,6,7,8,9,15 channels (11 channels in the file). If you do (3,3) you end up with all channels for all 3 degrees, so all 0...15 channels.

I hope you get it, it's easy enough to understand and no ambiguity whatsoever.

The last thing to add is the normalization which I think can simply be added after a colon. Thus finally, my proposal would be to add tag like this as a vorbiscomment:

AMBISONIC=(3,0):N3D

The above tag defines a 2D planar file with "third order ambisonics" and no height at all, using the N3D normalization scheme. Thus, when a decoder sees this, it knows this file has 7 channels and they are ACN 0,1,3,4,8,9,15. The following tag:

AMBISONIC=(3,3):SN3D

defines a full-sphere 3D field using the SN3D normalization scheme.
When the decoder sees it, it knows the file has 16 channels, them being ACN 0...15. (of course the decoder can refuse to decode if it cannot! that's beside the point!)

Would acknowledging such a tag as official format be much trouble and to be added to the spec?

I simply want an *official* way to send this very simple information requiring no more than just two values and the normalization scheme and store it in a file. I already use this tag format on my things right now (unreleased because I need to know it is the best way) because I really want to take Ambisonics off the ground (even for music which is what I do). I want it officially because then decoders will hopefully be made to comply with it. Alone, I have no power to influence that, sadly, so I turned to you.

I need your help here. This can work in FLAC too with vorbiscomments. Maybe other formats will follow if they see this take off. And if possible it should work on any other format that can specify tags, like Opus, I just need the official recognition. There is zero change in the codec itself or the format, it's just an officially recognized tag in a way declared in the spec, so decoders can know how to comply. Please if you do take this to heart, and decide to implement it, feel free to describe it in much better detail or technical terms as needed. I just wanted to explain it in an easy to understand manner.

If you have an alternative way to do this officially or a superior method (but this one proposed has *zero* shortcomings as far as storing ambisonic material is concerned that I'm aware of), please tell me so I will use it instead! Even if rejected, I will continue to use it just because I want to see it off the ground. I truly hope you'll consider it as an official tag format (I will encourage its use if so).
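
The channel-selection rule in the proposal can be sketched in a few lines of Python. This is only an illustration of the rule as described (ACN index l*(l+1) + m, all m for degrees up to the height degree, only m = -l and m = +l above it). The function names `parse_ambisonic_tag` and `acn_channels` are hypothetical, and treating a missing normalization as N3D is an assumption based on the stated preference, not something the proposal fixes.

```python
import re

# Hypothetical helper names; only the rule itself comes from the proposal.
_TAG_RE = re.compile(r"AMBISONIC=\((\d+),(\d+)\)(?::(\w+))?")


def parse_ambisonic_tag(comment):
    """Split 'AMBISONIC=(H,V):NORM' into (H, V, NORM)."""
    match = _TAG_RE.fullmatch(comment.strip())
    if match is None:
        raise ValueError(f"not an AMBISONIC tag: {comment!r}")
    h_order, v_order = int(match.group(1)), int(match.group(2))
    # Assumption: a missing normalization falls back to N3D.
    norm = match.group(3) or "N3D"
    return h_order, v_order, norm


def acn_channels(h_order, v_order):
    """List the ACN indices present in a (horizontal, height) mixed-order file."""
    if not 0 <= v_order <= h_order:
        raise ValueError("height degree must be between 0 and the horizontal degree")
    channels = []
    for l in range(h_order + 1):
        if l <= v_order:
            # Full set of components for this degree: m = -l .. +l.
            ms = range(-l, l + 1)
        else:
            # Horizontal-only degree: keep just the two 'extremities'.
            ms = (-l, l)
        channels.extend(l * (l + 1) + m for m in ms)
    return channels


h, v, norm = parse_ambisonic_tag("AMBISONIC=(3,0):N3D")
print(norm, acn_channels(h, v))   # N3D [0, 1, 3, 4, 8, 9, 15]
print(acn_channels(3, 1))         # [0, 1, 2, 3, 4, 8, 9, 15]
print(len(acn_channels(3, 2)))    # 11
```

A decoder following this sketch would compare `len(acn_channels(h, v))` with the channel count of the stream and refuse the file if they disagree.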
Thank you for your time and once again I am sorry if I mailed this in the wrong section, as this isn't necessarily about the codec, but I did not know where to put it (because I'm not a developer).

From martin.leese at stanfordalumni.org Mon Nov 30 12:57:23 2015
From: martin.leese at stanfordalumni.org (Martin Leese)
Date: Mon, 30 Nov 2015 13:57:23 -0700
Subject: [Vorbis] Proposal for Ambisonics format in vorbis comment.
Message-ID:

"Gabriel I." wrote:
> Greetings,
>
> I apologize if I posted this in the wrong list, I wasn't sure where to post
> it, but seeing as the tags are called "vorbis comments" I thought vorbis,
> rather than ogg-dev, would be the right choice. (actually, I'm not even a
> developer anyway)

Hi Gabriel,

I doubt whether the Xiph community would promote a file format for Ambisonics without first seeing whether it had the support of the Ambisonic community *and* seeing it used in the wild. The Ambisonic crowd all hang out on the sursound list,(1) so you should post your proposal over there. (Links have been collected together at the end.) However, there was a heated discussion in August/September 2008 on that list about a new file format. Despite several hundred posts, no consensus emerged. My guess is that nobody over there has the stomach for another round (I know I don't). This also might explain why you received no replies when you posted your proposal on the Ambisonics list.(2) (So many lists.)

I have interspersed some comments below. If you do not understand anything, feel free to e-mail me off list.

> What I'd like to propose is a simple way to encode ambisonic files in vorbis
> comments as simple tags. By this I don't mean a single change to the format
> itself or the codec, but a simple "official" tag so that hopefully, in the
> future, we'll have decoders complying with it.
> Nobody ever wants to take
> ambisonic storage off the ground in an *universal* fashion because there's
> no standard in encoding the *channel orderings*, what *channels are
> present*, and the *normalization*, and people don't agree on one thing for
> some reason. (perhaps being stubborn)

I was a little surprised that you did not discuss the ".amb" format as this is the official file format for Ambisonic B-Format.(3) This has also been used in the wild for many years, particularly at Ambisonia.com.(4) It uses Furse-Malham component ordering and MaxN normalization. (It also only works up to third-order.)

Note that Vorbis is lossy. Ambisonics is picky about low-frequency phase and, as far as I know, nobody has checked the extent to which this is preserved in Vorbis. This may or may not be a problem. (It is obviously not a problem with lossless FLAC.)

Your proposal can utilize all 256 Vorbis channels, which is good; not all proposals for Ambisonics in Vorbis have allowed this.

> My proposal is different because it solves all issues: it allows only
> Pantophonic (or planar/2D) signals if you wish, as probably most music and
> people will not even have a 3D system which includes height... at the same
> time you can specify a full 3D sphere encoding, or somewhere in between. The
> former is especially important because it needs far less number of channels
> and thus consumes far less amount of space and bandwidth, so instead the
> order of ambisonic or quality of the audio itself can be increased.

The "*.amb" format has the same property, so your proposal is not *that* different.

> Note that this proposal is *infinitely* extensible to an arbitrary
> "ambisonic order", *and* it can specify the normalization. I haven't decided
> on the default normalization scheme, I'd like it to be N3D (why? well, just
> because? none is objectively superior but we have to agree on *something*
> for a standard) but it doesn't really matter as it can be specified.

The ".amb" format is limited to third-order and uses MaxN normalization (with the exception of a -3dB correction factor for W).(5) Unfortunately, the coefficients for the latter cannot be specified algebraically above third-order (but they can be calculated numerically). Dumping this normalization is therefore probably a good thing.

> Basically, it uses the ACN channel ordering described here:
> http://ambisonics.ch/standards/channels/ (it is mathematically defined by
> the relationship l*(l+1) + m; where l is the mathematical degree, and m is
> the mathematical order). (note that in ambisonics jargon, the 'order' of
> ambisonics actually refers to the mathematical degree)

The ".amb" format uses Furse-Malham channel ordering.(6) This is complicated and counter-intuitive, and was used only for compatibility with (then) current practice. Dumping it is therefore probably a good thing.

> However the filetypes are described here:
> http://ambisonics.ch/standards/filetypes/
>
> (Please note I have no affiliation with that site, I just found it and it is
> the best way to describe ambisonics material)
>
> This allows us to *uniquely* identify the channels used without wasting
> space on empty channels at all. Because you specify both the "degree" of the
> Pantophony and the "degree" of the height individually. The value (3,0)
> would thus mean "third order ambisonics pantophony" having channels
> 0,1,3,4,8,9,15 present with no height component at all because it is degree
> 0 for height, which means a 2D/Planar signal requiring *only* 7 channels
> instead of 16! Of course if you wanted a full-sphere 3D field, then you'd
> use (3,3) and get all 16 channels in the file. Lowering the second degree
> simply lowers the "order" or "resolution" of the height component.
>
> The important thing to remember is that by just these two values, the
> decoder knows *exactly* which channels are present and in what order,
> because they are defined precisely from it.
> No empty channels that waste
> space and bandwidth. Plus, the decoder is not confused as it knows exactly
> how and which channels and in what order they are present (there are only 7
> in the 2D case).

Your proposal for mixed order is the same as the ".amb" format. It has the disadvantage that as a source leaves the horizontal, its sharpness degrades rapidly to that of the height-order. An alternative scheme, which does not have this problem, is "Complete mixed-order sets".(7) However, I don't know of anybody who has experience with decoding such sets.

...

> The last thing to add is the normalization which I think can simply be added
> after a colon. Thus finally, my proposal would be to add tag like this as a
> vorbiscomment:
>
> AMBISONIC=(3,0):N3D
>
> The above tag defines a 2D planar file with "third order ambisonics" and no
> height at all, using the N3D normalization scheme. Thus, when a decoder sees
> this, it knows this file has 7 channels and they are ACN 0,1,3,4,8,9,15. The
> following tag:
>
> AMBISONIC=(3,3):SN3D
>
> defines a full-sphere 3D field using the SN3D normalization scheme. When the
> decoder sees it, it knows the file has 16 channels, them being ACN 0...15.
> (of course the decoder can refuse to decode if it cannot! that's beside the
> point!)
>
> Would acknowledging such a tag as official format be much trouble and to be
> added to the spec?

Adding new VorbisComment tags to the Vorbis spec does not happen lightly; Xiph has an official policy of neglect with respect to tags. Back in July 2009 Xiph *asked* me to survey what tags were being used in the wild, and to propose additions to the Vorbis spec.(8) Even these were not added. (Me, bitter and twisted? Never!)

> I simply want an *official* way to send this very simple information
> requiring no more than just two values and the normalization scheme and
> store it in a file.
> I already use this tag format on my things right now
> (unreleased because I need to know it is the best way) because I really want
> to take Ambisonics off the ground (even for music which is what I do). I
> want it officially because then decoders will hopefully be made to comply
> with it. Alone, I have no power to influence that, sadly, so I turned to
> you.
>
> I need your help here. This can work in FLAC too with vorbiscomments. Maybe
> other formats will follow if they see this take off. And if possible it
> should work on any other format that can specify tags, like Opus, I just
> need the official recognition. There is zero change in the codec itself or
> the format, it's just an officially recognized tag in a way declared in the
> spec, so decoders can know how to comply. Please if you do take this to
> heart, and decide to implement it, feel free to describe it in much better
> detail or technical terms as needed. I just wanted to explain it in an easy
> to understand manner.

Note that Native FLAC is limited to 8 channels. Obviously you could include two or more FLAC streams in an Ogg container (Ogg FLAC), and so have an unlimited number of channels. However, in this case, the metadata should not be in the FLAC stream(s) but in the Ogg container. There is no simple way to do this, but possibilities include name-value pairs in an Ogg Skeleton stream and a XMLEmbedding stream.(9)

Finally, your proposal only considers Ambisonic B-Format. UHJ and G-Format are also part of Ambisonics. There is an official file format for UHJ,(10) and a proposal for G-Format.(11) UHJ could be accommodated into your proposal quite simply using the VorbisComment "AMBISONIC:UHJ". G-Format is a lot more complicated, and should probably be ignored.
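
To make the 8-channel limit of native FLAC concrete, here is a rough Python sketch that lists which (horizontal, height) pairs from the proposal would fit in a single stream. It assumes the channel rule as stated in the proposal: a full set up to the height degree, i.e. (V+1)^2 channels, plus two channels for each further horizontal-only degree. The names `channel_count` and `layouts_within` are hypothetical.

```python
def channel_count(h_order, v_order):
    """Number of channels in an (h_order, v_order) file under the proposed rule."""
    if not 0 <= v_order <= h_order:
        raise ValueError("height degree must be between 0 and the horizontal degree")
    # (V+1)^2 full-sphere components, then 2 per horizontal-only degree.
    return (v_order + 1) ** 2 + 2 * (h_order - v_order)


def layouts_within(max_channels, max_order):
    """All (h, v) pairs up to max_order that need at most max_channels channels."""
    return [
        (h, v)
        for h in range(max_order + 1)
        for v in range(h + 1)
        if channel_count(h, v) <= max_channels
    ]


# Which layouts up to third order fit in one native FLAC stream (8 channels)?
print(layouts_within(8, 3))
# [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]
```

Under these assumptions, third-order planar (3,0) and the (3,1) mixed-order case still fit in one native FLAC stream, while (3,2) and (3,3) would need the multi-stream Ogg FLAC arrangement described above.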

> If you have an alternative way to do this officially or a superior method
> (but this one proposed has *zero* shortcomings as far as storing ambisonic
> material is concerned that I'm aware of), please tell me so I will use it
> instead! Even if rejected, I will continue to use it just because I want to
> see it off the ground. I truly hope you'll consider it as an official tag
> format (I will encourage its use if so).
>
> Thank you for your time and once again I am sorry if I mailed this in the
> wrong section, as this isn't necessarily about the codec, but I did not know
> where to put it (because I'm not a developer).

Regards,
Martin

(1) https://mail.music.vt.edu/mailman/listinfo/sursound
(2) http://ambisonics.ch/mailman/listinfo/ambisonics
(3) http://members.tripod.com/martin_leese/Ambisonic/B-Format_file_format.html
(4) http://www.ambisonia.com/
(5) https://en.wikipedia.org/wiki/Ambisonic_data_exchange_formats#maxN
(6) https://en.wikipedia.org/wiki/Ambisonic_data_exchange_formats#Furse-Malham
(7) https://en.wikipedia.org/wiki/Mixed-order_Ambisonics#Complete_mixed-order_sets_.28.23H.23V.29
(8) https://wiki.xiph.org/Field_names
(9) https://wiki.xiph.org/Metadata#Ogg_Skeleton
(10) http://members.tripod.com/martin_leese/Ambisonic/UHJ_file_format.html
(11) http://members.tripod.com/martin_leese/Ambisonic/G-Format_chunk.html

--
Martin J Leese
E-mail: martin.leese stanfordalumni.org
Web: http://members.tripod.com/martin_leese/

From ibmalone at gmail.com Mon Nov 30 16:26:29 2015
From: ibmalone at gmail.com (Ian Malone)
Date: Tue, 1 Dec 2015 00:26:29 +0000
Subject: [Vorbis] Proposal for Ambisonics format in vorbis comment.
In-Reply-To:
References:
Message-ID:

On 30 November 2015 at 20:57, Martin Leese wrote:
> "Gabriel I." wrote:
>
>> Greetings,
>>

Olá. Hope everyone is well, thought I'd interject.

>> I apologize if I posted this in the wrong list, I wasn't sure where to post
>> it, but seeing as the tags are called "vorbis comments" I thought vorbis,
>> rather than ogg-dev, would be the right choice. (actually, I'm not even a
>> developer anyway)
>
> Hi Gabriel,
>
> I doubt whether the Xiph community would
> promote a file format for Ambisonics without
> first seeing whether it had the support of the
> Ambisonic community *and* seeing it used in
> the wild. The Ambisonic crowd all hang out on
> the sursound list,(1) so you should post your
> proposal over there. (Links have been
> collected together at the end.) However, there
> was a heated discussion in August/September
> 2008 on that list about a new file format.
> Despite several hundred posts, no consensus
> emerged. My guess is that nobody over there
> has the stomach for another round (I know I
> don't). This also might explain why you
> received no replies when you posted your
> proposal on the Ambisonics list.(2) (So many
> lists.)
>
>>
>> Would acknowledging such a tag as official format be much trouble and to be
>> added to the spec?
>
> Adding new VorbisComment tags to the Vorbis
> spec does not happen lightly; Xiph has an
> official policy of neglect with respect to tags.
> Back in July 2009 Xiph *asked* me to survey
> what tags were being used in the wild, and to
> propose additions to the Vorbis spec.(8) Even
> these were not added. (Me, bitter and twisted?
> Never!)
>

While that list wasn't officially added, it does see use. One thing that did get acceptance (to some degree) was the METADATA_BLOCK_PICTURE (and I've seen others of the proposed list in use), this was probably because there was interest from a developer in actually using it. I think that's key. Apple or Microsoft (and google to some extent) can just put a new feature in their next version of a product and automatically most of their user base is using it.
For open formats it's a bit different, I think the important thing for someone wanting to make this happen (not knowing the ambisonics world) would be to find a developer behind a commonly used system and get them interested.

As to the details, an oggskeleton and an embedded metadata stream (maybe XML, maybe binary) would be the purest way of doing it. But you might find it's easier to sell a comment based one. The overwhelming explosion of bad unstructured metadata (like early mp3), which I think probably led Xiph to be cautious about comment contents early on never really made it to Ogg and seems to have tailed off now that more media comes through 'official' channels.

Either way good test samples are very useful, being able to prepare those is important. I guess ambisonics is probably a relatively small community? Do you need full hardware solutions or do people mainly drive hardware from open software?

--
imalone
http://ibmalone.blogspot.co.uk