[opus] [PATCH] opus-tools/opusfile: Support for Ambisonics

Fri May 25 23:43:16 UTC 2018

Drew Allen wrote:
> Friendly ping for supporting ambisonics in opus-tools & opusfile
> 
> Please LMK any ?s you might have.

Sorry for the long delay on the review. The high-level design looks a 
lot better. I have a number of comments on the implementation details.

I didn't really like the way that opus_head_projection_parse() turned 
out. I was originally thinking something more generic/future proof that 
could also be used for any other channel mapping families that want to 
define a custom channel mapping table. The way this is implemented in 
the patch you sent, it duplicates a lot of code from opus_head_parse(), 
the user still needs to call opus_head_parse() (so the work gets 
duplicated at runtime, too), and they have to pass in a buffer of 
exactly the right size (which means they need to know some details of 
the specific mapping family and how the demixing matrix works).

I think a slightly better approach is along the lines of the 
opus_head_parse_ext() I originally suggested, but just return a pointer 
into the user-supplied packet instead of copying the data. For the 
simple case where the user is just going to pass the data immediately to 
opus_projection_decoder_create(), they don't need a separate buffer at 
all, and the parameters to return the pointer and the matrix size can be 
NULLable to make them optional, so we can implement opus_head_parse() 
simply by calling opus_head_parse_ext() and passing NULL for them 
(eliminating all of the duplicate code and work).

Open question: Should opus_head_parse_ext() with NULL for the matrix 
and/or matrix size parameters succeed, or return an error (OP_EIMPL?) to 
let the application know that there is missing data they're not getting 
by using the old API (i.e., so they know they won't be able to decode 
that stream correctly)? If we do continue to return success, we should 
probably fill in the existing mapping[] array in OpusHead with something 
meaningful (e.g., 255 for all silence). Right now it is just copying a 
number of bytes out of the demixing matrix equal to the channel count, 
which doesn't make much sense.

There are a few errors in the opus_head_projection_parse() 
implementation. For mapping family 253 you don't validate that the 
packet has at least 21 bytes before reading bytes 19 and 20. Although 
you have bumped up OP_NCHANNELS_MAX to 18, 
op_validate_ambisonic_channels() will actually succeed for channel 
counts much larger than that. I didn't see anything that would then 
prevent us from trying to decode such streams (and overflowing the 
statically-sized buffers based on OP_NCHANNELS_MAX). I think this should 
return OP_EIMPL like we do for mapping family 255. Finally, I don't want 
to use sqrt() and introduce a libm dependency for the fixed-point 
builds, so we should probably extract the ambisonics order from the 
channel count a different way.

The next issue is that the demixing matrix cannot be stored directly in 
the OggOpusFile struct. In the case of a chained, seekable stream with 
multiple ambisonics projection links, we will parse all of the headers 
up front, and the matrix that gets stored will simply be the last one 
parsed, which will overwrite all of the others (and if they have 
different channel/stream counts, we could potentially use data from a 
mix of matrices of different sizes). The proper place to put this data 
is in the OggOpusLink structure, since there is one of these per link in 
a chained file. Since files can have a nearly unlimited number of links, 
and this matrix can be somewhat large, it probably makes sense to 
dynamically allocate it instead of using a fixed-size buffer, especially 
since most files probably won't be using ambisonics to begin with.

Once we have dynamic allocation for the demixing matrix, I wonder if we 
shouldn't just bump OP_NCHANNELS_MAX up to 227 now. Otherwise whenever 
someone does want to produce some of these higher-order ambisonics 
streams, everyone will have to upgrade libopusfile to be able to play them.

I also don't think you converted the check to see if we can re-use an 
existing decoder correctly. At a minimum it should ensure we want to 
create the same type of decoder and in the projection case, either 
verify the demixing matrices match or not attempt to re-use projection 
decoders (since this would require saving a copy of the demixing matrix 
to correctly handle the non-seekable case). I also think 
op_generic_decoder_init() should use op_use_projection() to decide which 
kind of decoder to create, to make sure it does not get out of sync with 
the rest of the code. At that point we're passing in enough pieces of an 
OggOpusLink that we should probably just pass the whole thing.

I think we can also get rid of some #ifdefs if we always define 
op_use_projection(), but make it return 0 without 
OPUS_HAVE_OPUS_PROJECTION_H.

Currently, it also looks like you are re-using the context variable 
decode_cb_ctx to save the user context for both the decode and 
projection_decode callbacks. So if you register both, the one you get 
passed depends on the order you register the callbacks, and gets passed 
to both callbacks. That behavior isn't documented in the API docs, and 
is a little confusing. I think the context used for each callback should 
be saved independently.

There are a number of other minor nits about making the formatting 
consistent with the rest of the codebase and the use of consistent names 
for things (i.e., opus_projection_decode vs. 
op_decode_projection_cb_func, mapping_family vs. channel_mapping, od vs. 
st, etc.).

Since some of the above points can be tricky (especially lifetime 
management for the dynamically allocated demixing matrices), enumerating 
all of the formatting nits is laborious, and because I wanted to keep 
things moving, I went ahead and addressed all of the above comments in a 
separate patch (attached). Please review, and if it looks okay, I'll 
commit a version squashed with your patch.

For the purposes of the open question above, for users of the old 
opus_parse_head() API I'm currently returning success when we have a 
projection matrix, but I'm strongly leaning towards returning OP_EIMPL 
instead. Let me know what you think about raising the channel limit, also.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Review-comments-for-ambisonics-support.patch
Type: text/x-patch
Size: 51777 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20180525/b8ef08f4/attachment-0001.bin>