[opus] [PATCH] opus-tools/opusfile: Support for Ambisonics
bitllama at google.com
Mon Jun 4 16:08:39 UTC 2018
Thank you for taking the time to revise and improve my patch. I agree to
all the changes you've made here and if it suits you, I'm happy to sign off
for submitting the patch to opusfile.
On Fri, May 25, 2018 at 4:43 PM Timothy B. Terriberry <tterribe at xiph.org>
> Drew Allen wrote:
> > Friendly ping for supporting ambisonics in opus-tools & opusfile
> > Please LMK any ?s you might have.
> Sorry for the long delay on the review. The high-level design looks a
> lot better. I have a number of comments on the implementation details.
> I didn't really like the way that opus_head_projection_parse() turned
> out. I was originally thinking something more generic/future proof that
> could also be used for any other channel mapping families that want to
> define a custom channel mapping table. The way this is implemented in
> the patch you sent, it duplicates a lot of code from opus_head_parse(),
> the user still needs to call opus_head_parse() (so the work gets
> duplicated at runtime, too), and they have to pass in a buffer of
> exactly the right size (which means they need to know some details of
> the specific mapping family and how the demixing matrix works).
> I think a slightly better approach is along the lines of the
> opus_head_parse_ext() I originally suggested, but just return a pointer
> into the user-supplied packet instead of copying the data. For the
> simple case where the user is just going to pass the data immediately to
> opus_projection_decoder_create(), they don't need a separate buffer at
> all, and the parameters to return the pointer and the matrix size can be
> NULLable to make them optional, so we can implement opus_head_parse()
> simply by calling opus_head_parse_ext() and passing NULL for them
> (eliminating all of the duplicate code and work).
> Open question: Should opus_head_parse_ext() with NULL for the matrix
> and/or matrix size parameters succeed, or return an error (OP_EIMPL?) to
> let the application know that there is missing data they're not getting
> by using the old API (i.e., so they know they won't be able to decode
> that stream correctly)? If we do continue to return success, we should
> probably fill in the existing mapping array in OpusHead with something
> meaningful (e.g., 255 for all silence). Right now it is just copying a
> number of bytes out of the demixing matrix equal to the channel count,
> which doesn't make much sense.
> There are a few errors in the opus_head_projection_parse()
> implementation. For mapping family 253 you don't validate that the
> packet has at least 21 bytes before reading bytes 19 and 20. Although
> you have bumped up OP_NCHANNELS_MAX to 18,
> op_validate_ambisonic_channels() will actually succeed for channel
> counts much larger than that. I didn't see anything that would then
> prevent us from trying to decode such streams (and overflowing the
> statically-sized buffers based on OP_NCHANNELS_MAX). I think this should
> return OP_EIMPL like we do for mapping family 255. Finally, I don't want
> to use sqrt() and introduce a libm dependency for the fixed-point
> builds, so we should probably extract the ambisonics order from the
> channel count a different way.
> The next issue is that the demixing matrix cannot be stored directly in
> the OggOpusFile struct. In the case of a chained, seekable stream with
> multiple ambisonics projection links, we will parse all of the headers
> up front, and the matrix that gets stored will simply be the last one
> parsed, which will overwrite all of the others (and if they have
> different channel/stream counts, we could potentially use data from a
> mix of matrices of different sizes). The proper place to put this data
> is in the OggOpusLink structure, since there is one of these per link in
> a chained file. Since files can have a nearly unlimited number of links,
> and this matrix can be somewhat large, it probably makes sense to
> dynamically allocate it instead of using a fixed-size buffer, especially
> since most files probably won't be using ambisonics to begin with.
> Once we have dynamic allocation for the demixing matrix, I wonder if we
> shouldn't just bump OP_NCHANNELS_MAX up to 227 now. Otherwise whenever
> someone does want to produce some of these higher-order ambisonics
> streams, everyone will have to upgrade libopusfile to be able to play them.
> I also don't think you converted the check to see if we can re-use an
> existing decoder correctly. At a minimum it should ensure we want to
> create the same type of decoder and in the projection case, either
> verify the demixing matrices match or not attempt to re-use projection
> decoders (since this would require saving a copy of the demixing matrix
> to correctly handle the non-seekable case). I also think
> op_generic_decoder_init() should use op_use_projection() to decide which
> kind of decoder to create, to make sure it does not get out of sync with
> the rest of the code. At that point we're passing in enough pieces of an
> OggOpusLink that we should probably just pass the whole thing.
> I think we can also get rid of some #ifdefs if we always define
> op_use_projection(), but make it return 0 without
> Currently, it also looks like you are re-using the context variable
> decode_cb_ctx to save the user context for both the decode and
> projection_decode callbacks. So if you register both, the one you get
> passed depends on the order you register the callbacks, and gets passed
> to both callbacks. That behavior isn't documented in the API docs, and
> is a little confusing. I think the context used for each callback should
> be saved independently.
> There are a number of other minor nits about making the formatting
> consistent with the rest of the codebase and the use of consistent names
> for things (i.e., opus_projection_decode vs.
> op_decode_projection_cb_func, mapping_family vs. channel_mapping, od vs.
> st, etc.).
> Since some of the above points can be tricky (especially lifetime
> management for the dynamically allocated demixing matrices), enumerating
> all of the formatting nits is laborious, and because I wanted to keep
> things moving, I went ahead and addressed all of the above comments in a
> separate patch (attached). Please review, and if it looks okay, I'll
> commit a version squashed with your patch.
> For the purposes of the open question above, for users of the old
> opus_parse_head() API I'm currently returning success when we have a
> projection matrix, but I'm strongly leaning towards returning OP_EIMPL
> instead. Let me know what you think about raising the channel limit, also.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the opus