[opus] [PATCH] opus-tools/opusfile: Support for Ambisonics
Timothy B. Terriberry
tterribe at xiph.org
Fri May 25 23:43:16 UTC 2018
Drew Allen wrote:
> Friendly ping for supporting ambisonics in opus-tools & opusfile
>
> Please LMK any ?s you might have.
Sorry for the long delay on the review. The high-level design looks a
lot better. I have a number of comments on the implementation details.
I didn't really like the way that opus_head_projection_parse() turned
out. I was originally thinking something more generic/future proof that
could also be used for any other channel mapping families that want to
define a custom channel mapping table. The way this is implemented in
the patch you sent, it duplicates a lot of code from opus_head_parse(),
the user still needs to call opus_head_parse() (so the work gets
duplicated at runtime, too), and they have to pass in a buffer of
exactly the right size (which means they need to know some details of
the specific mapping family and how the demixing matrix works).
I think a slightly better approach is along the lines of the
opus_head_parse_ext() I originally suggested, but just return a pointer
into the user-supplied packet instead of copying the data. For the
simple case where the user is just going to pass the data immediately to
opus_projection_decoder_create(), they don't need a separate buffer at
all, and the parameters to return the pointer and the matrix size can be
NULLable to make them optional, so we can implement opus_head_parse()
simply by calling opus_head_parse_ext() and passing NULL for them
(eliminating all of the duplicate code and work).
Open question: Should opus_head_parse_ext() with NULL for the matrix
and/or matrix size parameters succeed, or return an error (OP_EIMPL?) to
let the application know that there is missing data they're not getting
by using the old API (i.e., so they know they won't be able to decode
that stream correctly)? If we do continue to return success, we should
probably fill in the existing mapping[] array in OpusHead with something
meaningful (e.g., 255 for all silence). Right now it is just copying a
number of bytes out of the demixing matrix equal to the channel count,
which doesn't make much sense.
There are a few errors in the opus_head_projection_parse()
implementation. For mapping family 253 you don't validate that the
packet has at least 21 bytes before reading bytes 19 and 20. Although
you have bumped up OP_NCHANNELS_MAX to 18,
op_validate_ambisonic_channels() will actually succeed for channel
counts much larger than that. I didn't see anything that would then
prevent us from trying to decode such streams (and overflowing the
statically-sized buffers based on OP_NCHANNELS_MAX). I think this should
return OP_EIMPL like we do for mapping family 255. Finally, I don't want
to use sqrt() and introduce a libm dependency for the fixed-point
builds, so we should probably extract the ambisonics order from the
channel count a different way.
The next issue is that the demixing matrix cannot be stored directly in
the OggOpusFile struct. In the case of a chained, seekable stream with
multiple ambisonics projection links, we will parse all of the headers
up front, and the matrix that gets stored will simply be the last one
parsed, which will overwrite all of the others (and if they have
different channel/stream counts, we could potentially use data from a
mix of matrices of different sizes). The proper place to put this data
is in the OggOpusLink structure, since there is one of these per link in
a chained file. Since files can have a nearly unlimited number of links,
and this matrix can be somewhat large, it probably makes sense to
dynamically allocate it instead of using a fixed-size buffer, especially
since most files probably won't be using ambisonics to begin with.
Once we have dynamic allocation for the demixing matrix, I wonder if we
shouldn't just bump OP_NCHANNELS_MAX up to 227 now. Otherwise whenever
someone does want to produce some of these higher-order ambisonics
streams, everyone will have to upgrade libopusfile to be able to play them.
I also don't think you converted the check to see if we can re-use an
existing decoder correctly. At a minimum it should ensure we want to
create the same type of decoder and in the projection case, either
verify the demixing matrices match or not attempt to re-use projection
decoders (since this would require saving a copy of the demixing matrix
to correctly handle the non-seekable case). I also think
op_generic_decoder_init() should use op_use_projection() to decide which
kind of decoder to create, to make sure it does not get out of sync with
the rest of the code. At that point we're passing in enough pieces of an
OggOpusLink that we should probably just pass the whole thing.
I think we can also get rid of some #ifdefs if we always define
op_use_projection(), but make it return 0 without
OPUS_HAVE_OPUS_PROJECTION_H.
Currently, it also looks like you are re-using the context variable
decode_cb_ctx to save the user context for both the decode and
projection_decode callbacks. So if you register both, the one you get
passed depends on the order you register the callbacks, and gets passed
to both callbacks. That behavior isn't documented in the API docs, and
is a little confusing. I think the context used for each callback should
be saved independently.
There are a number of other minor nits about making the formatting
consistent with the rest of the codebase and the use of consistent names
for things (i.e., opus_projection_decode vs.
op_decode_projection_cb_func, mapping_family vs. channel_mapping, od vs.
st, etc.).
Since some of the above points can be tricky (especially lifetime
management for the dynamically allocated demixing matrices), enumerating
all of the formatting nits is laborious, and because I wanted to keep
things moving, I went ahead and addressed all of the above comments in a
separate patch (attached). Please review, and if it looks okay, I'll
commit a version squashed with your patch.
For the purposes of the open question above, for users of the old
opus_parse_head() API I'm currently returning success when we have a
projection matrix, but I'm strongly leaning towards returning OP_EIMPL
instead. Let me know what you think about raising the channel limit, also.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Review-comments-for-ambisonics-support.patch
Type: text/x-patch
Size: 51777 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20180525/b8ef08f4/attachment-0001.bin>
More information about the opus
mailing list