<div dir="ltr">Hey Tim,<div><br></div><div>Thank you for taking the time to revise and improve my patch. I agree to all the changes you've made here and if it suits you, I'm happy to sign off for submitting the patch to opusfile.</div><div><br></div><div>Cheers,</div><div>Drew</div></div><br><div class="gmail_quote"><div dir="ltr">On Fri, May 25, 2018 at 4:43 PM Timothy B. Terriberry <<a href="mailto:tterribe@xiph.org">tterribe@xiph.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Drew Allen wrote:<br>

> Friendly ping for supporting ambisonics in opus-tools & opusfile<br>

> <br>

> Please LMK any ?s you might have.<br>

<br>

Sorry for the long delay on the review. The high-level design looks a <br>

lot better. I have a number of comments on the implementation details.<br>

<br>

I didn't really like the way that opus_head_projection_parse() turned <br>

out. I was originally thinking something more generic/future proof that <br>

could also be used for any other channel mapping families that want to <br>

define a custom channel mapping table. The way this is implemented in <br>

the patch you sent, it duplicates a lot of code from opus_head_parse(), <br>

the user still needs to call opus_head_parse() (so the work gets <br>

duplicated at runtime, too), and they have to pass in a buffer of <br>

exactly the right size (which means they need to know some details of <br>

the specific mapping family and how the demixing matrix works).<br>

<br>

I think a slightly better approach is along the lines of the <br>

opus_head_parse_ext() I originally suggested, but just return a pointer <br>

into the user-supplied packet instead of copying the data. For the <br>

simple case where the user is just going to pass the data immediately to <br>

opus_projection_decoder_create(), they don't need a separate buffer at <br>

all, and the parameters to return the pointer and the matrix size can be <br>

NULLable to make them optional, so we can implement opus_head_parse() <br>

simply by calling opus_head_parse_ext() and passing NULL for them <br>

(eliminating all of the duplicate code and work).<br>

<br>

Open question: Should opus_head_parse_ext() with NULL for the matrix <br>

and/or matrix size parameters succeed, or return an error (OP_EIMPL?) to <br>

let the application know that there is missing data they're not getting <br>

by using the old API (i.e., so they know they won't be able to decode <br>

that stream correctly)? If we do continue to return success, we should <br>

probably fill in the existing mapping[] array in OpusHead with something <br>

meaningful (e.g., 255 for all silence). Right now it is just copying a <br>

number of bytes out of the demixing matrix equal to the channel count, <br>

which doesn't make much sense.<br>

<br>

There are a few errors in the opus_head_projection_parse() <br>

implementation. For mapping family 253 you don't validate that the <br>

packet has at least 21 bytes before reading bytes 19 and 20. Although <br>

you have bumped up OP_NCHANNELS_MAX to 18, <br>

op_validate_ambisonic_channels() will actually succeed for channel <br>

counts much larger than that. I didn't see anything that would then <br>

prevent us from trying to decode such streams (and overflowing the <br>

statically-sized buffers based on OP_NCHANNELS_MAX). I think this should <br>

return OP_EIMPL like we do for mapping family 255. Finally, I don't want <br>

to use sqrt() and introduce a libm dependency for the fixed-point <br>

builds, so we should probably extract the ambisonics order from the <br>

channel count a different way.<br>

<br>

The next issue is that the demixing matrix cannot be stored directly in <br>

the OggOpusFile struct. In the case of a chained, seekable stream with <br>

multiple ambisonics projection links, we will parse all of the headers <br>

up front, and the matrix that gets stored will simply be the last one <br>

parsed, which will overwrite all of the others (and if they have <br>

different channel/stream counts, we could potentially use data from a <br>

mix of matrices of different sizes). The proper place to put this data <br>

is in the OggOpusLink structure, since there is one of these per link in <br>

a chained file. Since files can have a nearly unlimited number of links, <br>

and this matrix can be somewhat large, it probably makes sense to <br>

dynamically allocate it instead of using a fixed-size buffer, especially <br>

since most files probably won't be using ambisonics to begin with.<br>

<br>

Once we have dynamic allocation for the demixing matrix, I wonder if we <br>

shouldn't just bump OP_NCHANNELS_MAX up to 227 now. Otherwise whenever <br>

someone does want to produce some of these higher-order ambisonics <br>

streams, everyone will have to upgrade libopusfile to be able to play them.<br>

<br>

I also don't think you converted the check to see if we can re-use an <br>

existing decoder correctly. At a minimum it should ensure we want to <br>

create the same type of decoder and in the projection case, either <br>

verify the demixing matrices match or not attempt to re-use projection <br>

decoders (since this would require saving a copy of the demixing matrix <br>

to correctly handle the non-seekable case). I also think <br>

op_generic_decoder_init() should use op_use_projection() to decide which <br>

kind of decoder to create, to make sure it does not get out of sync with <br>

the rest of the code. At that point we're passing in enough pieces of an <br>

OggOpusLink that we should probably just pass the whole thing.<br>

<br>

I think we can also get rid of some #ifdefs if we always define <br>

op_use_projection(), but make it return 0 without <br>

OPUS_HAVE_OPUS_PROJECTION_H.<br>

<br>

Currently, it also looks like you are re-using the context variable <br>

decode_cb_ctx to save the user context for both the decode and <br>

projection_decode callbacks. So if you register both, the one you get <br>

passed depends on the order you register the callbacks, and gets passed <br>

to both callbacks. That behavior isn't documented in the API docs, and <br>

is a little confusing. I think the context used for each callback should <br>

be saved independently.<br>

<br>

There are a number of other minor nits about making the formatting <br>

consistent with the rest of the codebase and the use of consistent names <br>

for things (i.e., opus_projection_decode vs. <br>

op_decode_projection_cb_func, mapping_family vs. channel_mapping, od vs. <br>

st, etc.).<br>

<br>

Since some of the above points can be tricky (especially lifetime <br>

management for the dynamically allocated demixing matrices), enumerating <br>

all of the formatting nits is laborious, and because I wanted to keep <br>

things moving, I went ahead and addressed all of the above comments in a <br>

separate patch (attached). Please review, and if it looks okay, I'll <br>

commit a version squashed with your patch.<br>

<br>

For the purposes of the open question above, for users of the old <br>

opus_parse_head() API I'm currently returning success when we have a <br>

projection matrix, but I'm strongly leaning towards returning OP_EIMPL <br>

instead. Let me know what you think about raising the channel limit, also.<br>

</blockquote></div>