<div dir="ltr"><div>Hi again all,</div><div><br></div><div>I've made some progress with these notes since the last email. Particularly, I misunderstood how 2D (separable) filters should be cascaded, and that this is captured in the original paper's z-transform. I also discovered that I could arrive at the correct 2D z-transform from the C code if I accept that the range of the predictor is the <i>residual</i>, not the energy minus the residual, as I expected. This seems a little strange to me, and is the last missing link in the puzzle. Any tips/help/explanations would be appreciated to clear this up!</div><div><br></div><div>Once again, notes in pdf/lyx format are attached, as well as on my dropbox:</div><div>pdf: <a href="https://www.dropbox.com/s/6a3hw33w1zul7w3/predictor-confusion-3.pdf?dl=0">https://www.dropbox.com/s/6a3hw33w1zul7w3/predictor-confusion-3.pdf?dl=0</a></div><div>lyx: <a href="https://www.dropbox.com/s/e57duheaiyb7qy4/predictor-confusion-3.lyx?dl=0">https://www.dropbox.com/s/e57duheaiyb7qy4/predictor-confusion-3.lyx?dl=0</a></div><div><br></div><div>and plaintext below.</div><div><br></div><div>Thanks again,</div><div>Jake</div><div><br></div><div>---</div><div><br></div><div>Opus Coarse Energy Predictor Notes (Round 3)<br><br>Jake Taylor (<a href="mailto:yupferris@gmail.com">yupferris@gmail.com</a>), 29 June 2021<br><br>I'm having trouble reconciling the coarse energy predictor <br>implementation in the libopus source code and the 2D z-transform <br>description in the paper[0]. At this point, I've gotten quite close, but am still having <br>trouble with the last bits. And help/guidance would be <br>appreciated!<br><br>I've simplified the source code (in unquant_coarse_energy in <br>quant_bands.c in libopus 1.3.1[1]) to the following C (pseudo)code:<br><br>void unquant_coarse_energy(float *e, int bands) {<br> float alpha = /* ... */;<br> float beta = /* ... */;<br> float prev = 0.0f;<br> for (int b = 0; b < bands; b++) {<br> float r = /* read from bitstream */;<br> float q = alpha * e[i] + prev;<br> e[i] = q + r;<br> prev = prev + (1 - beta) * r;<br> }<br>}<br><br>According to the paper, the 2D z-transform should be:<br><br>A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}<br><br>First, to state what I think is obvious: the domain of this <br>filter should be a 2D “energy plane” with the \ell-dimension <br>representing frames and the b-dimension representing bands, and <br>the range should be the prediction (actual band energy minus r[\ell,b]<br>, the residual). As a predictor, the filter must be causal. <br>Finally, according to the code above, the energy is always 0 for b<0<br> (b\geq bands, \ell<0, and \ell\geq frames are neither specified <br>nor useful).<br><br>As outlined in the CELT blog post[2], this z-transform describes two (separable) cascaded <br>predictors:<br><br>A(z_{\ell},z_{b})=P(z_{\ell})Q(z_{b})<br><br>The first is the \ell-dimension predictor whose domain is indeed <br>the 2D energy plane (albeit a single “row” of it corresponding to <br>a particular band in isolation over multiple frames):<br><br>P(z_{\ell})=1-\alpha z_{\ell}^{-1}<br><br>If we write the range explicitly:<br><br>P(z_{\ell})=(1-\alpha z_{\ell}^{-1})E(z_{\ell})<br><br>This corresponds to the following difference equation:<br><br>p[\ell]=e[\ell]-\alpha e[\ell-1]<br><br>The range of this predictor is not the final prediction, but an <br>intermediate p. Note that this equation includes the current <br>band's energy e[\ell], which is somewhat surprising for a <br>predictor, as e[\ell] will not be known until it can be derived <br>after r is decoded in the decoder.<br><br>For each \ell (“columns” of the energy plane), successive <br>elements of p are then fed into the b-dimension predictor, whose <br>range is the final prediction:<br><br>Q(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}<br><br>The equivalent difference equation is:<br><br>q[b]=p[b]-p[b-1]+\beta q[b-1]<br><br>What remains now is to match these cascaded predictors with the C <br>code, which is not trivial because it's not immediately apparent <br>that the above derived difference equations are present in the <br>code. There are likely several ways to do this; two high-level <br>approaches stand out to me: either “lower” the expected z<br>–transforms to difference equations and refactor until we get <br>matching code, or “lift” the code to z-transforms and refactor <br>until those match.<br><br>I find it easier to manipulate terms in the z-domain, so let's <br>try the latter approach. We'll first describe prev in terms of <br>the known signals. Looking at the C code, we know the following:<br><br>q[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b]<br><br>Since q is the output of the predictor, we can rewrite it in <br>terms of e and r:<br><br>q[\ell,b]=e[\ell,b]-r[\ell,b]<br><br>This is also given by the C code. Substituting this in the <br>previous equation yields:<br><br>prev[\ell,b]=e[\ell,b]-\alpha e[\ell-1,b]-r[\ell,b]<br><br>Or, equivalently:<br><br>prev[\ell,b]=p[\ell,b]-r[\ell,b]<br><br>Trivially, the corresponding z-transform is:<br><br>Prev(z_{\ell},z_{b})=P(z_{\ell},z_{b})-R(z_{\ell},z_{b})<br><br>So, prev apparently represents the difference between the output <br>of the \ell-dimension predictor p and the residual r. <br>Interestingly, q, the expected range of the b-dimension <br>predictor, has completely disappeared! This is odd, as without <br>this term, it's going to be difficult to reach a z-transform with <br>the correct range. However, ignoring this for now, we can still <br>make some kind of progress.<br><br>The difference equation governing prev is the following (again, <br>from the C code):<br><br>prev[\ell,b+1]=prev[\ell,b]+(1-\beta)r[\ell,b]<br><br>Note that the output of the difference equation is indexed with b+1<br> because it represents the value of prev after it has been <br>updated for this loop iteration. This is actually an important <br>distinction, because the code that modifies prev uses the r[\ell,b]<br>, i.e. the current residual, and these two signals should have <br>the correct phase with respect to one another. We can also <br>express this as:<br><br>prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b-1]<br><br>These two definitions are, of course, equivalent. The z-transform <br>of this difference equation is:<br><br>Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})<br><br>The final step involves substituting occurrences of prev with <br>equivalent expressions in terms of P and R:<br><br>P(z_{\ell},z_{b})-R(z_{\ell},z_{b})=z_{b}^{-1}(P(z_{\ell},z_{b})-R(z_{\ell},z_{b}))+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})<br><br>This can be simplified to:<br><br>\frac{R(z_{\ell},z_{b})}{P(z_{\ell},z_{b})}=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}<br><br>While this filter has almost the expected definition with the <br>correct domain p, its range is r, not q! This is somewhat <br>unsurprising, as the q terms cancelled out of our equations <br>above. However, I find it really strange that the output of the <br>predictor is actually the residual, and not the prediction <br>itself. One would think that, because the residual is not yet <br>known to the decoder until dequantization, the predictor would <br>represent the resulting energy minus that residual. I suppose I <br>could be happy accepting that, but it would still be nice to get <br>a few more pairs of eyes on this to make sure I haven't made any <br>mistakes, and perhaps to provide additional insight on why the <br>predictor is factored this way.</div><div><br></div><div>[0]
<a href="https://arxiv.org/abs/1602.0484">https://arxiv.org/abs/1602.0484</a><br></div><div>[1]
<a href="https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html">https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html</a> <br></div><div>[2]
<a href="https://jmvalin.dreamwidth.org/12000.html">https://jmvalin.dreamwidth.org/12000.html</a>
</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Jun 27, 2021 at 11:29 PM Jake Taylor <<a href="mailto:yupferris@gmail.com" target="_blank">yupferris@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi all,</div><div><br></div><div>I'm having trouble reconciling the coarse energy predictor's z-transform in the paper[0]/RFC and the corresponding code in libopus 1.3.1[1]. I'm pretty new to DSP theory and dealing with z-transforms, but I'm interested in learning (as well as compression), so I thought I'd study this filter. But I just can't seem to get it to match my understanding of the code; it's likely I've made a few mistakes, and any help/guidance would be greatly appreciated!<br></div><div><br></div><div>Note that this is a bit difficult to describe without proper typesetting, so I've prepared some pdf notes (as well as lyx source) and attached them to this email, as well as a pdf render. In case that doesn't reach you, they're also available on my dropbox:</div><div>pdf: <a href="https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0" target="_blank">https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0</a></div><div>lyx: <a href="https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0" target="_blank">https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0</a></div><div><br></div><div>Finally, if THAT doesn't work, plaintext-with-tex-mixed-in version follows.</div><div><br></div><div>Thanks for your time/help,</div><div>Jake<br></div><div><br></div><div>---</div><div><br></div><div>I'm having trouble reconciling the coarse energy predictor <br>implementation in the libopus source code and the 2D z-transform <br>description in the paper[0].<br><br>I've simplified the source code (in unquant_coarse_energy in <br>quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode:<br><br>void unquant_coarse_energy(float *e, int bands) {<br> float alpha = /* ... */;<br> float beta = /* ... */;<br> float p = 0.0f;<br> for (int b = 0; b < bands; b++) {<br> float q = /* read from bitstream */;<br> e[i] = alpha * e[i] + p + q;<br> p = p + q - beta * q;<br> }<br>}<br><br>According to the paper, the 2D z-transform should be:<br><br>A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}<br><br>First off, to state what I think is obvious: the domain of this <br>filter should be a 2D “energy plane” with the \ell-dimension <br>representing frames and the b-dimension representing bands, and <br>the range should be the prediction (actual band energy - q[\ell,b]<br>, the residual). As a predictor, the filter must be causal. <br>Finally, according to the code above, the energy is always 0 for b<0<br> (\ell<0, b\geq bands, and \ell\geq frames are not specified nor <br>useful).<br><br>Assuming this filter is separable, we first have the \ell<br>-dimension predictor:<br><br>A(z_{\ell})=1-\alpha z_{\ell}^{-1}<br><br>At first, I thought this was clearly embodied by alpha * e[i] <br>above. However, the z-transform implies that it should actually <br>be (1 - alpha) * e[i], so already we seem to be missing another e[i]<br> term somewhere (not to mention alpha having the wrong sign).<br><br>The b-dimension predictor seems even more problematic:<br><br>A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}<br><br>This matches what's listed in the CELT blog post[2], and is equivalent to:<br><br>Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b})<br><br>The equivalent difference equation is:<br><br>y[b]=x[b]-x[b-1]+\beta y[b-1]<br><br>And substituting names from the C code, we should get something <br>like:<br><br>prev[b]=q[b]-q[b-1]+\beta prev[b-1]<br><br>Now, it should be mentioned that I actually asked about this <br>recently in the DSP stack exchange[3] (after first emailing Jean-Marc Valin directly, but I seem to <br>have scared him off with another wall of text similar to this <br>one), and a helpful user there was able to clarify many things. <br>We actually arrived at the same difference equation in the end, <br>even though we got there a bit of a different way (one which <br>actually included both dimensions from the original 2D z<br>-transform), which suggests that my analysis above is correct.<br><br>However, we still didn't figure out the last bit: reconciling it <br>with the C code; it appears to differ. If I forget about the <br>above and just read the C code, we should get:<br><br>prev[b]=prev[b-1]+q[b]-\beta q[b]<br><br>The equivalent z-transform for this difference equation would be:<br><br>A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}<br><br>This suggests that the actual predictor description might instead <br>be:<br><br>A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}}<br><br>However, that still ignores the apparently-missing e[i] term from <br>the \ell-dimension.<br><br>So, what am I missing? One thing that I glossed over above that <br>the first predictor dimenson (\ell) appears to be applied to the <br>band energy directly (as expected), whereas the second predictor <br>dimension (b) appears to be applied to the residual q. Since q <br>can be expressed in terms of the energy and the predictor, I <br>tried several different interpretations and substitutions in <br>various domains in order to describe a predictor in with the 2D “<br>energy plane” as the domain and the prediction as the range, and <br>got some crazy z-transforms that don't look correct; here's a few <br>just for the curious:A(z_{b},z_{\ell})=\frac{1-\beta+\alpha z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}}<br><br>A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}}<br><br>So, at this point I'm kindof running in circles, and I think I <br>may have done something wrong; at least I'd like to think that's <br>a lot more likely than the paper/RFC/libopus code were out of <br>sync somehow!</div><div><br></div><div>[0]: <a href="https://arxiv.org/abs/1602.04845" target="_blank">https://arxiv.org/abs/1602.04845</a></div><div>[1]:
<a href="https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html" target="_blank">https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html</a></div><div>[2]:
<a href="https://jmvalin.dreamwidth.org/12000.html" target="_blank">https://jmvalin.dreamwidth.org/12000.html</a></div><div>[3]:
<a href="https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec" target="_blank">https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec</a>
</div></div>
</blockquote></div>