[opus] Coarse energy predictor confusion

Thu Jul 1 10:08:03 UTC 2021

Hi again all,

I've made a bit more progress on this now, and I believe with some
certainty that the 2D z-transform for the coarse energy predictor listed in
the paper[0]/RFC[1] does *not* actually match what the libopus code is
doing. It would be good to get some extra eyes on my math to (dis)prove
this.

Once again, notes in pdf/lyx format are attached, as well as on my dropbox:
pdf:
https://www.dropbox.com/s/l4tzyvq1mne8jan/predictor-confusion-4.pdf?dl=0
lyx:
https://www.dropbox.com/s/map3slkbnhyctui/predictor-confusion-4.lyx?dl=0

and plaintext below.

Thanks again,
Jake

---

Opus Coarse Energy Predictor Notes (Round 4)

Jake Taylor (yupferris at gmail.com), 1 July 2021

I'm having trouble reconciling the 2D coarse energy predictor's z
-transform description in the paper[0]/RFC[1] and the implementation
in the libopus source code. At this point, I believe the z-transform
does not actually describe the behavior of the filter in the code.

According to the paper/RFC, the 2D z-transform should be:

A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta
z_{b}^{-1}}

I've simplified the source code (in unquant_coarse_energy in
quant_bands.c in libopus 1.3.1[2]) to the following C (pseudo)code:

void unquant_coarse_energy(float *e, int bands) {
  float alpha = /* ... */;
  float beta = /* ... */;
  float prev = 0.0f;
  for (int b = 0; b < bands; b++) {
    float r = /* read from bitstream */;
    e[i] = alpha * e[i] + prev + r;
    prev = prev + (1 - beta) * r;
  }
}

The goal is simple: extract difference equations from the code,
derive the corresponding 1D z-transforms, and then simplify/merge
into a single 2D z-transform which should hopefully match what
the paper/RFC list.

The code is governed by the following difference equations:

e[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b-1]+r[\ell,b]

prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b]

The corresponding z-transforms are:

E(z_{\ell},z_{b})=\alpha
z_{\ell}^{-1}E(z_{\ell},z_{b})+z_{b}^{-1}Prev(z_{\ell},z_{b})+R(z_{\ell},z_{b})

Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)R(z_{\ell},z_{b})

We expect the domain of the predictor to be e (the 2D “energy
plane”) and the range of the predictor q to be:

q[\ell,b]=e[\ell,b]-r[\ell,b]

Equivalently, in the z-domain:

Q(z_{\ell},z_{b})=E(z_{\ell},z_{b})-R(z_{\ell},z_{b})

These three z-transforms should be all we need to reach the 2D z
-transform from the paper/RFC.

First, let's eliminate the R terms using the definition for Q:

R(z_{\ell},z_{b})=E(z_{\ell},z_{b})-Q(z_{\ell},z_{b})

This leads us to:

E(z_{\ell},z_{b})=\alpha
z_{\ell}^{-1}E(z_{\ell},z_{b})+z_{b}^{-1}Prev(z_{\ell},z_{b})+E(z_{\ell},z_{b})-Q(z_{\ell},z_{b})

Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)(E(z_{\ell},z_{b})-Q(z_{\ell},z_{b}))

Let's simplify each individually a bit:

-\alpha
z_{\ell}^{-1}E(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})-Q(z_{\ell},z_{b})

Prev(z_{\ell},z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}(E(z_{\ell},z_{b})-Q(z_{\ell},z_{b}))

>From here, we can substitute the Prev term in the former z
-transform:

-\alpha
z_{\ell}^{-1}E(z_{\ell},z_{b})=z_{b}^{-1}(\frac{1-\beta}{1-z_{b}^{-1}}(E(z_{\ell},z_{b})-Q(z_{\ell},z_{b})))-Q(z_{\ell},z_{b})

And simplifying that yields:

Q(z_{\ell},z_{b})=\frac{\alpha
z_{\ell}^{-1}(1-z_{b}^{-1})+z_{b}^{-1}(1-\beta)}{1-\beta
z_{b}^{-1}}E(z_{\ell},z_{b})

Q(z_{\ell},z_{b})=\frac{\alpha
z_{\ell}^{-1}(1-z_{b}^{-1})+z_{b}^{-1}(1-\beta)}{1-\beta
z_{b}^{-1}}E(z_{\ell},z_{b})

It's difficult to factor this in a better way; even using Wolfram
Alpha[3]. But, barring mathematical mistakes I might have made, one
thing is clear: this is not the same as what's listed in the
paper/RFC.

[0]: https://arxiv.org/abs/1602.04845
[1]: https://datatracker.ietf.org/doc/html/rfc6716#section-4.3.2
[2]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
[3]: https://www.wolframalpha.com/input/?i=alpha*l%281-b%29%2Bb%281-beta%29

On Tue, Jun 29, 2021 at 7:32 PM Jake Taylor <yupferris at gmail.com> wrote:

> Hi again all,
>
> I've made some progress with these notes since the last email.
> Particularly, I misunderstood how 2D (separable) filters should be
> cascaded, and that this is captured in the original paper's z-transform. I
> also discovered that I could arrive at the correct 2D z-transform from the
> C code if I accept that the range of the predictor is the *residual*, not
> the energy minus the residual, as I expected. This seems a little strange
> to me, and is the last missing link in the puzzle. Any
> tips/help/explanations would be appreciated to clear this up!
>
> Once again, notes in pdf/lyx format are attached, as well as on my dropbox:
> pdf:
> https://www.dropbox.com/s/6a3hw33w1zul7w3/predictor-confusion-3.pdf?dl=0
> lyx:
> https://www.dropbox.com/s/e57duheaiyb7qy4/predictor-confusion-3.lyx?dl=0
>
> and plaintext below.
>
> Thanks again,
> Jake
>
> ---
>
> Opus Coarse Energy Predictor Notes (Round 3)
>
> Jake Taylor (yupferris at gmail.com), 29 June 2021
>
> I'm having trouble reconciling the coarse energy predictor
> implementation in the libopus source code and the 2D z-transform
> description in the paper[0]. At this point, I've gotten quite close, but
> am still having
> trouble with the last bits. And help/guidance would be
> appreciated!
>
> I've simplified the source code (in unquant_coarse_energy in
> quant_bands.c in libopus 1.3.1[1]) to the following C (pseudo)code:
>
> void unquant_coarse_energy(float *e, int bands) {
>   float alpha = /* ... */;
>   float beta = /* ... */;
>   float prev = 0.0f;
>   for (int b = 0; b < bands; b++) {
>     float r = /* read from bitstream */;
>     float q = alpha * e[i] + prev;
>     e[i] = q + r;
>     prev = prev + (1 - beta) * r;
>   }
> }
>
> According to the paper, the 2D z-transform should be:
>
> A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta
> z_{b}^{-1}}
>
> First, to state what I think is obvious: the domain of this
> filter should be a 2D “energy plane” with the \ell-dimension
> representing frames and the b-dimension representing bands, and
> the range should be the prediction (actual band energy minus r[\ell,b]
> , the residual). As a predictor, the filter must be causal.
> Finally, according to the code above, the energy is always 0 for b<0
>  (b\geq bands, \ell<0, and \ell\geq frames are neither specified
> nor useful).
>
> As outlined in the CELT blog post[2], this z-transform describes two
> (separable) cascaded
> predictors:
>
> A(z_{\ell},z_{b})=P(z_{\ell})Q(z_{b})
>
> The first is the \ell-dimension predictor whose domain is indeed
> the 2D energy plane (albeit a single “row” of it corresponding to
> a particular band in isolation over multiple frames):
>
> P(z_{\ell})=1-\alpha z_{\ell}^{-1}
>
> If we write the range explicitly:
>
> P(z_{\ell})=(1-\alpha z_{\ell}^{-1})E(z_{\ell})
>
> This corresponds to the following difference equation:
>
> p[\ell]=e[\ell]-\alpha e[\ell-1]
>
> The range of this predictor is not the final prediction, but an
> intermediate p. Note that this equation includes the current
> band's energy e[\ell], which is somewhat surprising for a
> predictor, as e[\ell] will not be known until it can be derived
> after r is decoded in the decoder.
>
> For each \ell (“columns” of the energy plane), successive
> elements of p are then fed into the b-dimension predictor, whose
> range is the final prediction:
>
> Q(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
>
> The equivalent difference equation is:
>
> q[b]=p[b]-p[b-1]+\beta q[b-1]
>
> What remains now is to match these cascaded predictors with the C
> code, which is not trivial because it's not immediately apparent
> that the above derived difference equations are present in the
> code. There are likely several ways to do this; two high-level
> approaches stand out to me: either “lower” the expected z
> –transforms to difference equations and refactor until we get
> matching code, or “lift” the code to z-transforms and refactor
> until those match.
>
> I find it easier to manipulate terms in the z-domain, so let's
> try the latter approach. We'll first describe prev in terms of
> the known signals. Looking at the C code, we know the following:
>
> q[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b]
>
> Since q is the output of the predictor, we can rewrite it in
> terms of e and r:
>
> q[\ell,b]=e[\ell,b]-r[\ell,b]
>
> This is also given by the C code. Substituting this in the
> previous equation yields:
>
> prev[\ell,b]=e[\ell,b]-\alpha e[\ell-1,b]-r[\ell,b]
>
> Or, equivalently:
>
> prev[\ell,b]=p[\ell,b]-r[\ell,b]
>
> Trivially, the corresponding z-transform is:
>
> Prev(z_{\ell},z_{b})=P(z_{\ell},z_{b})-R(z_{\ell},z_{b})
>
> So, prev apparently represents the difference between the output
> of the \ell-dimension predictor p and the residual r.
> Interestingly, q, the expected range of the b-dimension
> predictor, has completely disappeared! This is odd, as without
> this term, it's going to be difficult to reach a z-transform with
> the correct range. However, ignoring this for now, we can still
> make some kind of progress.
>
> The difference equation governing prev is the following (again,
> from the C code):
>
> prev[\ell,b+1]=prev[\ell,b]+(1-\beta)r[\ell,b]
>
> Note that the output of the difference equation is indexed with b+1
>  because it represents the value of prev after it has been
> updated for this loop iteration. This is actually an important
> distinction, because the code that modifies prev uses the r[\ell,b]
> , i.e. the current residual, and these two signals should have
> the correct phase with respect to one another. We can also
> express this as:
>
> prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b-1]
>
> These two definitions are, of course, equivalent. The z-transform
> of this difference equation is:
>
>
> Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})
>
> The final step involves substituting occurrences of prev with
> equivalent expressions in terms of P and R:
>
>
> P(z_{\ell},z_{b})-R(z_{\ell},z_{b})=z_{b}^{-1}(P(z_{\ell},z_{b})-R(z_{\ell},z_{b}))+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})
>
> This can be simplified to:
>
> \frac{R(z_{\ell},z_{b})}{P(z_{\ell},z_{b})}=\frac{1-z_{b}^{-1}}{1-\beta
> z_{b}^{-1}}
>
> While this filter has almost the expected definition with the
> correct domain p, its range is r, not q! This is somewhat
> unsurprising, as the q terms cancelled out of our equations
> above. However, I find it really strange that the output of the
> predictor is actually the residual, and not the prediction
> itself. One would think that, because the residual is not yet
> known to the decoder until dequantization, the predictor would
> represent the resulting energy minus that residual. I suppose I
> could be happy accepting that, but it would still be nice to get
> a few more pairs of eyes on this to make sure I haven't made any
> mistakes, and perhaps to provide additional insight on why the
> predictor is factored this way.
>
> [0] https://arxiv.org/abs/1602.0484
> [1] https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
> [2] https://jmvalin.dreamwidth.org/12000.html
>
> On Sun, Jun 27, 2021 at 11:29 PM Jake Taylor <yupferris at gmail.com> wrote:
>
>> Hi all,
>>
>> I'm having trouble reconciling the coarse energy predictor's z-transform
>> in the paper[0]/RFC and the corresponding code in libopus 1.3.1[1]. I'm
>> pretty new to DSP theory and dealing with z-transforms, but I'm interested
>> in learning (as well as compression), so I thought I'd study this filter.
>> But I just can't seem to get it to match my understanding of the code; it's
>> likely I've made a few mistakes, and any help/guidance would be greatly
>> appreciated!
>>
>> Note that this is a bit difficult to describe without proper typesetting,
>> so I've prepared some pdf notes (as well as lyx source) and attached them
>> to this email, as well as a pdf render. In case that doesn't reach you,
>> they're also available on my dropbox:
>> pdf:
>> https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0
>> lyx:
>> https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0
>>
>> Finally, if THAT doesn't work, plaintext-with-tex-mixed-in version
>> follows.
>>
>> Thanks for your time/help,
>> Jake
>>
>> ---
>>
>> I'm having trouble reconciling the coarse energy predictor
>> implementation in the libopus source code and the 2D z-transform
>> description in the paper[0].
>>
>> I've simplified the source code (in unquant_coarse_energy in
>> quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode:
>>
>> void unquant_coarse_energy(float *e, int bands) {
>>   float alpha = /* ... */;
>>   float beta = /* ... */;
>>   float p = 0.0f;
>>   for (int b = 0; b < bands; b++) {
>>     float q = /* read from bitstream */;
>>     e[i] = alpha * e[i] + p + q;
>>     p = p + q - beta * q;
>>   }
>> }
>>
>> According to the paper, the 2D z-transform should be:
>>
>> A(z_{\ell},z_{b})=(1-\alpha
>> z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
>>
>> First off, to state what I think is obvious: the domain of this
>> filter should be a 2D “energy plane” with the \ell-dimension
>> representing frames and the b-dimension representing bands, and
>> the range should be the prediction (actual band energy - q[\ell,b]
>> , the residual). As a predictor, the filter must be causal.
>> Finally, according to the code above, the energy is always 0 for b<0
>>  (\ell<0, b\geq bands, and \ell\geq frames are not specified nor
>> useful).
>>
>> Assuming this filter is separable, we first have the \ell
>> -dimension predictor:
>>
>> A(z_{\ell})=1-\alpha z_{\ell}^{-1}
>>
>> At first, I thought this was clearly embodied by alpha * e[i]
>> above. However, the z-transform implies that it should actually
>> be (1 - alpha) * e[i], so already we seem to be missing another e[i]
>>  term somewhere (not to mention alpha having the wrong sign).
>>
>> The b-dimension predictor seems even more problematic:
>>
>> A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
>>
>> This matches what's listed in the CELT blog post[2], and is equivalent to:
>>
>> Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b})
>>
>> The equivalent difference equation is:
>>
>> y[b]=x[b]-x[b-1]+\beta y[b-1]
>>
>> And substituting names from the C code, we should get something
>> like:
>>
>> prev[b]=q[b]-q[b-1]+\beta prev[b-1]
>>
>> Now, it should be mentioned that I actually asked about this
>> recently in the DSP stack exchange[3] (after first emailing Jean-Marc
>> Valin directly, but I seem to
>> have scared him off with another wall of text similar to this
>> one), and a helpful user there was able to clarify many things.
>> We actually arrived at the same difference equation in the end,
>> even though we got there a bit of a different way (one which
>> actually included both dimensions from the original 2D z
>> -transform), which suggests that my analysis above is correct.
>>
>> However, we still didn't figure out the last bit: reconciling it
>> with the C code; it appears to differ. If I forget about the
>> above and just read the C code, we should get:
>>
>> prev[b]=prev[b-1]+q[b]-\beta q[b]
>>
>> The equivalent z-transform for this difference equation would be:
>>
>> A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}
>>
>> This suggests that the actual predictor description might instead
>> be:
>>
>> A(z_{\ell},z_{b})=(1-\alpha
>> z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}}
>>
>> However, that still ignores the apparently-missing e[i] term from
>> the \ell-dimension.
>>
>> So, what am I missing? One thing that I glossed over above that
>> the first predictor dimenson (\ell) appears to be applied to the
>> band energy directly (as expected), whereas the second predictor
>> dimension (b) appears to be applied to the residual q. Since q
>> can be expressed in terms of the energy and the predictor, I
>> tried several different interpretations and substitutions in
>> various domains in order to describe a predictor in with the 2D “
>> energy plane” as the domain and the prediction as the range, and
>> got some crazy z-transforms that don't look correct; here's a few
>> just for the curious:A(z_{b},z_{\ell})=\frac{1-\beta+\alpha
>> z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}}
>>
>> A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha
>> z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}}
>>
>> So, at this point I'm kindof running in circles, and I think I
>> may have done something wrong; at least I'd like to think that's
>> a lot more likely than the paper/RFC/libopus code were out of
>> sync somehow!
>>
>> [0]: https://arxiv.org/abs/1602.04845
>> [1]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
>> [2]: https://jmvalin.dreamwidth.org/12000.html
>> [3]:
>> https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xiph.org/pipermail/opus/attachments/20210701/235c71a6/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-4.lyx
Type: application/octet-stream
Size: 8510 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20210701/235c71a6/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-4.pdf
Type: application/pdf
Size: 76873 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20210701/235c71a6/attachment-0001.pdf>