[opus] Coarse energy predictor confusion

Tue Jun 29 17:32:31 UTC 2021

Hi again all,

I've made some progress with these notes since the last email.
Particularly, I misunderstood how 2D (separable) filters should be
cascaded, and that this is captured in the original paper's z-transform. I
also discovered that I could arrive at the correct 2D z-transform from the
C code if I accept that the range of the predictor is the *residual*, not
the energy minus the residual, as I expected. This seems a little strange
to me, and is the last missing link in the puzzle. Any
tips/help/explanations would be appreciated to clear this up!

Once again, notes in pdf/lyx format are attached, as well as on my dropbox:
pdf:
https://www.dropbox.com/s/6a3hw33w1zul7w3/predictor-confusion-3.pdf?dl=0
lyx:
https://www.dropbox.com/s/e57duheaiyb7qy4/predictor-confusion-3.lyx?dl=0

and plaintext below.

Thanks again,
Jake

---

Opus Coarse Energy Predictor Notes (Round 3)

Jake Taylor (yupferris at gmail.com), 29 June 2021

I'm having trouble reconciling the coarse energy predictor
implementation in the libopus source code and the 2D z-transform
description in the paper[0]. At this point, I've gotten quite close, but am
still having
trouble with the last bits. And help/guidance would be
appreciated!

I've simplified the source code (in unquant_coarse_energy in
quant_bands.c in libopus 1.3.1[1]) to the following C (pseudo)code:

void unquant_coarse_energy(float *e, int bands) {
  float alpha = /* ... */;
  float beta = /* ... */;
  float prev = 0.0f;
  for (int b = 0; b < bands; b++) {
    float r = /* read from bitstream */;
    float q = alpha * e[i] + prev;
    e[i] = q + r;
    prev = prev + (1 - beta) * r;
  }
}

According to the paper, the 2D z-transform should be:

A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta
z_{b}^{-1}}

First, to state what I think is obvious: the domain of this
filter should be a 2D “energy plane” with the \ell-dimension
representing frames and the b-dimension representing bands, and
the range should be the prediction (actual band energy minus r[\ell,b]
, the residual). As a predictor, the filter must be causal.
Finally, according to the code above, the energy is always 0 for b<0
 (b\geq bands, \ell<0, and \ell\geq frames are neither specified
nor useful).

As outlined in the CELT blog post[2], this z-transform describes two
(separable) cascaded
predictors:

A(z_{\ell},z_{b})=P(z_{\ell})Q(z_{b})

The first is the \ell-dimension predictor whose domain is indeed
the 2D energy plane (albeit a single “row” of it corresponding to
a particular band in isolation over multiple frames):

P(z_{\ell})=1-\alpha z_{\ell}^{-1}

If we write the range explicitly:

P(z_{\ell})=(1-\alpha z_{\ell}^{-1})E(z_{\ell})

This corresponds to the following difference equation:

p[\ell]=e[\ell]-\alpha e[\ell-1]

The range of this predictor is not the final prediction, but an
intermediate p. Note that this equation includes the current
band's energy e[\ell], which is somewhat surprising for a
predictor, as e[\ell] will not be known until it can be derived
after r is decoded in the decoder.

For each \ell (“columns” of the energy plane), successive
elements of p are then fed into the b-dimension predictor, whose
range is the final prediction:

Q(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}

The equivalent difference equation is:

q[b]=p[b]-p[b-1]+\beta q[b-1]

What remains now is to match these cascaded predictors with the C
code, which is not trivial because it's not immediately apparent
that the above derived difference equations are present in the
code. There are likely several ways to do this; two high-level
approaches stand out to me: either “lower” the expected z
–transforms to difference equations and refactor until we get
matching code, or “lift” the code to z-transforms and refactor
until those match.

I find it easier to manipulate terms in the z-domain, so let's
try the latter approach. We'll first describe prev in terms of
the known signals. Looking at the C code, we know the following:

q[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b]

Since q is the output of the predictor, we can rewrite it in
terms of e and r:

q[\ell,b]=e[\ell,b]-r[\ell,b]

This is also given by the C code. Substituting this in the
previous equation yields:

prev[\ell,b]=e[\ell,b]-\alpha e[\ell-1,b]-r[\ell,b]

Or, equivalently:

prev[\ell,b]=p[\ell,b]-r[\ell,b]

Trivially, the corresponding z-transform is:

Prev(z_{\ell},z_{b})=P(z_{\ell},z_{b})-R(z_{\ell},z_{b})

So, prev apparently represents the difference between the output
of the \ell-dimension predictor p and the residual r.
Interestingly, q, the expected range of the b-dimension
predictor, has completely disappeared! This is odd, as without
this term, it's going to be difficult to reach a z-transform with
the correct range. However, ignoring this for now, we can still
make some kind of progress.

The difference equation governing prev is the following (again,
from the C code):

prev[\ell,b+1]=prev[\ell,b]+(1-\beta)r[\ell,b]

Note that the output of the difference equation is indexed with b+1
 because it represents the value of prev after it has been
updated for this loop iteration. This is actually an important
distinction, because the code that modifies prev uses the r[\ell,b]
, i.e. the current residual, and these two signals should have
the correct phase with respect to one another. We can also
express this as:

prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b-1]

These two definitions are, of course, equivalent. The z-transform
of this difference equation is:

Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})

The final step involves substituting occurrences of prev with
equivalent expressions in terms of P and R:

P(z_{\ell},z_{b})-R(z_{\ell},z_{b})=z_{b}^{-1}(P(z_{\ell},z_{b})-R(z_{\ell},z_{b}))+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})

This can be simplified to:

\frac{R(z_{\ell},z_{b})}{P(z_{\ell},z_{b})}=\frac{1-z_{b}^{-1}}{1-\beta
z_{b}^{-1}}

While this filter has almost the expected definition with the
correct domain p, its range is r, not q! This is somewhat
unsurprising, as the q terms cancelled out of our equations
above. However, I find it really strange that the output of the
predictor is actually the residual, and not the prediction
itself. One would think that, because the residual is not yet
known to the decoder until dequantization, the predictor would
represent the resulting energy minus that residual. I suppose I
could be happy accepting that, but it would still be nice to get
a few more pairs of eyes on this to make sure I haven't made any
mistakes, and perhaps to provide additional insight on why the
predictor is factored this way.

[0] https://arxiv.org/abs/1602.0484
[1] https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
[2] https://jmvalin.dreamwidth.org/12000.html

On Sun, Jun 27, 2021 at 11:29 PM Jake Taylor <yupferris at gmail.com> wrote:

> Hi all,
>
> I'm having trouble reconciling the coarse energy predictor's z-transform
> in the paper[0]/RFC and the corresponding code in libopus 1.3.1[1]. I'm
> pretty new to DSP theory and dealing with z-transforms, but I'm interested
> in learning (as well as compression), so I thought I'd study this filter.
> But I just can't seem to get it to match my understanding of the code; it's
> likely I've made a few mistakes, and any help/guidance would be greatly
> appreciated!
>
> Note that this is a bit difficult to describe without proper typesetting,
> so I've prepared some pdf notes (as well as lyx source) and attached them
> to this email, as well as a pdf render. In case that doesn't reach you,
> they're also available on my dropbox:
> pdf:
> https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0
> lyx:
> https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0
>
> Finally, if THAT doesn't work, plaintext-with-tex-mixed-in version follows.
>
> Thanks for your time/help,
> Jake
>
> ---
>
> I'm having trouble reconciling the coarse energy predictor
> implementation in the libopus source code and the 2D z-transform
> description in the paper[0].
>
> I've simplified the source code (in unquant_coarse_energy in
> quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode:
>
> void unquant_coarse_energy(float *e, int bands) {
>   float alpha = /* ... */;
>   float beta = /* ... */;
>   float p = 0.0f;
>   for (int b = 0; b < bands; b++) {
>     float q = /* read from bitstream */;
>     e[i] = alpha * e[i] + p + q;
>     p = p + q - beta * q;
>   }
> }
>
> According to the paper, the 2D z-transform should be:
>
> A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta
> z_{b}^{-1}}
>
> First off, to state what I think is obvious: the domain of this
> filter should be a 2D “energy plane” with the \ell-dimension
> representing frames and the b-dimension representing bands, and
> the range should be the prediction (actual band energy - q[\ell,b]
> , the residual). As a predictor, the filter must be causal.
> Finally, according to the code above, the energy is always 0 for b<0
>  (\ell<0, b\geq bands, and \ell\geq frames are not specified nor
> useful).
>
> Assuming this filter is separable, we first have the \ell
> -dimension predictor:
>
> A(z_{\ell})=1-\alpha z_{\ell}^{-1}
>
> At first, I thought this was clearly embodied by alpha * e[i]
> above. However, the z-transform implies that it should actually
> be (1 - alpha) * e[i], so already we seem to be missing another e[i]
>  term somewhere (not to mention alpha having the wrong sign).
>
> The b-dimension predictor seems even more problematic:
>
> A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
>
> This matches what's listed in the CELT blog post[2], and is equivalent to:
>
> Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b})
>
> The equivalent difference equation is:
>
> y[b]=x[b]-x[b-1]+\beta y[b-1]
>
> And substituting names from the C code, we should get something
> like:
>
> prev[b]=q[b]-q[b-1]+\beta prev[b-1]
>
> Now, it should be mentioned that I actually asked about this
> recently in the DSP stack exchange[3] (after first emailing Jean-Marc
> Valin directly, but I seem to
> have scared him off with another wall of text similar to this
> one), and a helpful user there was able to clarify many things.
> We actually arrived at the same difference equation in the end,
> even though we got there a bit of a different way (one which
> actually included both dimensions from the original 2D z
> -transform), which suggests that my analysis above is correct.
>
> However, we still didn't figure out the last bit: reconciling it
> with the C code; it appears to differ. If I forget about the
> above and just read the C code, we should get:
>
> prev[b]=prev[b-1]+q[b]-\beta q[b]
>
> The equivalent z-transform for this difference equation would be:
>
> A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}
>
> This suggests that the actual predictor description might instead
> be:
>
> A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}}
>
> However, that still ignores the apparently-missing e[i] term from
> the \ell-dimension.
>
> So, what am I missing? One thing that I glossed over above that
> the first predictor dimenson (\ell) appears to be applied to the
> band energy directly (as expected), whereas the second predictor
> dimension (b) appears to be applied to the residual q. Since q
> can be expressed in terms of the energy and the predictor, I
> tried several different interpretations and substitutions in
> various domains in order to describe a predictor in with the 2D “
> energy plane” as the domain and the prediction as the range, and
> got some crazy z-transforms that don't look correct; here's a few
> just for the curious:A(z_{b},z_{\ell})=\frac{1-\beta+\alpha
> z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}}
>
> A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha
> z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}}
>
> So, at this point I'm kindof running in circles, and I think I
> may have done something wrong; at least I'd like to think that's
> a lot more likely than the paper/RFC/libopus code were out of
> sync somehow!
>
> [0]: https://arxiv.org/abs/1602.04845
> [1]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
> [2]: https://jmvalin.dreamwidth.org/12000.html
> [3]:
> https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xiph.org/pipermail/opus/attachments/20210629/325bf6b6/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-3.lyx
Type: application/lyx
Size: 16363 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20210629/325bf6b6/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-3.pdf
Type: application/pdf
Size: 90881 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20210629/325bf6b6/attachment-0001.pdf>