[opus] Coarse energy predictor confusion

Sun Jun 27 21:29:08 UTC 2021

Hi all,

I'm having trouble reconciling the coarse energy predictor's z-transform in
the paper[0]/RFC with the corresponding code in libopus 1.3.1[1]. I'm pretty
new to DSP theory and z-transforms, but I'm interested in learning about
them (and about compression), so I thought I'd study this filter. However,
I just can't get it to match my understanding of the code; it's likely
I've made a few mistakes, and any help or guidance would be greatly
appreciated!

Note that this is a bit difficult to describe without proper typesetting,
so I've prepared some notes (a PDF render plus the LyX source) and
attached them to this email. In case the attachments don't reach you,
they're also available on my Dropbox:
pdf:
https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0
lyx:
https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0

Finally, if THAT doesn't work, a plaintext version with TeX mixed in follows.

Thanks for your time/help,
Jake

---

I'm having trouble reconciling the coarse energy predictor
implementation in the libopus source code and the 2D z-transform
description in the paper[0].

I've simplified the source code (in unquant_coarse_energy in
quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode:

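void unquant_coarse_energy(float *e, const float *q, int bands,
                           float alpha, float beta) {
  /* alpha: inter-frame coefficient, beta: intra-frame coefficient
   * (actual values elided). q[i] is the residual read from the
   * bitstream. On entry e[i] is band i's energy in the previous
   * frame; on return it is the current frame's energy. */
  float prev = 0.0f;  /* running band-to-band (intra-frame) term */
  for (int i = 0; i < bands; i++) {
    e[i] = alpha * e[i] + prev + q[i];
    prev = prev + q[i] - beta * q[i];
  }
}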
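Unrolling the loop may make its structure easier to compare with the
z-transforms below: the running band-to-band term starts at zero and
grows by (1 - beta) * q[i] after each band, so band i receives
alpha * (its previous-frame energy) + (1 - beta) * (q[0] + ... + q[i-1])
+ q[i]. A small C sketch of both forms, assuming q is available as an
array; the function names decode_loop and decode_closed are hypothetical:

```c
#include <assert.h>

/* Loop form, as in the simplified pseudocode above. On entry e[i] is band
 * i's energy in the previous frame; on return it is the current frame's. */
void decode_loop(float *e, const float *q, int bands, float alpha, float beta) {
  float prev = 0.0f;  /* running band-to-band term */
  for (int i = 0; i < bands; i++) {
    e[i] = alpha * e[i] + prev + q[i];
    prev = prev + q[i] - beta * q[i];
  }
}

/* Closed form: band i gets (1 - beta) times the sum of q[0..i-1]. */
void decode_closed(float *e, const float *q, int bands, float alpha, float beta) {
  assert(bands >= 0);
  float sum = 0.0f;  /* q[0] + ... + q[i-1] */
  for (int i = 0; i < bands; i++) {
    e[i] = alpha * e[i] + (1.0f - beta) * sum + q[i];
    sum += q[i];
  }
}
```

For example, with alpha = 0.5, beta = 0.25, previous energies
{4, 2, 0, -1, 6} and residuals {1, -2, 0.5, 3, -1}, both forms give
{3, -0.25, -0.25, 2.125, 3.875}.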

According to the paper, the 2D z-transform should be:

A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
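Multiplying out, and writing x[\ell,b] for the filter's input and
y[\ell,b] for its output (assuming both are zero for \ell<0 or b<0), this
2D transfer function corresponds to the following difference equation;
which signals x and y actually stand for is the question discussed below:

Y(z_{\ell},z_{b})(1-\beta z_{b}^{-1})=(1-\alpha z_{\ell}^{-1})(1-z_{b}^{-1})X(z_{\ell},z_{b})

y[\ell,b]=x[\ell,b]-x[\ell,b-1]-\alpha x[\ell-1,b]+\alpha x[\ell-1,b-1]+\beta y[\ell,b-1]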

First off, to state what I think is obvious: the domain of this
filter should be a 2D “energy plane”, with the \ell-dimension
representing frames and the b-dimension representing bands, and
the range should be the prediction (the actual band energy minus
q[\ell,b], the residual). As a predictor, the filter must be causal.
Finally, according to the code above, the energy is always 0 for
b<0 (\ell<0, b\geq bands, and \ell\geq frames are neither
specified nor useful).

Assuming this filter is separable, we first have the
\ell-dimension predictor:

A(z_{\ell})=1-\alpha z_{\ell}^{-1}
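Written out the same way as the b-dimension filter below (x the input,
y the output, for a fixed band b), the difference equation of this
two-tap FIR filter is:

y[\ell]=x[\ell]-\alpha x[\ell-1]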

At first, I thought this was clearly embodied by alpha * e[i]
above. However, the z-transform implies that it should actually
be (1 - alpha) * e[i], so already we seem to be missing another
e[i] term somewhere (not to mention alpha having the wrong sign).

The b-dimension predictor seems even more problematic:

A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}

This matches what's listed in the CELT blog post[2], and is equivalent to:

Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b})

The equivalent difference equation is:

y[b]=x[b]-x[b-1]+\beta y[b-1]
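A direct C sketch of this difference equation can be handy for
experimenting; it assumes zero initial conditions (x[-1] = y[-1] = 0),
and the function name band_filter is hypothetical:

```c
#include <assert.h>

/* y[b] = x[b] - x[b-1] + beta * y[b-1], i.e. the band-dimension filter
 * (1 - z_b^-1) / (1 - beta z_b^-1), with x[-1] = y[-1] = 0. */
void band_filter(const float *x, float *y, int bands, float beta) {
  assert(bands >= 0);
  float x_prev = 0.0f, y_prev = 0.0f;
  for (int b = 0; b < bands; b++) {
    y[b] = x[b] - x_prev + beta * y_prev;
    x_prev = x[b];
    y_prev = y[b];
  }
}
```

Fed an impulse (1, 0, 0, ...), it produces 1, \beta-1, \beta(\beta-1),
...; fed a constant input (1, 1, 1, ...), it produces 1, \beta, \beta^2,
..., so an offset shared by all bands decays away because of the zero at
z_{b}=1.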

And substituting names from the C code, we should get something
like:

prev[b]=q[b]-q[b-1]+\beta prev[b-1]

Now, I should mention that I actually asked about this recently on
the DSP Stack Exchange[3] (after first emailing Jean-Marc Valin
directly, but I seem to have scared him off with another wall of
text similar to this one), and a helpful user there was able to
clarify many things. We arrived at the same difference equation in
the end, even though we got there in a somewhat different way (one
which included both dimensions of the original 2D z-transform),
which suggests that my analysis above is correct.

However, we still didn't figure out the last bit: reconciling it
with the C code, which appears to differ. If I set the above aside
and just read the C code, I get:

prev[b]=prev[b-1]+q[b]-\beta q[b]

The equivalent z-transform for this difference equation would be:

A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}
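Spelling out that step (assuming zero initial conditions, with P(z_{b})
and Q(z_{b}) the z-transforms of prev[b] and q[b]):

P(z_{b})=z_{b}^{-1}P(z_{b})+(1-\beta)Q(z_{b})

\frac{P(z_{b})}{Q(z_{b})}=\frac{1-\beta}{1-z_{b}^{-1}}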

This suggests that the actual predictor description might instead
be:

A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}}

However, that still ignores the apparently missing e[i] term from
the \ell-dimension.

So, what am I missing? One thing I glossed over above is that the
first predictor dimension (\ell) appears to be applied to the band
energy directly (as expected), whereas the second predictor
dimension (b) appears to be applied to the residual q. Since q can
be expressed in terms of the energy and the prediction, I tried
several different interpretations and substitutions in various
domains in order to describe a predictor with the 2D “energy plane”
as the domain and the prediction as the range, and got some crazy
z-transforms that don't look correct; here are a few, just for the
curious:

A(z_{b},z_{\ell})=\frac{1-\beta+\alpha z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}}

A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}}
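For checking candidate readings like these numerically, here is a small
sketch; the names (decode_frame, apply_a, round_trip_error), the plane
size and the example alpha/beta values are hypothetical, not from libopus.
It tests one specific reading: that A(z_{\ell},z_{b}) maps the energy
plane to the residual q, with zero energy before the first frame and
below the first band. It decodes a few frames with the simplified loop,
runs the decoded energies through A via its 2D difference equation, and
returns the largest difference from the q values that were fed in; under
that reading the result should be at rounding level.

```c
#include <assert.h>

#define FRAMES 4 /* hypothetical plane size for the check */
#define BANDS 6

/* One frame of the simplified decoder: e[i] holds band i's energy from the
 * previous frame on entry and the current frame's energy on return. */
void decode_frame(float *e, const float *q, int bands, float alpha, float beta) {
  float prev = 0.0f;
  for (int i = 0; i < bands; i++) {
    e[i] = alpha * e[i] + prev + q[i];
    prev = prev + q[i] - beta * q[i];
  }
}

/* Apply A(z_l, z_b) to the plane x using the 2D difference equation
 *   y[l][b] = x[l][b] - x[l][b-1] - alpha*x[l-1][b] + alpha*x[l-1][b-1]
 *             + beta*y[l][b-1],
 * with x and y taken as zero for l < 0 or b < 0. */
void apply_a(float x[FRAMES][BANDS], float y[FRAMES][BANDS],
             float alpha, float beta) {
  for (int l = 0; l < FRAMES; l++) {
    for (int b = 0; b < BANDS; b++) {
      float x_b1 = b > 0 ? x[l][b - 1] : 0.0f;
      float x_l1 = l > 0 ? x[l - 1][b] : 0.0f;
      float x_l1b1 = (l > 0 && b > 0) ? x[l - 1][b - 1] : 0.0f;
      float y_b1 = b > 0 ? y[l][b - 1] : 0.0f;
      y[l][b] = x[l][b] - x_b1 - alpha * x_l1 + alpha * x_l1b1 + beta * y_b1;
    }
  }
}

/* Decode q frame by frame (starting from zero energy), filter the decoded
 * energy plane with A, and return the largest |A(energy) - q|. */
float round_trip_error(float q[FRAMES][BANDS], float alpha, float beta) {
  float e[BANDS] = {0.0f};
  float plane[FRAMES][BANDS], out[FRAMES][BANDS];
  float worst = 0.0f;
  for (int l = 0; l < FRAMES; l++) {
    decode_frame(e, q[l], BANDS, alpha, beta);
    for (int b = 0; b < BANDS; b++) plane[l][b] = e[b];
  }
  apply_a(plane, out, alpha, beta);
  for (int l = 0; l < FRAMES; l++) {
    for (int b = 0; b < BANDS; b++) {
      float d = out[l][b] - q[l][b];
      if (d < 0.0f) d = -d;
      if (d > worst) worst = d;
    }
  }
  return worst;
}
```

Trying a different interpretation only requires changing apply_a (and,
if needed, what it is compared against).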

So, at this point I'm kind of running in circles, and I think I
may have done something wrong; at least I'd like to think that's
a lot more likely than the paper/RFC/libopus code being out of
sync somehow!

[0]: https://arxiv.org/abs/1602.04845
[1]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
[2]: https://jmvalin.dreamwidth.org/12000.html
[3]: https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec