[daala] Why not upgrading Daala range coding with much faster rANS or tANS?

Sat Aug 15 01:36:33 PDT 2015

> Sure: <https://tools.ietf.org/html/draft-terriberry-netvc-codingtools-00#section-2>.
> The interesting bits for what you are after probably start in Section 2.3.

Thank you, I have looked at the materials but I didn't see the details, 
so I had to get to the code.
For a better focus, here is example of the entire rANS decoding step 
with renormalization:

s = symbol[x & mask];
x = freq[s] * (x >> n) + (x & mask) - cumul[s];
if(x < 2^16) x = x<<16 | read16bits();     // renormalization

I have looked at entdec.c, the od_ec_decode_cdf function (?):

int od_ec_decode_cdf_(od_ec_dec *dec, const uint16_t *cdf, int nsyms OD_ACC_STR) {
  261   od_ec_window dif;
  262   unsigned r;
  263   unsigned c;
  264   unsigned d;
  265 #if OD_EC_REDUCED_OVERHEAD
  266   unsigned e;
  267 #endif
  268   int s;
  269   unsigned u;
  270   unsigned v;
  271   unsigned q;
  272   unsigned fl;
  273   unsigned fh;
  274   unsigned ft;
  275   int ret;
  276   dif = dec->dif;
  277   r = dec->rng;
  278   OD_ASSERT(dif >> (OD_EC_WINDOW_SIZE - 16) < r);
  279   OD_ASSERT(nsyms > 0);
  280   ft = cdf[nsyms - 1];
  281   OD_ASSERT(16384 <= ft);
  282   OD_ASSERT(ft <= 32768U);
  283   OD_ASSERT(ft <= r);
  284   s = r - ft >= ft;
  285   ft <<= s;
  286   d = r - ft;
  287   OD_ASSERT(d < ft);
  288   c = (unsigned)(dif >> (OD_EC_WINDOW_SIZE - 16));
  289   q = OD_MAXI((int)(c >> 1), (int)(c - d));
  290 #if OD_EC_REDUCED_OVERHEAD
  291   e = OD_SUBSATU(2*d, ft);
  292   /*The correctness of this inverse partition function is not obvious, but it
  293      was checked exhaustively for all possible values of r, ft, and c.
  294     TODO: It should be possible to optimize this better than the compiler,
  295      given that we do not care about the accuracy of negative results (as we
  296      will not use them).
  297     It would also be nice to get rid of the 32-bit dividend, as it requires a
  298      32x32->64 bit multiply to invert.*/
  299   q = OD_MAXI((int)q, (int)((2*(int32_t)c + 1 - (int32_t)e)/3));
  300 #endif
  301   q >>= s;
  302   OD_ASSERT(q < ft >> s);
  303   fl = 0;
  304   ret = 0;
  305   for (fh = cdf[ret]; fh <= q; fh = cdf[++ret]) fl = fh;
  306   OD_ASSERT(fh <= ft >> s);
  307   fl <<= s;
  308   fh <<= s;
  309 #if OD_EC_REDUCED_OVERHEAD
  310   u = fl + OD_MINI(fl, e) + OD_MINI(OD_SUBSATU(fl, e) >> 1, d);
  311   v = fh + OD_MINI(fh, e) + OD_MINI(OD_SUBSATU(fh, e) >> 1, d);
  312 #else
  313   u = fl + OD_MINI(fl, d);
  314   v = fh + OD_MINI(fh, d);
  315 #endif
  316   r = v - u;
  317   dif -= (od_ec_window)u << (OD_EC_WINDOW_SIZE - 16);
  318   return od_ec_dec_normalize(dec, dif, r, ret, acc_str);
  319 }

Indeed I don't see multiplication by frequencies here.
If I properly understood,
OD_MAXI, OD_MINI is maximum, minimum of two variables
OD_SUBSATU(a,b) = a - min(a,b)
OD_ASSERT is only information for compiler
dif is the start of the range (?)
r is its length
ft gives the proportions to focus inside the range

I have to admit that I don't see how it finds the new range, lines 280 - 
291 seem to should do it (?), what is below seem to be renormalization 
related (?)
Could you give me some hints to understand it?

> That wouldn't really tell us much useful, because of course what's in
> that benchmark is not how an entropy coder is used in a video codec. The
> relevant test would be to put your alternative into Daala.

It seems you have a general purpose entropy coder, which could be also put in different applications, so maybe it's worth benchmarking alone?

> CABAC is heavily patented and thus not relevant

I only gave it as en example how ANS makes everything simpler/cheaper.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.xiph.org/pipermail/daala/attachments/20150815/50f5a29a/attachment.htm