[daala] Why not upgrading Daala range coding with much faster rANS or tANS?

Jarek Duda dudajar at gmail.com
Fri Aug 14 15:34:00 PDT 2015

> What we have is even more flexible
> (conveniently handles non-power-of-two frequency counts), has no
> multiplies at all, and can also take advantage of SIMD.

You are saying that you have multiplication-free large alphabet accurate 
entropy coder (and it is not tANS)?

Sounds very interesting - I was looking at your materials like slides, 
but found only range coding and mentioning CABAC - could you point me to 
some details?
How fast is its encoding/decoding?
Maybe you could put it in some benchmarks, like 

As I see it, sure we could put some behavior of range coding into a 
table like divide[R][s]={L',R'}, where R is the current range length, s 
is symbol, but
- it would be limited to a small range (poor accuracy) and number of 
symbols (not true for ANS),
- you still need to add quite nontrivial renormalization (ANS 
renormalization is trivial, like 4 byte "if(x < 2^32) x = x<<32 | 
read32bits()" for rANS),
- modifying frequencies for adaptation would be terribly costly (for 
rANS it is really cheap, see e.g. LZNA).

Regarding mentioned CABAC - here is its optimized decoding step from 

R_LPS = RTAB[m][(R > > 6) & 3];
R_MPS = R - R_LPS;
if (V < (R_MPS < < BitsLeft))
{R=R_MPS; value=valMPS;
Rnorm = (R_MPS > > 8) XOR 1;}
{V = V – (R_MPS < < BitsLeft)
R = R_LPS; value = !valMPS
Rnorm = RnormTAB[R_LPS > > 3];}
R = R << Rnorm;
BitsLeft = BitsLeft - Rnorm;
if (BitsLeft < = 0)
{V = (V < < M_BITS) | read_bits(M_BITS);
BitsLeft = BitsLeft + M_BITS;}

in contrast, here is analogous decoding step for tABS (binary tANS):

t = decodingTable[m][x];
value = valMPS XOR t.symbol;
x = t.newX + read_bits(t.nbBits);

with included renormalization, m is quantized probability of LPS.
For large alphabet tANS decoding step is nearly the same:

t = decodingTable[x];
symbol = t.symbol;
x = t.newX + read_bits(t.nbBits);

More information about the daala mailing list