[daala] Curious about progress of codec

Mon Apr 25 02:02:33 UTC 2016

Hi Jean-Marc,

1) First of all, I don't understand why you insist flat initial 
probability for every frame, not I-frame as you still need it for P,B 
frames?
Writing entire initial probability distribution every time doesn't seem 
reasonable.
Instead, you can use a parametric distribution (e.g. geometric) and just 
write the (quantized) parameter.
Or don't write it down, only fix the initial probability for given ID in 
the standard (averaged) - especially to emphasize the fact that your '0' 
symbol has often dominating probability.

Your current main purpose of expensive symbol-wise adaptation is to 
quickly get out of this terrible flat distribution.
In contrast, its main purpose should be to adapt to local variations of 
density - which should remain close to the initial density.
If the density variations turns out huge, what was happening in your 
data, it suggests to try to split them into multiple separate contexts.

2) Good to hear that you are handling your entropy coding problem.
If you have an unknown accurate multi-symbol entropy coder, it would be 
beneficial to share with the society and compare with others, e.g.:
http://encode.ru/threads/1920-In-memory-open-source-benchmark-of-entropy-coders?p=45168&viewfull=1#post45168
Regarding multiplication, you also need it for DCT.
It would be sufficient for rANS to use single uint16 multiplication per 
symbol, e.g. 10bit precision, 4bit renormalization.
In this case there would be sufficient 6bit x 10bit = 16bit multiplication.

3) Again your count[i]/totalcount adaptation is good for quickly getting 
out of the terrible flat distribution.
For the actual reason why adaptation is usually used: adapting to local 
variation of density, it is crucial to use exponential forgetting - make 
that local behavior has larger influence than some far history - what is 
much weaker in your adaptation.

Regarding the accuracy issue of this cheap SIMD-able exponential adaptation:

for (int i = 1; i < m; i++) CDF[i] -= (CDF[i] - mixCDF[i]) >> rate;

there is none if properly choosing mixCDF to make that not used symbol 
lead to the lowest possible nonzero frequency (can be tabled as 
mixCDF[s,i]):

CDF[0] = 0
CDF[m] = 2^n     // denominator,  n=14 or 15 for  16bit rANS 
renormalization
rate >= 1            // the larger, the slower adaptation
s = 0 .. m-1       // symbols, Pr(s) ~ (CDF[s+1] - CDF[s])/2^n
step = 2^n - m + 2^rate -1     // so that mixCDF[m-1] = 2^n-1

for (int i=1; i<m; i++) mixCDF[i] = i - (1 << rate);        // to lead 
to minimal: CDF[s+1] - CDF[s] = 1
for (int i = s+1; i < m; i++) mixCDF[i] += step;          // CDF for 
single symbol s, jump from s+1

You can test it with this Mathematica notebook: 
https://dl.dropboxusercontent.com/u/12405967/adapt.nb

4) My suggestion was to allow encoder to choose among a few rates, e.g. 
use 2 bits to choose from {3,4,5,6}.
Encoder can cheaply test all of them (sum of log(1/p)) and choose the 
one leading to the smallest number of required bits - the differences 
are usually much larger than 2 bits to choose the rate.
And generally various IDs have different needs - e.g. your motion 
vectors had nearly static distributions - you can also choose optimized 
rate for each separate ID.

Cheers,
Jarek

On 16/04/24 19:21, Jean-Marc Valin wrote:
> On 04/24/2016 10:43 AM, Jarek Duda wrote:
>> 1) Start with a probability distribution characteristic for a given ID,
>> for example a fixed parametric, eventually modified somewhere in the
>> file (better behaves in the beginning and allows for more subtle
>> adaptation),
> Actually, all attempts at doing this so far have failed to produce
> improvement over the current "horrible" flat initialization. So I
> actually measured how much we lose to that flat initialization in one of
> the worst cases you can have: a binary symbol where one value has
> probability 1 and the other has probability 0. In that case, despite
> initializing with a flat CDF, the total cost of coding N symbols is only
> 1 + 1/4*log2(N). So it costs (e.g.) 3 bits to code 256 symbols.
> Considering that it avoids having to code an initial probability, that
> overhead is pretty reasonable.
>
>> 2) use an accurate entropy coder, like rANS in VP10,
> You'll be happy to learn that we recently merged a more accurate entropy
> coder for power-of-two denominators:
> https://github.com/xiph/daala/commit/39debe04
> The accuracy is around the same as rANS and VP10, i.e. less than 0.1%
> overhead. The improvement on Daala is currently small since we mostly
> use non-power-of-two denominators.
> We also have found a way to have an accurate coder for non-power-of-two
> denominators, but it costs an extra multiply.
>
>> 3) use adaptation with exponential forgetting to make recent symbol more
>> important, like
>> for (int i = 1; i < m; i++) CDF[i] -= (CDF[i] - mixCDF[i]) >> rate;
>> where mixCDF is for the new part, can be tabled for symbol-wise
>> adaptation such that frequencies of not used symbol will drop to the
>> minimal nonzero frequency,
> Well, the issue with that was the high probability floor and the
> alternative you mentioned earlier (storing the LSBs separately) is a bit
> expensive. Fortunately, we've come up with a better adaptation that does
> not suffer from the floor problem and is still simple. The code just got
> merged:
> https://github.com/xiph/daala/commit/3cd606c24af
> The idea is pretty simple and very similar to your code above. The only
> difference is that the CDF we adapt has a denominator equal to:
> (32768-number_of_symbols) so that when we add a floor of one, we're
> guaranteed to have non-zero probabilities that sum to 32768.
> Using that code, along with the power-of-two entropy coder, I was able
> to get about 0.5% improvement on Daala (not merged).
>
> That being said, right now we still haven't made up our mind between the
> dozen or so combinations of entropy coder and adaptation variants.
>
>> 4) Allow for varying adaptation rate - for example some ID use more
>> static probability distribution, for some it is beneficial to allow
>> encoder to choose one of a few possible adaptation rates.
> So far, not much came out of that when testing over multiple files and
> multiple rates.
>
> Cheers,
>
> 	Jean-Marc
>

-- 
dr Jarosław Duda
Institute of Computer Science and Computer Mathematics,
Jagiellonian University, Cracow, Poland
http://th.if.uj.edu.pl/~dudaj/