[daala] Curious about progress of codec

Wed Apr 27 13:14:45 UTC 2016

Regarding starting every frame with flat distribution, I believe you 
still use non-I-frames: which require some I-frame for decoding.
So maybe at least start with the probability distribution from this 
recent required I-frame?
Another option is varying rate - start with e.g. rate=2,3 for some 
number of symbols to quickly get out of the terrible flat distribution, 
then rise it to e.g. 5 for more subtle adaptation to local situation.

However, still the looking best approach here is to choose separate 
optimal initial distribution for each ID - just average over sample 
videos and fix these probabilities in the codec standard.
Also, it seems beneficial to separately optimize 'rate' for each ID and 
again fix it in the standard.

A separate suggestion is to take a look at sequences from your data - 
here is example from single frame of you sample for ID=0:
"00114044010001001404441000010010000000000000000000000100000000000000000000000000000000000000000000000000"
It clearly suggests we have two very different behaviors here - it would 
be beneficial to split it into at least 2 separate IDs.
A general approach for this complex problem of choosing the set of IDs, 
is starting with generating a huge number of pre-ID, then perform (e.g. 
hierarchical) clustering - groping similar pre-IDs into final IDs.

Good to hear that you have not only switched to accurate entropy coder, 
but also to exponential forgetting, which is much better at adapting to 
local situation. Also you don't longer need the costly non-power-of-two 
denominator.
For power-of-two denominator, standard range coder needs two 
multiplications per symbol, rANS only one, has simpler multi-bit 
renormalization and allows for further optimizations - leading to ~3x 
faster/cheaper (software) decoding (also reducing frequency of hardware 
decoder) - here are optimized implementations: 
https://github.com/jkbonfield/rans_static
The inconvenience is indeed that encoder needs a buffer for backward 
encoding within a data block (e.g. a frame or its part) - additional a 
few kilobyte buffer in costly video encoder seems negligible (?).

Cheers,
Jarek

On 16/04/25 06:58, Jean-Marc Valin wrote:
> On 04/24/2016 10:02 PM, Jarek Duda wrote:
>> 1) First of all, I don't understand why you insist flat initial
>> probability for every frame, not I-frame as you still need it for P,B
>> frames?
> As much as I would like to have the code adapt its probabilities from
> frame to frame, it is just not a viable option for any application (e.g.
> videoconference) that is based on unreliable transport. In that case,
> losing a single packet means your stream is completely undecodable until
> the next keyframe, which is unacceptable. We may the feature as an
> option for applications that do have reliable transport, but we just
> can't rely on it everywhere.
>
>> Writing entire initial probability distribution every time doesn't seem
>> reasonable.
> Indeed, that's why we never write initial probabilities.
>
>> Instead, you can use a parametric distribution (e.g. geometric) and just
>> write the (quantized) parameter.
> We have used parametric distributions for some things, but so far in all
> cases I've been able to beat the parametric distribution by using an
> adaptive one (even with flat initialization). Don't get me wrong -- I'd
> much rather use something parametric with tables stored in ROM. But so
> far it's always been a bit of a loss over adapting.
>
>> Or don't write it down, only fix the initial probability for given ID in
>> the standard (averaged) - especially to emphasize the fact that your '0'
>> symbol has often dominating probability.
> This is something we have considered, but it's a lot of work. Right now
> we're only doing it for motion vectors (initializing with a high
> probability of zero), but doing it for all symbols would require
> recomputing the initial probabilities on every change to the bit-stream.
> My estimate is that if we were to include an initial probability, we
> could save in the order of 1-2% in Daala, which is something you might
> want to do just before freezing a bit-stream, but not during development.
>
>> Your current main purpose of expensive symbol-wise adaptation is to
>> quickly get out of this terrible flat distribution.
>> In contrast, its main purpose should be to adapt to local variations of
>> density - which should remain close to the initial density.
> What we have now actually does both. The best way to consider it is that
> at the beginning you just have an average of what you saw so far, and
> eventually it starts "forgetting" and behaving like what you suggest.
>
>> 2) Good to hear that you are handling your entropy coding problem.
>> If you have an unknown accurate multi-symbol entropy coder, it would be
>> beneficial to share with the society and compare with others, e.g.:
>> http://encode.ru/threads/1920-In-memory-open-source-benchmark-of-entropy-coders?p=45168&viewfull=1#post45168
> I don't think what we have for power-of-two probabilities is by any
> means ground-breaking or anything. We are basically scaling the
> probabilities by the range, i.e. using a multiplication. When it comes
> to non-powers of two, it's a bit more of a hack and currently requires
> two 8-bit multiplications.
>
>> 3) Again your count[i]/totalcount adaptation is good for quickly getting
>> out of the terrible flat distribution.
>> For the actual reason why adaptation is usually used: adapting to local
>> variation of density, it is crucial to use exponential forgetting - make
>> that local behavior has larger influence than some far history - what is
>> much weaker in your adaptation.
> We *are* using exponential forgetting right now.
>
>> Regarding the accuracy issue of this cheap SIMD-able exponential
>> adaptation:
>>
>> for (int i = 1; i < m; i++) CDF[i] -= (CDF[i] - mixCDF[i]) >> rate;
>>
>> there is none if properly choosing mixCDF to make that not used symbol
>> lead to the lowest possible nonzero frequency (can be tabled as
>> mixCDF[s,i]):
>>
>> CDF[0] = 0
>> CDF[m] = 2^n     // denominator,  n=14 or 15 for  16bit rANS
>> renormalization
>> rate >= 1            // the larger, the slower adaptation
>> s = 0 .. m-1       // symbols, Pr(s) ~ (CDF[s+1] - CDF[s])/2^n
>> step = 2^n - m + 2^rate -1     // so that mixCDF[m-1] = 2^n-1
>>
>> for (int i=1; i<m; i++) mixCDF[i] = i - (1 << rate);        // to lead
>> to minimal: CDF[s+1] - CDF[s] = 1
>> for (int i = s+1; i < m; i++) mixCDF[i] += step;          // CDF for
>> single symbol s, jump from s+1
>>
>> You can test it with this Mathematica notebook:
>> https://dl.dropboxusercontent.com/u/12405967/adapt.nb
> Sorry, I don't have Mathematica, I use Octave and C! In any case, I
> think you have an off-by-one error in your mixCDF[] but I've been able
> to work out the math and as far as I can tell, it's mathematically
> equivalent(*) to the what we did in
> https://github.com/xiph/daala/commit/3cd606c24af
> but moving all the floor computation in mixCDF[], which safes one add.
>
> (*) The only difference is the (1 << rate) offset, which I think should
> be (1 << rate) -1 and avoids a large floor on the first symbol
> (something I did not have).
>
>> 4) My suggestion was to allow encoder to choose among a few rates, e.g.
>> use 2 bits to choose from {3,4,5,6}.
>> Encoder can cheaply test all of them (sum of log(1/p)) and choose the
>> one leading to the smallest number of required bits - the differences
>> are usually much larger than 2 bits to choose the rate.
> It's not that simple. You want your encoder to make RDO decisions based
> on what it'll cost to code a particular symbol in a particular context.
> Either you re-encode (with different RDO decisions) for each rate, or
> else you have to live with suboptimal RDO. Also, in general it's nice to
> be able to encode symbols as you go rather than having to buffer the
> entire list of symbols and encode everything at the end. It doesn't make
> a difference with rANS since you already have to do that, but the Daala
> entropy coder allows you to write the bit-stream as you encode it.
>
> Cheers,
>
> 	Jean-Marc
>

-- 
dr Jarosław Duda
Institute of Computer Science and Computer Mathematics,
Jagiellonian University, Cracow, Poland
http://th.if.uj.edu.pl/~dudaj/