<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Regarding using the probability distribution from the last I-frame:
    I meant storing the distribution as it stands at the end of that
    I-frame, without updating it further. This is an additional buffer
    that is refreshed at every I-frame, and P-frames use it as their
    initial probability distribution.<br>
    Of course, this has a memory cost (a few kilobytes) as a trade-off
    for a better compression ratio.<br>
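    A minimal sketch of this snapshot buffer, under stated assumptions:
    the names (Model, EntropyContext, end_iframe, begin_pframe) are
    hypothetical, and the counter update (start at 32, add 128) is only
    a stand-in borrowed from the example quoted below, not a proposal
    for the actual adaptation rule.<br>

```c
#include <assert.h>
#include <stdint.h>

enum { NSYM = 4 };   /* illustrative alphabet size (assumption) */

/* Hypothetical adaptive model: one frequency counter per symbol. */
typedef struct {
    uint16_t count[NSYM];
} Model;

typedef struct {
    Model live;      /* updated after every coded symbol */
    Model snapshot;  /* frozen copy taken at the end of the last I-frame */
} EntropyContext;

/* Flat start, as used at the beginning of an I-frame. */
static void context_init(EntropyContext *ctx)
{
    assert(ctx != 0);
    for (int s = 0; s < NSYM; s++)
        ctx->live.count[s] = 32;
    ctx->snapshot = ctx->live;
}

/* Stand-in adaptation rule; any update scheme could be used here. */
static void adapt(EntropyContext *ctx, int s)
{
    assert(s >= 0 && s < NSYM);
    ctx->live.count[s] += 128;
}

/* Call once the last symbol of an I-frame has been coded. */
static void end_iframe(EntropyContext *ctx)
{
    ctx->snapshot = ctx->live;   /* the only extra memory: one Model */
}

/* Call before coding the first symbol of a P-frame. */
static void begin_pframe(EntropyContext *ctx)
{
    ctx->live = ctx->snapshot;   /* P-frame updates never reach the snapshot */
}
```

    Because every P-frame restarts from the same snapshot, whatever is
    adapted inside one P-frame does not carry over into the next one.<br>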

    <br>

    Regarding the memory trade-off for rANS: the encoder needs a buffer
    only to reverse the symbol order, and it can use the encoder's
    general-purpose memory for this.<br>
    The size of this buffer is determined by the chosen data block size
    (usually fixed). The cost is a loss of 2-3 bits per block, which is
    completely negligible for a buffer of a few kilobytes.<br>
    On the other hand, range coding is a few times more costly for
    software decoding. From the hardware perspective, multiplications
    also cost energy and increase the number of cycles per symbol, which
    requires a higher clock frequency (and hence more energy).<br>

    <br>

    Here is the entire rANS decoding step for a 32-bit state with 16-bit
    renormalization (it can be reduced to e.g. a 6-bit x 10-bit -> 16-bit
    multiplication with 4-bit renormalization):<br>
    <br>
    <pre>// mask = (1 << n) - 1, where n is the number of probability bits
s = symbol(x & mask);         // e.g. SIMD search for s such that CDF[s] <= (x & mask) < CDF[s+1]
x = (CDF[s+1] - CDF[s]) * (x >> n) + (x & mask) - CDF[s];
if (x < (1 << 16)) x = (x << 16) + read16bits();     // renormalization

// The only branch can be changed to:
b = (x < (1 << 16));
x = (x << (b << 4)) + ((*stream) >> ((1 ^ b) << 4));  // stream points to 16-bit words
stream += b;
</pre>
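    The decoding step above can be sketched as a runnable round trip.
    This is only an illustration under stated assumptions: a static
    4-symbol CDF with n = 12 probability bits, a linear search standing
    in for the SIMD symbol lookup, and hypothetical names (rans_encode,
    rans_decode, rans_roundtrip) that do not come from any codec. The
    encoder walks the symbols backwards and fills the buffer from its
    end, so the decoder can read the words forward.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PROB_BITS = 12, NSYM = 4 };   /* n = 12 and 4 symbols: assumptions */
#define RANS_L (1u << 16)            /* state is kept in [2^16, 2^32) */

/* Illustrative static model: CDF[0] = 0, CDF[NSYM] = 2^PROB_BITS. */
static const uint32_t CDF[NSYM + 1] = { 0, 2048, 3072, 3584, 4096 };

/* Encode syms[0..count-1] in reverse order, writing 16-bit words
   downwards from buf_end. Returns the first word of the stream. */
static uint16_t *rans_encode(const uint8_t *syms, size_t count,
                             uint16_t *buf_end)
{
    uint32_t x = RANS_L;
    uint16_t *p = buf_end;
    for (size_t i = count; i-- > 0; ) {
        assert(syms[i] < NSYM);
        uint32_t start = CDF[syms[i]];
        uint32_t freq  = CDF[syms[i] + 1] - start;
        /* Renormalize first so the new state stays below 2^32. */
        if (x >= ((uint64_t)freq << (32 - PROB_BITS))) {
            *--p = (uint16_t)x;
            x >>= 16;
        }
        x = ((x / freq) << PROB_BITS) + (x % freq) + start;
    }
    *--p = (uint16_t)x;           /* flush the final state: low word ... */
    *--p = (uint16_t)(x >> 16);   /* ... then high word, which is read first */
    return p;
}

/* Decode count symbols. One readable padding word must follow the
   stream, because the branchless step always dereferences stream. */
static void rans_decode(const uint16_t *stream, uint8_t *out, size_t count)
{
    const uint32_t mask = (1u << PROB_BITS) - 1;
    uint32_t x = ((uint32_t)stream[0] << 16) | stream[1];
    stream += 2;
    for (size_t i = 0; i < count; i++) {
        uint32_t slot = x & mask;
        uint32_t s = 0;
        while (CDF[s + 1] <= slot) s++;   /* CDF[s] <= slot < CDF[s+1] */
        x = (CDF[s + 1] - CDF[s]) * (x >> PROB_BITS) + slot - CDF[s];
        uint32_t b = x < RANS_L;          /* 1 if a 16-bit word is needed */
        x = (x << (b << 4)) + ((uint32_t)*stream >> ((1u ^ b) << 4));
        stream += b;
        out[i] = (uint8_t)s;
    }
}

/* Returns 1 if decoding reproduces the input (count <= 254). */
static int rans_roundtrip(const uint8_t *syms, size_t count)
{
    uint16_t buf[256 + 1] = { 0 };        /* last word is padding */
    uint8_t out[254];
    if (count > 254) return 0;            /* at most count + 2 words */
    const uint16_t *stream = rans_encode(syms, count, buf + 256);
    rans_decode(stream, out, count);
    return memcmp(syms, out, count) == 0;
}
```

    In this sketch the reversal buffer holds at most one 16-bit word per
    symbol plus two words for the final state, so its size follows
    directly from the chosen block length.<br>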

    Good to hear that you plan to eventually fix optimized initial
    probability distributions and rates for the separate IDs. I agree
    that this should be done only after everything else has been
    decided.<br>

    <br>

    Cheers,<br>

    Jarek <br>

    <br>

    <br>

    On 16/04/27 18:19, Jean-Marc Valin wrote:<br>

    <span style="white-space: pre;">> On 04/27/2016 09:14 AM, Jarek Duda wrote:

>> Regarding starting every frame with flat distribution, I believe

>> you still use non-I-frames: which require some I-frame for

>> decoding. So maybe at least start with the probability distribution

>> from this recent required I-frame?

> 

> OK, so here's what happens in a practical videoconferencing

> situation. You're sending I-frames at (e.g.) 15 second interval, but

> 5sec after an I-frame, you lose a packet for a P-frame. It means you

> have to "conceal" (guess) what was in that packet and make up

> something plausible on the screen (obviously not perfect), then you

> decode the next P-frames and any "error" you did in the concealment

> gets spread around as the motion vectors carry them away. It's not

> pretty, but you can still figure out what's going on, and if a big

> change happens you can still see what's happening.

> 

> Now, with what you're suggesting, everything that happens after the

> loss is completely undecodable and you have no idea what's going on.

> In that case, the decoder has no choice but to completely freeze the

> image for 10 seconds until the next I-frame. That's why people don't

> rely (or at least want to have the option of not relying) on anything

> from previous frames for entropy decoding.

> 

>> Another option is varying rate - start with e.g. rate=2,3 for some 

>> number of symbols to quickly get out of the terrible flat

>> distribution, then rise it to e.g. 5 for more subtle adaptation to

>> local situation.

> 

> This has *always* been what we've done. The adaptation on the first 

> symbols is very fast because we accumulate probabilities starting

> from very small flat values. For example, we would initialize a PDF

> with a value of 32 for each symbol and every time we encode a symbol,

> we boost the corresponding bin by 128. The flat distribution really

> doesn't stay around for long. In the case where the probability

> distribution has to always sum to a power of two, then I have a way

> to adjust the adaptation rate to have the same behaviour as the

> accumulation I described.

> 

>> However, still the looking best approach here is to choose

>> separate optimal initial distribution for each ID - just average

>> over sample videos and fix these probabilities in the codec

>> standard.

> 

> As I said, we might do that someday... but it's a *lot* of work

> (that you have to redo frequently) for a small gain.

> 

>> Also, it seems beneficial to separately optimize 'rate' for each ID

>> and again fix it in the standard.

> 

> Also, on the TODO list.

> 

>> A separate suggestion is to take a look at sequences from your data

>> - here is example from single frame of you sample for ID=0: 

>> "00114044010001001404441000010010000000000000000000000100000000000000000000000000000000000000000000000000"

>>

>>

>> </span><br>

    <span style="white-space: pre;">>> It clearly suggests we have two very different behaviors here - it
>> would be beneficial to split it into at least 2 separate IDs.

> 

> What you saw was basically the symbol that says whether a superblock

> of the image was well predicted or if we need to code something. It's

> going to be different for every clip and since it's one of the first

> symbols being coded, there isn't much "context" to use for it. In

> this particular case, adapting faster helps. In other clips it

> doesn't.

> 

>> Good to hear that you have not only switched to accurate entropy

>> coder, but also to exponential forgetting, which is much better at

>> adapting to local situation. Also you don't longer need the costly

>> non-power-of-two denominator.

> 

> Well, we're still investigating all options. The non-power-of-two

> code always had exponential forgetting (but it was in steps, which is

> less accurate, but also suffers from less rounding issues so it's

> about as good). We now also have a way of making the overhead very

> small even for non-power-of-two without using divisions. So the

> jury's still out -- both options are viable, including dozens of

> variants and combinations.

> 

>> For power-of-two denominator, standard range coder needs two 

>> multiplications per symbol, rANS only one, has simpler multi-bit 

>> renormalization and allows for further optimizations - leading to

>> ~3x faster/cheaper (software) decoding (also reducing frequency of

>> hardware decoder) - here are optimized implementations: 

>> <a class="moz-txt-link-freetext" href="https://github.com/jkbonfield/rans_static">https://github.com/jkbonfield/rans_static</a> The inconvenience is

>> indeed that encoder needs a buffer for backward encoding within a

>> data block (e.g. a frame or its part) - additional a few kilobyte

>> buffer in costly video encoder seems negligible (?).

> 

> Well, these days multiplier hardware is actually cheaper than memory 

> (you'll need a lot more than a few kB to cover the worst case).

> 

> Cheers,

> 

> Jean-Marc

> 

> </span><br>

    <br>

    <br>

    -- <br>

    dr Jarosław Duda<br>

    Institute of Computer Science and Computer Mathematics,<br>

    Jagiellonian University, Cracow, Poland<br>

    <a class="moz-txt-link-freetext" href="http://th.if.uj.edu.pl/~dudaj/">http://th.if.uj.edu.pl/~dudaj/</a><br>


  </body>

</html>