[daala] Curious about progress of codec

Jarek Duda dudajar at gmail.com
Wed Apr 27 17:08:29 UTC 2016


Regarding using the probability distribution from the last I-frame, I 
meant remembering the distribution as it stands at the end of that 
I-frame (not updated further): an additional buffer that is refreshed at 
every I-frame, which the following P-frames use as their initial 
probability distribution.
But sure, this is a memory cost (a few kilobytes) traded for compression ratio.
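
To make the idea concrete, here is a minimal sketch in C (the context 
count, alphabet size, and array names are illustrative assumptions, not 
Daala's actual structures):

#include <string.h>
#include <stdint.h>

#define NUM_IDS   64    /* hypothetical number of symbol contexts (IDs) */
#define NUM_BINS  16    /* hypothetical alphabet size per context       */

static uint16_t cdf_live[NUM_IDS][NUM_BINS];    /* adapted while coding the current frame   */
static uint16_t cdf_iframe[NUM_IDS][NUM_BINS];  /* snapshot kept between frames (a few kB)  */

void on_iframe_done(void) {
    /* Remember the distributions as they stand at the end of the I-frame;
       this buffer is not updated again until the next I-frame. */
    memcpy(cdf_iframe, cdf_live, sizeof(cdf_iframe));
}

void on_pframe_start(void) {
    /* P-frames start from the I-frame snapshot instead of a flat distribution. */
    memcpy(cdf_live, cdf_iframe, sizeof(cdf_live));
}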

Regarding the memory trade-off for rANS, the encoder needs a buffer only 
to reverse the symbol order - the encoder's general working memory can be 
used for this purpose. The size of this buffer is determined by the chosen 
data-block size (which is usually fixed), and the cost is 2-3 bits of loss 
per block - completely negligible for a buffer of a few kilobytes.

On the other hand, range coding is a few times more costly for software 
decoding. From a hardware perspective, the extra multiplications also 
require energy and increase the number of cycles per symbol, which forces 
a higher clock frequency (and thus more energy).

Here is the entire rANS decoding step for a 32-bit state with 16-bit 
renormalization (it can be reduced to e.g. a 6-bit x 10-bit -> 16-bit 
multiplication with 4-bit renormalization):

/* n = number of probability bits, mask = (1 << n) - 1 */
s = symbol(x & mask);        // SIMD to find s such that CDF[s] <= (x & mask) < CDF[s+1]
x = (CDF[s+1] - CDF[s]) * (x >> n) + (x & mask) - CDF[s];
if (x < (1 << 16)) x = (x << 16) | read16bits();     // renormalization

The only branch can be changed to

b = (x < (1 << 16));
x = (x << (b << 4)) | ((*stream) >> ((1 ^ b) << 4));   // stream points to 16-bit words
stream += b;
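
Putting the two pieces above together, a complete decode loop could look 
roughly like this (the linear symbol() search stands in for the SIMD 
lookup, and the 16-bit stream layout is an assumption of this sketch):

#include <stdint.h>

#define PROB_BITS 16
#define PROB_MASK ((1u << PROB_BITS) - 1)

extern const uint32_t CDF[];    /* CDF[0] = 0, nondecreasing, sums to 1 << PROB_BITS */

/* Linear-search stand-in for the SIMD symbol lookup mentioned above. */
static int symbol(uint32_t cum) {
    int s = 0;
    while (CDF[s + 1] <= cum) s++;
    return s;
}

/* Decode n symbols into out[]; x is the 32-bit rANS state read from the
   start of the block, stream points at the following 16-bit words. */
void rans_decode_block(uint32_t x, const uint16_t *stream, int *out, int n) {
    for (int i = 0; i < n; i++) {
        uint32_t cum = x & PROB_MASK;
        int s = symbol(cum);
        out[i] = s;
        x = (CDF[s + 1] - CDF[s]) * (x >> PROB_BITS) + cum - CDF[s];
        uint32_t b = (x < (1u << 16));              /* branchless renormalization */
        x = (x << (b << 4)) | ((uint32_t)(*stream) >> ((1u ^ b) << 4));
        stream += b;
    }
}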

Good to hear that you plan to eventually fix optimized initial probability 
distributions and adaptation rates for the separate IDs in the standard - 
sure, that should be done only after everything else is decided.

Cheers,
Jarek


On 16/04/27 18:19, Jean-Marc Valin wrote:
> On 04/27/2016 09:14 AM, Jarek Duda wrote:
>> Regarding starting every frame with flat distribution, I believe
>> you still use non-I-frames: which require some I-frame for decoding.
>> So maybe at least start with the probability distribution from this
>> recent required I-frame?
>
> OK, so here's what happens in a practical videoconferencing
> situation. You're sending I-frames at (e.g.) 15 second interval, but
> 5sec after an I-frame, you lose a packet for a P-frame. It means you
> have to "conceal" (guess) what was in that packet and make up
> something plausible on the screen (obviously not perfect), then you
> decode the next P-frames and any "error" you did in the concealment
> gets spread around as the motion vectors carry them away. It's not
> pretty, but you can still figure out what's going on, and if a big
> change happens you can still see what's happening.
>
> Now, with what you're suggesting, everything that happens after the
> loss is completely undecodable and you have no idea what's going on.
> In that case, the decoder has no choice but to completely freeze the
> image for 10 seconds until the next I-frame. That's why people don't
> rely (or at least want to have the option of not relying) on anything
> from previous frames for entropy decoding.
>
>> Another option is varying rate - start with e.g. rate=2,3 for some
>> number of symbols to quickly get out of the terrible flat
>> distribution, then rise it to e.g. 5 for more subtle adaptation to
>> local situation.
>
> This has *always* been what we've done. The adaptation on the first
> symbols is very fast because we accumulate probabilities starting
> from very small flat values. For example, we would initialize a PDF
> with a value of 32 for each symbol and every time we encode a symbol,
> we boost the corresponding bin by 128. The flat distribution really
> doesn't stay around for long. In the case where the probability
> distribution has to always sum to a power of two, then I have a way
> to adjust the adaptation rate to have the same behaviour as the
> accumulation I described.
>
>> However, still the looking best approach here is to choose
>> separate optimal initial distribution for each ID - just average
>> over sample videos and fix these probabilities in the codec
>> standard.
>
> As I said, we might do that someday... but it's a *lot* of work
> (that you have to redo frequently) for a small gain.
>
>> Also, it seems beneficial to separately optimize 'rate' for each ID
>> and again fix it in the standard.
>
> Also, on the TODO list.
>
>> A separate suggestion is to take a look at sequences from your data
>> - here is example from single frame of you sample for ID=0:
>> "00114044010001001404441000010010000000000000000000000100000000000000000000000000000000000000000000000000"
>>
>> It clearly suggests we have two very different behaviors here - it would
>> be beneficial to split it into at least 2 separate IDs.
>
> What you saw was basically the symbol that says whether a superblock
> of the image was well predicted or if we need to code something. It's
> going to be different for every clip and since it's one of the first
> symbols being coded, there isn't much "context" to use for it. In
> this particular case, adapting faster helps. In other clips it
> doesn't.
>
>> Good to hear that you have not only switched to accurate entropy
>> coder, but also to exponential forgetting, which is much better at
>> adapting to local situation. Also you don't longer need the costly
>> non-power-of-two denominator.
>
> Well, we're still investigating all options. The non-power-of-two
> code always had exponential forgetting (but it was in steps, which is
> less accurate, but also suffers from less rounding issues so it's
> about as good). We now also have a way of making the overhead very
> small even for non-power-of-two without using divisions. So the
> jury's still out -- both options are viable, including dozens of
> variants and combinations.
>
>> For power-of-two denominator, standard range coder needs two
>> multiplications per symbol, rANS only one, has simpler multi-bit
>> renormalization and allows for further optimizations - leading to
>> ~3x faster/cheaper (software) decoding (also reducing frequency of
>> hardware decoder) - here are optimized implementations:
>> https://github.com/jkbonfield/rans_static The inconvenience is
>> indeed that encoder needs a buffer for backward encoding within a
>> data block (e.g. a frame or its part) - additional a few kilobyte
>> buffer in costly video encoder seems negligible (?).
>
> Well, these days multiplier hardware is actually cheaper than memory
> (you'll need a lot more than a few kB to cover the worst case).
>
> Cheers,
>
> Jean-Marc
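
For reference, the count-accumulation adaptation Jean-Marc describes above 
(initialize every bin to 32, boost the coded symbol's bin by 128) could be 
sketched like this; the alphabet size and the rescale threshold that 
provides the stepwise forgetting are my assumptions, not Daala's actual 
constants:

#include <stdint.h>

#define NSYMS      16        /* hypothetical alphabet size       */
#define INIT_COUNT 32        /* flat start, as described above   */
#define BOOST      128       /* increment for the coded symbol   */
#define TOTAL_MAX  (1 << 15) /* assumed rescale threshold        */

typedef struct { uint32_t count[NSYMS]; uint32_t total; } adapt_pdf;

void pdf_init(adapt_pdf *p) {
    for (int s = 0; s < NSYMS; s++) p->count[s] = INIT_COUNT;
    p->total = NSYMS * INIT_COUNT;
}

/* After coding symbol s, boost its bin; halving all bins once the total
   grows too large gives the stepwise exponential forgetting mentioned in
   the quoted discussion. */
void pdf_update(adapt_pdf *p, int s) {
    p->count[s] += BOOST;
    p->total    += BOOST;
    if (p->total > TOTAL_MAX) {
        p->total = 0;
        for (int k = 0; k < NSYMS; k++) {
            p->count[k] = (p->count[k] + 1) >> 1;
            p->total   += p->count[k];
        }
    }
}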


-- 
dr Jarosław Duda
Institute of Computer Science and Computer Mathematics,
Jagiellonian University, Cracow, Poland
http://th.if.uj.edu.pl/~dudaj/