[daala] Curious about progress of codec
Jarek Duda
dudajar at gmail.com
Wed Apr 27 17:08:29 UTC 2016
Regarding using the probability distribution from the last I-frame, I
meant remembering the probability distribution as it stands at the end
of that I-frame (not updated further): an additional buffer, refreshed
at every I-frame, which P-frames use as their initial probability
distribution. But sure, this is a memory cost (a few kilobytes) traded
for compression ratio.
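To make this concrete, a minimal sketch in C (NUM_IDS, CDF_SIZE and the
function names are my assumptions, not Daala's actual structures):

#include <stdint.h>
#include <string.h>

#define NUM_IDS  64   /* number of adapted symbol contexts (assumed) */
#define CDF_SIZE 16   /* entries per CDF (assumed) */

/* The extra buffer - a few kilobytes in total. */
static uint16_t iframe_cdf_snapshot[NUM_IDS][CDF_SIZE];

/* Call once at the end of coding an I-frame; the snapshot is not
 * updated afterwards. */
void snapshot_iframe_cdfs(const uint16_t cdf[NUM_IDS][CDF_SIZE]) {
  memcpy(iframe_cdf_snapshot, cdf, sizeof(iframe_cdf_snapshot));
}

/* Call at the start of each P-frame instead of resetting to flat. */
void init_pframe_cdfs(uint16_t cdf[NUM_IDS][CDF_SIZE]) {
  memcpy(cdf, iframe_cdf_snapshot, sizeof(iframe_cdf_snapshot));
}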
Regarding the memory trade-off for rANS, the encoder needs a buffer
only to reverse the symbol order - the encoder's general working memory
can serve this purpose. The buffer's size is determined by the chosen
size of the data block (usually fixed), and the cost is 2-3 bits of
loss per block - completely negligible for a buffer of a few kilobytes.
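For illustration, a sketch of such a reversal buffer (the block size
and the rans_encode_symbol entry point are assumptions):

#define BLOCK_SYMBOLS 4096  /* chosen block size; costs ~2-3 bits of loss per block */

void rans_encode_symbol(unsigned char s);  /* assumed encoder entry point */

typedef struct {
  unsigned char sym[BLOCK_SYMBOLS];  /* symbols buffered in coding order */
  int n;
} SymbolBuffer;

void block_push(SymbolBuffer *b, unsigned char s) { b->sym[b->n++] = s; }

/* At the end of the block, feed the buffered symbols to the rANS
 * encoder in reverse, so the decoder can read the stream forward. */
void block_flush(SymbolBuffer *b) {
  for (int i = b->n - 1; i >= 0; i--)
    rans_encode_symbol(b->sym[i]);
  b->n = 0;
}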
On the other hand, range coding is a few times more costly for software
decoding. From a hardware perspective, multiplications also cost energy
and increase the number of cycles per symbol, which forces a higher
clock frequency (more energy).
Here is the entire rANS decoding step for a 32-bit state with 16-bit
renormalization (it can be reduced to e.g. a 6-bit x 10-bit -> 16-bit
multiplication with 4-bit renormalization):

s = symbol(x & mask);  // find s such that CDF[s] <= (x & mask) < CDF[s+1], e.g. with SIMD
x = (CDF[s+1] - CDF[s]) * (x >> n) + (x & mask) - CDF[s];
if (x < (1 << 16)) x = (x << 16) | read16bits();  // renormalization

The only branch can be replaced with branchless code (stream points to
16-bit words; b is 0 or 1):

b = (x < (1 << 16));                                  // renormalization needed?
x = (x << (b << 4)) | ((*stream) >> ((1 ^ b) << 4));  // shift in 16 bits iff b == 1
stream += b;                                          // advance only if consumed
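Put together as a compilable sketch (the CDF layout, the parameter
n = 15 and the helper names are my assumptions; a real decoder would
replace the linear search with SIMD or an alias table):

#include <stdint.h>

enum { RANS_N = 15 };                  /* log2 of the CDF total (assumed) */
#define RANS_MASK ((1u << RANS_N) - 1)

typedef struct {
  uint32_t x;                          /* 32-bit coder state */
  const uint16_t *stream;              /* compressed input, read forward */
} RansDec;

static int find_symbol(const uint16_t *cdf, int nsyms, uint32_t slot) {
  int s = 0;                           /* linear search for clarity */
  while (s + 1 < nsyms && cdf[s + 1] <= slot) s++;
  return s;
}

static int rans_decode(RansDec *d, const uint16_t *cdf, int nsyms) {
  uint32_t slot = d->x & RANS_MASK;
  int s = find_symbol(cdf, nsyms, slot);   /* CDF[s] <= slot < CDF[s+1] */
  d->x = (uint32_t)(cdf[s + 1] - cdf[s]) * (d->x >> RANS_N) + slot - cdf[s];
  if (d->x < (1u << 16))                   /* single-step renormalization */
    d->x = (d->x << 16) | *d->stream++;
  return s;
}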
Good to hear that you plan to eventually fix optimized initial
probability distributions and adaptation rates for the separate IDs in
the standard - and sure, that should be done after everything else is
decided.
Cheers,
Jarek
On 16/04/27 18:19, Jean-Marc Valin wrote:
> On 04/27/2016 09:14 AM, Jarek Duda wrote:
>> Regarding starting every frame with flat distribution, I believe you
>> still use non-I-frames: which require some I-frame for decoding. So
>> maybe at least start with the probability distribution from this
>> recent required I-frame?
>
> OK, so here's what happens in a practical videoconferencing
> situation. You're sending I-frames at (e.g.) 15 second intervals, but
> 5 sec after an I-frame, you lose a packet for a P-frame. It means you
> have to "conceal" (guess) what was in that packet and make up
> something plausible on the screen (obviously not perfect), then you
> decode the next P-frames and any "error" you did in the concealment
> gets spread around as the motion vectors carry them away. It's not
> pretty, but you can still figure out what's going on, and if a big
> change happens you can still see what's happening.
>
> Now, with what you're suggesting, everything that happens after the
> loss is completely undecodable and you have no idea what's going on.
> In that case, the decoder has no choice but to completely freeze the
> image for 10 seconds until the next I-frame. That's why people don't
> rely (or at least want to have the option of not relying) on anything
> from previous frames for entropy decoding.
>
>> Another option is varying rate - start with e.g. rate=2,3 for some
>> number of symbols to quickly get out of the terrible flat
>> distribution, then raise it to e.g. 5 for more subtle adaptation to
>> the local situation.
>
> This has *always* been what we've done. The adaptation on the first
> symbols is very fast because we accumulate probabilities starting
> from very small flat values. For example, we would initialize a PDF
> with a value of 32 for each symbol and every time we encode a symbol,
> we boost the corresponding bin by 128. The flat distribution really
> doesn't stay around for long. In the case where the probability
> distribution has to always sum to a power of two, then I have a way
> to adjust the adaptation rate to have the same behaviour as the
> accumulation I described.
>
>> However, the approach that still looks best here is to choose a
>> separate optimal initial distribution for each ID - just average
>> over sample videos and fix these probabilities in the codec
>> standard.
>
> As I said, we might do that someday... but it's a *lot* of work
> (that you have to redo frequently) for a small gain.
>
>> Also, it seems beneficial to separately optimize 'rate' for each ID
>> and again fix it in the standard.
>
> Also, on the TODO list.
>
>> A separate suggestion is to take a look at sequences from your data
>> - here is an example from a single frame of your sample for ID=0:
>> "00114044010001001404441000010010000000000000000000000100000000000000000000000000000000000000000000000000"
>> It clearly suggests we have two very different behaviors here - it
>> would be beneficial to split it into at least 2 separate IDs.
>
> What you saw was basically the symbol that says whether a superblock
> of the image was well predicted or if we need to code something. It's
> going to be different for every clip and since it's one of the first
> symbols being coded, there isn't much "context" to use for it. In
> this particular case, adapting faster helps. In other clips it
> doesn't.
>
>> Good to hear that you have not only switched to an accurate entropy
>> coder, but also to exponential forgetting, which is much better at
>> adapting to the local situation. Also you no longer need the costly
>> non-power-of-two denominator.
>
> Well, we're still investigating all options. The non-power-of-two
> code always had exponential forgetting (but it was in steps, which is
> less accurate, but also suffers from less rounding issues so it's
> about as good). We now also have a way of making the overhead very
> small even for non-power-of-two without using divisions. So the
> jury's still out -- both options are viable, including dozens of
> variants and combinations.
>
>> For a power-of-two denominator, a standard range coder needs two
>> multiplications per symbol; rANS needs only one, has simpler
>> multi-bit renormalization and allows for further optimizations -
>> leading to ~3x faster/cheaper (software) decoding (also reducing the
>> frequency of a hardware decoder) - here are optimized
>> implementations: https://github.com/jkbonfield/rans_static
>> The inconvenience is indeed that the encoder needs a buffer for
>> backward encoding within a data block (e.g. a frame or its part) -
>> an additional few-kilobyte buffer in a costly video encoder seems
>> negligible (?).
>
> Well, these days multiplier hardware is actually cheaper than memory
> (you'll need a lot more than a few kB to cover the worst case).
>
> Cheers,
>
> Jean-Marc
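(For reference, the accumulation Jean-Marc describes above - start each
bin at 32, boost the coded symbol's bin by 128 - behaves like this
sketch; the names and alphabet size are mine, not Daala's:)

#include <stdint.h>

#define NSYMS 16  /* alphabet size (assumed) */

typedef struct {
  uint32_t count[NSYMS];
  uint32_t total;
} AdaptivePdf;

void pdf_init(AdaptivePdf *p) {
  for (int i = 0; i < NSYMS; i++) p->count[i] = 32;
  p->total = 32 * NSYMS;               /* = 512 for 16 symbols */
}

void pdf_update(AdaptivePdf *p, int s) {
  p->count[s] += 128;                  /* boost the coded symbol's bin */
  p->total += 128;
}

/* After only 4 occurrences of one symbol, its bin holds 32 + 4*128 = 544
 * out of a total 512 + 4*128 = 1024, i.e. probability ~0.53: the flat
 * start indeed does not stay around for long. */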
--
dr Jarosław Duda
Institute of Computer Science and Computer Mathematics,
Jagiellonian University, Cracow, Poland
http://th.if.uj.edu.pl/~dudaj/