[daala] Curious about progress of codec

Jean-Marc Valin jmvalin at jmvalin.ca
Wed Apr 27 16:19:22 UTC 2016


On 04/27/2016 09:14 AM, Jarek Duda wrote:
> Regarding starting every frame with a flat distribution, I believe you
> still use non-I-frames, which require some I-frame for decoding.
> So maybe at least start with the probability distribution from the
> most recent required I-frame?

OK, so here's what happens in a practical videoconferencing situation.
You're sending I-frames at (e.g.) 15-second intervals, but 5 seconds
after an I-frame, you lose a packet for a P-frame. That means you have
to "conceal" (guess) what was in that packet and make up something
plausible on the screen (obviously not perfect), then you decode the
next P-frames and any "error" you made in the concealment gets spread
around as the motion vectors carry it away. It's not pretty, but you
can still figure out what's going on, and if a big change happens you
can still see what's happening.

Now, with what you're suggesting, everything that happens after the loss
is completely undecodable and you have no idea what's going on. In that
case, the decoder has no choice but to completely freeze the image for
10 seconds until the next I-frame. That's why people don't rely (or at
least want to have the option of not relying) on anything from previous
frames for entropy decoding.

> Another option is a varying rate - start with e.g. rate=2,3 for some
> number of symbols to quickly get out of the terrible flat
> distribution, then raise it to e.g. 5 for more subtle adaptation to
> the local situation.

This has *always* been what we've done. The adaptation on the first
symbols is very fast because we accumulate probabilities starting from
very small flat values. For example, we would initialize a PDF with a
value of 32 for each symbol and every time we encode a symbol, we boost
the corresponding bin by 128. The flat distribution really doesn't stay
around for long. In the case where the probability distribution always
has to sum to a power of two, I have a way to adjust the adaptation
rate so that it behaves the same as the accumulation I described.
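
To make that concrete, here is a minimal sketch of that accumulation
scheme. The 32/128 constants are the ones mentioned above; the
16-symbol alphabet, the names and the structure are only illustrative,
not the actual Daala code:

  /* Count-based adaptation: every bin starts at a small flat value and
   * the bin of each coded symbol gets a large boost, so the
   * distribution moves away from flat after only a few symbols. */
  #define NSYMS 16        /* illustrative alphabet size  */
  #define INIT_COUNT 32   /* small flat starting value   */
  #define BOOST 128       /* boost for each coded symbol */

  typedef struct {
    unsigned count[NSYMS];
    unsigned total;
  } adapt_pdf;

  static void adapt_pdf_init(adapt_pdf *pdf) {
    int i;
    for (i = 0; i < NSYMS; i++) pdf->count[i] = INIT_COUNT;
    pdf->total = NSYMS*INIT_COUNT;
  }

  /* Called after encoding (or decoding) symbol 'sym'; the probability
   * handed to the entropy coder is count[sym]/total. */
  static void adapt_pdf_update(adapt_pdf *pdf, int sym) {
    pdf->count[sym] += BOOST;
    pdf->total += BOOST;
  }

With a 16-symbol alphabet, the very first coded symbol already ends up
with (32+128)/(16*32+128) = 25% of the probability mass, which is why
the flat distribution disappears so quickly.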

> However, the approach that still looks best here is to choose a
> separate optimal initial distribution for each ID - just average over
> sample videos and fix these probabilities in the codec standard.

As I said, we might do that someday... but it's a *lot* of work (that
you have to redo frequently) for a small gain.

> Also, it seems beneficial to separately optimize 'rate' for each ID and
> again fix it in the standard.

Also, on the TODO list.

> A separate suggestion is to take a look at sequences from your data -
> here is an example from a single frame of your sample for ID=0:
> "00114044010001001404441000010010000000000000000000000100000000000000000000000000000000000000000000000000"
> 
> It clearly suggests we have two very different behaviors here - it would
> be beneficial to split it into at least 2 separate IDs.

What you saw was basically the symbol that says whether a superblock of
the image was well predicted or if we need to code something. It's going
to be different for every clip and since it's one of the first symbols
being coded, there isn't much "context" to use for it. In this
particular case, adapting faster helps. In other clips it doesn't.

> Good to hear that you have not only switched to an accurate entropy
> coder, but also to exponential forgetting, which is much better at
> adapting to the local situation. Also, you no longer need the costly
> non-power-of-two denominator.

Well, we're still investigating all options. The non-power-of-two code
always had exponential forgetting (but in steps, which is less accurate,
though it also suffers from fewer rounding issues, so it's about as
good). We now also have a way of making the overhead very small even for
non-power-of-two without using divisions. So the jury's still out --
both options are viable, including dozens of variants and combinations.
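
For the power-of-two case, one well-known way to get exponential
forgetting without any division is a shift-based CDF update. The sketch
below is one generic variant of that idea; the 15-bit total and the
'rate' shift are illustrative assumptions, not necessarily what we end
up shipping:

  /* Shift-based exponential forgetting for a CDF that must always sum
   * to a power of two (here 1 << 15).  Each coded symbol pulls the CDF
   * a fraction 1/2^rate of the way towards that symbol's step
   * function; no divisions are needed. */
  #define CDF_TOTAL (1 << 15)

  static void cdf_update(unsigned short *cdf, int nsyms, int sym,
                         int rate) {
    int i;
    for (i = 0; i < nsyms - 1; i++) {
      if (i < sym) cdf[i] -= cdf[i] >> rate;          /* decay towards 0     */
      else cdf[i] += (CDF_TOTAL - cdf[i]) >> rate;    /* grow towards total  */
    }
    /* cdf[nsyms - 1] is implicitly CDF_TOTAL; a real coder would also
     * guard against any symbol's probability collapsing to zero. */
  }

A small 'rate' adapts quickly (heavy forgetting) and a large one adapts
slowly, which is exactly the knob the varying-rate suggestion above is
about.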

> For a power-of-two denominator, a standard range coder needs two
> multiplications per symbol; rANS needs only one, has simpler
> multi-bit renormalization and allows further optimizations - leading
> to ~3x faster/cheaper (software) decoding (also reducing the clock
> frequency required of a hardware decoder) - here are optimized
> implementations:
> https://github.com/jkbonfield/rans_static
> The inconvenience is indeed that the encoder needs a buffer for
> backward encoding within a data block (e.g. a frame or part of one) -
> an additional buffer of a few kilobytes in a costly video encoder
> seems negligible (?).

Well, these days multiplier hardware is actually cheaper than memory
(you'll need a lot more than a few kB to cover the worst case).
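
For context, the single-multiplication decode step being referred to is
roughly the following generic rANS sketch (the constants, the linear
symbol search and the buffer handling are illustrative, not code from
either implementation):

  #include <stdint.h>

  #define RANS_K 15
  #define RANS_M (1u << RANS_K)
  #define RANS_LOWER_BOUND (1u << 23)

  /* One rANS decode step: freq[]/cum[] are per-symbol frequencies and
   * cumulative frequencies summing to RANS_M, 'x' is the coder state,
   * and *next_byte points into the compressed buffer.  Returns the
   * decoded symbol; note the single multiplication. */
  static int rans_decode_step(uint32_t *x, const uint16_t *freq,
                              const uint16_t *cum, int nsyms,
                              const uint8_t **next_byte) {
    uint32_t slot = *x & (RANS_M - 1);
    int s = 0;
    while (s + 1 < nsyms && cum[s + 1] <= slot) s++;  /* find symbol */
    *x = freq[s]*(*x >> RANS_K) + slot - cum[s];
    while (*x < RANS_LOWER_BOUND) *x = (*x << 8) | *(*next_byte)++;
    return s;
  }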

Cheers,

	Jean-Marc


