[daala] Curious about progress of codec

Wed Apr 27 21:32:34 UTC 2016

On 4/27/2016 12:49 PM, Jean-Marc Valin wrote:
> On 04/27/2016 01:08 PM, Jarek Duda wrote:
>> Regarding using probability distribution from the last I-frame, I have
>> meant remembering probability distribution at the end of this I-frame
>> (not updated further) - additional buffer which is updated every
>> I-frame, P-frames use it as the initial probability distribution.
> While it would be technically possible to do that without hurting
> robustness to losses too much (what if you lost *part* of the keyframe),
> the main problem is that it likely wouldn't actually help either. The
> statistics of symbols on a keyframe are very very different from the
> statistics of those on a P-frame. Not to mention that keyframes don't
> have things like motion vectors which eat up a lot of the bits in P-frames.
>
> You *could* have the keyframe explicitly send initial probabilities for
> all subsequent P-frames, but it's a lot of work and it's not clear how
> much gain there would be. Not to mention that the statistics again
> change depending on how far your P-frame is from your keyframe.

one of my codecs worked this way (in this case, a DCT-based design
similar to JPEG, but with different packaging and entropy-coded headers),
except that I was using static Huffman, and the I-Frames sent the
Huffman tables for both the I and P frames (1).

one downside of this approach was that the Huffman tables never really
seemed to fit any given frame all that well, as probability
distributions can vary a lot from frame to frame.

there didn't seem to be a big difference in this codec between using 
averaged Huffman tables and using fixed tables, where fixed tables can 
have the advantage of being a little faster and simpler (can skip 
needing to count symbols, ...).

better compression was possible by sending frame-specific Huffman tables 
for each frame, or alternatively by using an adaptive entropy coder.
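as a rough illustration of the table-fit issue described above, here is a
minimal Python sketch that builds Huffman code lengths from symbol counts
and compares coding each frame with its own table against coding both with
one table built from the combined counts. the symbol counts and the names
(`huffman_lengths`, `coded_bits`, `frame_a`, `frame_b`) are made up for the
example and are not from the codec in question; the cost of transmitting the
tables themselves is not counted.

```python
import heapq
from collections import Counter

def huffman_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code over counts."""
    if len(counts) == 1:
        return {s: 1 for s in counts}
    # heap entries: (weight, unique tiebreak, symbols in this subtree)
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every merge adds one bit to these symbols
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, tie, s1 + s2))
        tie += 1
    return lengths

def coded_bits(frame_counts, lengths):
    """Payload size in bits when a frame's symbols are coded with 'lengths'."""
    return sum(n * lengths[s] for s, n in frame_counts.items())

# hypothetical symbol histograms for two frames with opposite skew
frame_a = Counter({0: 80, 1: 10, 2: 6, 3: 4})
frame_b = Counter({0: 4, 1: 6, 2: 10, 3: 80})

shared = huffman_lengths(frame_a + frame_b)   # one table for both frames
own_total = (coded_bits(frame_a, huffman_lengths(frame_a))
             + coded_bits(frame_b, huffman_lengths(frame_b)))
shared_total = coded_bits(frame_a, shared) + coded_bits(frame_b, shared)
print("per-frame tables:", own_total, "bits; shared table:", shared_total, "bits")
```

with real frames the gap depends on how much the distributions actually
differ, and per-frame tables only win if the saving exceeds the bits spent
sending the tables.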

1: the drawback of this codec though was that it wasn't particularly 
fast (only ~ 80 Mpix/sec per thread), and its Q/bpp wasn't particularly 
impressive either. for I-Frames, Q/bpp was slightly better than JPEG.

it did support motion-compensation and similar, but still didn't really 
give good image quality much under around 0.7bpp or so, so I didn't 
really see as much point in this one (vs VQ/color-cell based designs 
which could decode at around 150-200 Mpix/sec per thread, and have 
otherwise similar Q/bpp, giving ok results at around 0.4-0.8 bpp).

another unresolved issue was a sort of feedback loop where DCT
artifacts (AKA: "JPEG artifacts") would rapidly accumulate over a
period of frames, and there wasn't really any good way to address this
(partial solutions were doing the encoding closed-loop, which was very
bad for encoder speed, or always using either skip or replace mode for
blocks, which was bad for compression).
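the accumulation effect can be shown with a toy model. the sketch below is
an assumption-laden simplification, not the actual codec: a "frame" is a
single sample, the quantizer is a plain uniform rounder, and the names
(`quantize`, `simulate`) and the ramp signal are hypothetical. in the
open-loop case the encoder predicts from the original previous frame while
the decoder can only add quantized residuals to its own output, so the
quantization errors pile up; in the closed-loop case the encoder predicts
from the decoder's reconstruction, so the error stays within half a
quantizer step.

```python
import math

def quantize(value, step):
    """Uniform quantizer: round to the nearest multiple of step."""
    return math.floor(value / step + 0.5) * step

def simulate(source, step, closed_loop):
    """Return per-frame |decoded - source| for a toy one-sample 'video'."""
    decoded = quantize(source[0], step)    # first frame coded directly
    errors = [abs(decoded - source[0])]
    for t in range(1, len(source)):
        # closed loop predicts from the decoder's output,
        # open loop predicts from the original previous frame
        reference = decoded if closed_loop else source[t - 1]
        residual = quantize(source[t] - reference, step)
        decoded += residual                # decoder only sees quantized residuals
        errors.append(abs(decoded - source[t]))
    return errors

source = [0.3 * t for t in range(21)]      # slow ramp, hypothetical test signal
open_err = simulate(source, 1.0, closed_loop=False)
closed_err = simulate(source, 1.0, closed_loop=True)
print("open-loop max error:  ", max(open_err))
print("closed-loop max error:", max(closed_err))
```

the closed-loop version needs the encoder to run the decoder's
reconstruction for every frame, which is consistent with it being much
slower on the encode side.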

even open-loop encoding (using only skip or replace modes) was still
pretty slow (~ 45-50 Mpix/sec per thread), so this kind of killed this
one off (it needs multiple threads and a pretty heavy CPU load to encode
1080p in real time).
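the arithmetic behind the thread count can be sketched as below. the frame
rates (30 and 60 fps) and the assumption that throughput scales perfectly
with thread count are mine, not stated above; `threads_needed` is a
hypothetical helper name.

```python
import math

def threads_needed(width, height, fps, mpix_per_sec_per_thread):
    """Return (required Mpix/sec, threads needed assuming ideal scaling)."""
    required = width * height * fps / 1e6
    return required, math.ceil(required / mpix_per_sec_per_thread)

for fps in (30, 60):                       # assumed frame rates
    for speed in (45, 50):                 # Mpix/sec per thread, from the text
        required, threads = threads_needed(1920, 1080, fps, speed)
        print(f"{fps} fps at {speed} Mpix/s/thread: "
              f"{required:.1f} Mpix/s -> {threads} thread(s)")
```

1920x1080 at 30 fps is about 62.2 Mpix/sec, so a single thread at 45-50
Mpix/sec falls short even before accounting for any other work the
machine is doing.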

so this codec was both slower and gave worse Q/bpp than XviD, which just
wasn't all that impressive... (so, I mostly just stuck with VQ designs
for stuff that needs to be fast, and offload to XviD or H.264 AVC or
similar for stuff that needs better Q/bpp).

for decoding, XviD gives ~ 105 Mpix/sec, and I am not entirely sure how
it manages that while being DCT-based (ex: for JPEG decoding, the
fastest I can get is 90 Mpix/sec, and this is with some amount of ugly
hacks). nothing obvious is revealed by looking at the source (and it
still seems pretty fast if built as plain scalar code, so dunno there).

however, neither XviD nor H.264 (x264) can do real-time 1080p encoding 
on my PC, so... yeah... I am left with my VQ-based stuff... (or other 
options which eat my CPU or HDD or both).

but, admittedly, I am far from an expert on all this...