[daala] Curious about progress of codec
cr88192 at gmail.com
Wed Apr 27 21:32:34 UTC 2016
On 4/27/2016 12:49 PM, Jean-Marc Valin wrote:
> On 04/27/2016 01:08 PM, Jarek Duda wrote:
>> Regarding using probability distribution from the last I-frame, I have
>> meant remembering probability distribution at the end of this I-frame
>> (not updated further) - additional buffer which is updated every
>> I-frame, P-frames use it as the initial probability distribution.
> While it would be technically possible to do that without hurting
> robustness to losses too much (what if you lost *part* of the keyframe),
> the main problem is that it likely wouldn't actually help either. The
> statistics of symbols on a keyframe are very very different from the
> statistics of those on a P-frame. Not to mention that keyframes don't
> have things like motion vectors which eat up a lot of the bits in P-frames.
> You *could* have the keyframe explicitly send initial probabilities for
> all subsequent P-frames, but it's a lot of work and it's not clear how
> much gain there would be. Not to mention that the statistics again
> change depending on how far your P-frame is from your keyframe.
one of my codecs worked this way (in this case, a DCT-based design
similar to JPEG, but with different packaging and entropy-coded
headers), except that it used static Huffman coding, and the I-frames
sent the Huffman tables for both the I- and P-frames (1).
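for reference, per-frame Huffman tables of the sort described here can be built from symbol counts roughly along these lines — a minimal sketch (the function name and toy input are mine, not from the codec above):

```python
import heapq
from collections import Counter

def huffman_lengths(freqs):
    """Code length per symbol for a Huffman code built from symbol
    frequencies; the lengths alone are enough to describe a canonical
    Huffman table in a frame header."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # heap entries: (weight, tiebreak, {symbol: length so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # merging two subtrees adds one bit to every contained symbol
        merged = {s: n + 1 for s, n in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

counts = Counter("aaaabbbccd")     # stand-in for one frame's symbol counts
lengths = huffman_lengths(counts)  # -> {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```

a real coder would then assign canonical codewords from the sorted lengths, so only the length list has to be transmitted.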
one downside of this approach was that the Huffman tables never seemed
to fit any given frame all that well, since the probability
distributions can vary considerably from frame to frame.
there didn't seem to be a big difference in this codec between using
averaged Huffman tables and using fixed tables, where fixed tables have
the advantage of being a little faster and simpler (no need to count
symbols, ...).
better compression was possible by sending frame-specific Huffman tables
for each frame, or alternatively by using an adaptive entropy coder.
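the extra cost of an averaged (or otherwise mismatched) table over a matched one is just the usual cross-entropy penalty; a toy sketch with made-up distributions, using ideal fractional code lengths rather than integer Huffman lengths:

```python
import math

def bits_per_symbol(code_dist, actual_dist):
    """Average bits/symbol when codeword lengths are sized for
    code_dist (-log2 q) but symbols actually follow actual_dist.
    Idealized: fractional lengths, i.e. the entropy-coder limit."""
    return sum(p * -math.log2(q)
               for p, q in zip(actual_dist, code_dist) if p > 0)

# hypothetical per-frame symbol statistics for two quite different frames
frame_a = [0.50, 0.25, 0.15, 0.10]
frame_b = [0.10, 0.15, 0.25, 0.50]
avg     = [(a + b) / 2 for a, b in zip(frame_a, frame_b)]

# tables matched to each frame vs one averaged table used for both
matched  = (bits_per_symbol(frame_a, frame_a) +
            bits_per_symbol(frame_b, frame_b)) / 2
averaged = (bits_per_symbol(avg, frame_a) +
            bits_per_symbol(avg, frame_b)) / 2
# the averaged table costs more bits/symbol than the matched tables
```

with these numbers the averaged table costs roughly 0.23 bits/symbol extra; the more the per-frame distributions differ, the larger the gap.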
1: the drawback of this codec, though, was that it wasn't particularly
fast (only ~80 Mpix/sec per thread), and its Q/bpp wasn't particularly
impressive either. for I-frames, Q/bpp was slightly better than JPEG.
it did support motion compensation and similar, but still didn't really
give good image quality much below around 0.7 bpp, so I didn't see much
point in this one (vs VQ/color-cell based designs, which could decode
at around 150-200 Mpix/sec per thread, had otherwise similar Q/bpp, and
gave OK results at around 0.4-0.8 bpp).
one unresolved issue was a sort of feedback loop where DCT artifacts
(AKA "JPEG artifacts") would rapidly accumulate over a run of frames,
and there wasn't really any good way to address this (partial solutions
were doing the encoding closed-loop, which was very bad for encoder
speed, or always using either skip or replace mode for blocks, which
was bad for compression).
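the closed-loop vs open-loop trade-off shows up even in a toy 1-D "codec" that just quantizes frame deltas — purely illustrative, nothing here is from the actual codec above:

```python
Q = 8  # quantizer step (hypothetical)

def quantize(x):
    # coarse scalar quantizer: round to the nearest multiple of Q
    return round(x / Q) * Q

def encode(frames, closed_loop):
    residuals = []
    pred = 0   # open-loop reference: the original previous frame
    recon = 0  # closed-loop reference: the decoder's own reconstruction
    for f in frames:
        r = quantize(f - (recon if closed_loop else pred))
        residuals.append(r)
        pred = f
        recon = recon + r  # what the decoder will actually have
    return residuals

def decode(residuals):
    recon, out = 0, []
    for r in residuals:
        recon += r
        out.append(recon)
    return out

frames = [3, 6, 9, 12, 15, 18, 21, 24]  # a slow ramp, hypothetical samples
open_err = max(abs(f - d)
               for f, d in zip(frames, decode(encode(frames, False))))
closed_err = max(abs(f - d)
                 for f, d in zip(frames, decode(encode(frames, True))))
```

in the open-loop case every small delta quantizes to zero and the decoder drifts without bound, while closed-loop prediction keeps the decoder's error within the quantizer step — the same mechanism behind the artifact accumulation above, at the cost of the encoder running the reconstruction itself.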
even open-loop encoding (using only skip or replace modes) was still
pretty slow (~45-50 Mpix/sec per thread), so this kind of killed this
one off (it needed multiple threads and a pretty heavy CPU load to
encode 1080p in real time).
so this codec was both slower and gave worse Q/bpp than XviD, which
just wasn't all that impressive... (so, I mostly stuck with VQ designs
for stuff that needed to be fast, and offloaded to XviD or H.264/AVC or
similar for stuff that needed better Q/bpp).
for decoding, XviD gives ~105 Mpix/sec, and I am not entirely sure how
it manages that, being DCT-based and all (ex: for JPEG decoding, the
fastest I can get is ~90, and that is with some amount of ugly hacks).
nothing obvious is revealed by looking at the source (and it still
seems pretty fast when built as plain scalar code, so dunno there).
however, neither XviD nor H.264 (x264) can do real-time 1080p encoding
on my PC, so... yeah... I am left with my VQ-based stuff... (or other
options which eat my CPU or HDD or both).
but, admittedly, I am far from an expert on all this...