[daala] Curious about progress of codec

Mon Apr 25 13:44:08 UTC 2016

(same disclaimer as before. correction: I meant zlibh, not zhuff, as an
example of a fast Huffman encoder/decoder.)

On 4/24/2016 11:58 PM, Jean-Marc Valin wrote:
> On 04/24/2016 10:02 PM, Jarek Duda wrote:
> >> 1) First of all, I don't understand why you insist on a flat initial
>> probability for every frame, not I-frame as you still need it for P,B
>> frames?
> As much as I would like to have the code adapt its probabilities from
> frame to frame, it is just not a viable option for any application (e.g.
> videoconference) that is based on unreliable transport. In that case,
> losing a single packet means your stream is completely undecodable until
> the next keyframe, which is unacceptable. We may add the feature as an
> option for applications that do have reliable transport, but we just
> can't rely on it everywhere.

agreed. some of my codecs can be used over UDP, or can be used for 
multithreaded encoding/decoding.

for similar reasons, everything resets to defaults each time (for each 
UDP datagram, or for each frame slice).

while it may seem like this would be pretty bad for compression, overall 
its impact is actually fairly small.

actually, SMTF+AdRice does considerably better than Huffman when it 
comes to UDP, as the need to transmit a Huffman table hurts really bad 
for small messages. similarly, parts of the frame may or may not arrive, 
or may arrive out of order. there were some hacks decoder-side to try to 
"integrate" pieces which arrived out of order.
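
as a rough sketch of the "reset to defaults per datagram" idea (not the actual codec): this assumes AdRice means a Rice code whose parameter k is nudged after every symbol based on the length of the unary prefix. DEFAULT_K, MAX_K, the update rule, and the function names are all made up for illustration, and the SMTF stage is left out. since both sides restart from the same hard-coded k, no table is sent and each datagram decodes on its own.

```python
DEFAULT_K = 2   # hypothetical hard-coded starting parameter
MAX_K = 15      # hypothetical cap so k cannot grow without bound


def update_k(k, q):
    """Assumed adaptation rule: shrink k on an empty prefix, grow on a long one."""
    if q == 0 and k > 0:
        return k - 1
    if q > 1 and k < MAX_K:
        return k + 1
    return k


def encode_datagram(values):
    """Encode non-negative ints to a list of bits; state starts fresh each call."""
    k = DEFAULT_K
    bits = []
    for v in values:
        q = v >> k
        bits.extend([1] * q)          # unary prefix: q ones ...
        bits.append(0)                # ... terminated by a zero
        for i in reversed(range(k)):  # k-bit remainder, MSB first
            bits.append((v >> i) & 1)
        k = update_k(k, q)
    return bits


def decode_datagram(bits, count):
    """Decode `count` values; mirrors the encoder's state exactly."""
    k = DEFAULT_K
    pos = 0
    out = []
    for _ in range(count):
        q = 0
        while bits[pos] == 1:
            q += 1
            pos += 1
        pos += 1                      # skip the terminating zero
        rem = 0
        for _ in range(k):
            rem = (rem << 1) | bits[pos]
            pos += 1
        out.append((q << k) | rem)
        k = update_k(k, q)
    return out
```

because nothing carries over between calls, two datagrams can be decoded in either order, or one can be dropped, without affecting the other.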

though, in the use case, it was running over LAN (over 802.11b) rather 
than the wider internet (the codecs in question were originally designed 
for robots, with the encoding being done real-time on an ARM SoC).

though, the 802.11b connection was surprisingly unreliable, and I was 
getting a fair bit of packet loss (it was also pretty painful trying to 
SSH into it, experience was much better with a network wire).

in the "slice" operation (in PC uses), generally the frame is cut up 
into slices of between 16 and 64 scanlines (varies depending on 
resolution, typically results in 34 slices per frame).
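
one way the slice split could work, purely as an illustration: the rule below (aim for roughly 34 slices, round the height to a multiple of 16, clamp to 16..64) is an assumption chosen to match the numbers above, not the codec's actual logic, and the function names are hypothetical.

```python
def slice_height(frame_height, target_slices=34, step=16, lo=16, hi=64):
    """Pick a slice height in scanlines (assumed rule, see lead-in)."""
    ideal = frame_height / target_slices
    rounded = int(ideal / step + 0.5) * step   # nearest multiple of `step`
    return max(lo, min(hi, rounded))


def split_into_slices(frame_height):
    """Return (first_line, end_line) pairs covering the whole frame."""
    h = slice_height(frame_height)
    return [(y, min(y + h, frame_height)) for y in range(0, frame_height, h)]
```

with this rule, 1080 and 2160 scanlines both come out at 34 slices (32 and 64 lines each, with a shorter last slice), and each slice can be handed to its own thread since all coder state resets at slice boundaries.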

on my PC (a Phenom II 955), the codec in question is fast enough to do 
2160p30 encode and 2160p60 decode using 4 threads (single threaded falls 
short of being fast enough to do 2160p, but can do 1080p pretty ok).

it can also do 720p encode on a 700MHz ARM11, and 1080p on a 900MHz 
Cortex-A7.
speeds can reach Gpix/sec territory on a dual-socket Xeon E5410 (2x 
quad-core, 2.3 GHz).

I have some experimentally faster designs (mostly based on using larger
pixel blocks), but haven't really developed them into a usable/complete
codec yet (off mostly working on other stuff ATM).

however, I am using ugly/nasty technology (VQ / color-cell stuff) which
probably isn't really ideal for "actually good" codecs (Q/bpp is fairly
poor vs DCT based designs). that said, I haven't had much luck getting
these sorts of speeds out of DCT based designs.

a block is a colorspace vector (YUVDyuv) followed by a variable number
of interpolation bits. block types range from a single flat color,
through reduced-resolution single-axis blocks (color-vector decoded to
a pair of YUV endpoints which are interpolated between), up to blocks
with 4:2:0, 4:2:2, and 4:4:4 chroma subsampling (used sparingly as
these are expensive).

color-vector is basically a YUV center point with another differential 
vector Dyuv pointing along the main interpolation axis (single axis). in 
the subsampled blocks, it is interpreted as a sort of colorspace 
bounding box. the 3-axis blocks are used in places where sharp chroma
gradients risk causing artifacts.
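
a minimal sketch of the single-axis case, under stated assumptions: the endpoints are taken as center minus/plus half of Dyuv, each pixel carries a 2-bit index, and the weights are evenly spaced between the endpoints. none of these specifics come from the actual bitstream, and the function names are hypothetical.

```python
def clamp8(x):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, x))


def endpoints(center, delta):
    """Turn (Y,U,V) center + (Dy,Du,Dv) into two endpoints along one axis."""
    lo = tuple(c - d // 2 for c, d in zip(center, delta))
    hi = tuple(l + d for l, d in zip(lo, delta))
    return lo, hi


def decode_block(center, delta, indices=None, bits=2, n_pixels=16):
    """Decode one block to a list of (Y,U,V) pixels.

    indices=None models the flat-color block (no interpolation bits);
    otherwise each index selects a point between the two endpoints.
    """
    if indices is None:
        return [tuple(center)] * n_pixels
    lo, hi = endpoints(center, delta)
    steps = (1 << bits) - 1
    return [
        tuple(clamp8(l + (h - l) * idx // steps) for l, h in zip(lo, hi))
        for idx in indices
    ]
```

for example, decode_block((128, 128, 128), (60, 0, -6), [0, 3]) gives the two endpoints (98, 128, 131) and (158, 128, 125), with indices 1 and 2 landing in between.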

>> Writing entire initial probability distribution every time doesn't seem
>> reasonable.
> Indeed, that's why we never write initial probabilities.

yep, more sensible to hard code them as some "sane" defaults, and let 
adaptation do the rest.
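
to illustrate "hard-coded defaults plus adaptation" in general terms: below is a generic frequency-count model that starts flat, adapts as symbols are seen, and is reset at every frame (or datagram) boundary. this is only a sketch; the class name, the increment of 32, and the flat start are assumptions, and it is not how Daala or my codecs actually adapt. it reports the ideal cost in bits rather than doing real entropy coding.

```python
import math


class AdaptiveModel:
    """Flat-initialized symbol counts with a simple additive update."""

    def __init__(self, nsyms, increment=32):
        self.nsyms = nsyms
        self.increment = increment   # hypothetical adaptation rate
        self.reset()

    def reset(self):
        """Return to the hard-coded default (here: a flat distribution)."""
        self.counts = [1] * self.nsyms
        self.total = self.nsyms

    def cost_and_update(self, sym):
        """Ideal cost of `sym` in bits under the current model, then adapt."""
        bits = -math.log2(self.counts[sym] / self.total)
        self.counts[sym] += self.increment
        self.total += self.increment
        return bits
```

the first symbol after a reset always costs log2(nsyms) bits, and a symbol that keeps repeating gets cheaper quickly, which is why the flat start tends to cost little in practice.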

>> Instead, you can use a parametric distribution (e.g. geometric) and just
>> write the (quantized) parameter.
> We have used parametric distributions for some things, but so far in all
> cases I've been able to beat the parametric distribution by using an
> adaptive one (even with flat initialization). Don't get me wrong -- I'd
> much rather use something parametric with tables stored in ROM. But so
> far it's always been a bit of a loss over adapting.

my experiences agree.

<snip>

not much to say on the rest.