[daala] Curious about progress of codec
BGB
cr88192 at gmail.com
Mon Apr 25 13:44:08 UTC 2016
(same disclaimer as before. correction: I meant zlibh, not zhuff, as an
example of a fast Huffman encoder/decoder.)
On 4/24/2016 11:58 PM, Jean-Marc Valin wrote:
> On 04/24/2016 10:02 PM, Jarek Duda wrote:
>> 1) First of all, I don't understand why you insist on a flat initial
>> probability for every frame, not just I-frames, as you still need it
>> for P and B frames?
> As much as I would like to have the code adapt its probabilities from
> frame to frame, it is just not a viable option for any application (e.g.
> videoconference) that is based on unreliable transport. In that case,
> losing a single packet means your stream is completely undecodable until
> the next keyframe, which is unacceptable. We may add the feature as an
> option for applications that do have reliable transport, but we just
> can't rely on it everywhere.
agreed. some of my codecs can be used over UDP, or for multithreaded
encoding/decoding.
for similar reasons, everything resets to defaults each time (for each
UDP datagram, or for each frame slice).
while it may seem like this would be pretty bad for compression, overall
its impact is actually fairly small.
actually, SMTF+AdRice does considerably better than Huffman when it
comes to UDP, as the need to transmit a Huffman table hurts really badly
for small messages. similarly, parts of the frame may or may not arrive,
or may arrive out of order. there were some hacks decoder-side to try to
"integrate" pieces which arrived out of order.
though, in this use case it was running over a LAN (802.11b) rather than
the wider internet (the codecs in question were originally designed for
robots, with the encoding being done in real time on an ARM SoC).
the 802.11b connection was surprisingly unreliable, though, and I was
getting a fair bit of packet loss (it was also pretty painful trying to
SSH into it; the experience was much better over a wired connection).
in the "slice" operation (in PC uses), generally the frame is cut up
into slices of between 16 and 64 scanlines (varies depending on
resolution, typically results in 34 slices per frame).
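(as a worked example: assuming 32-line slices on a 1080-line frame, that
is ceil(1080/32) = 34 slices, which matches the typical count above; the
32-line slice height is just an assumed value within the 16..64 range.)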
on my PC (a Phenom II 955), the codec in question is fast enough to do
2160p30 encode and 2160p60 decode using 4 threads (single-threaded falls
short of 2160p, but handles 1080p fine).
it can also do 720p encode on a 700MHz ARM11, and 1080p on a 900MHz
Cortex-A7.
speeds can reach Gpix/sec territory on a dual-socket Xeon E5410 (2x
quad-core, 2.3 GHz).
I have some experimentally faster designs (mostly based on using larger
pixel blocks), but haven't really developed them into a usable/complete
codec yet (I'm mostly off working on other stuff ATM).
however, I am using ugly/nasty technology (VQ / color-cell stuff) which
probably isn't ideal for "actually good" codecs (Q/bpp is fairly poor
vs DCT-based designs). then again, I haven't had much luck getting these
sorts of speeds out of DCT-based designs.
a block is a colorspace vector (YUVDyuv) followed by a variable number
of interpolation bits. block types range from a single flat color,
through reduced-resolution single-axis blocks (the color vector decodes
to a pair of YUV endpoints which are interpolated between), up to blocks
with 4:2:0, 4:2:2, and 4:4:4 chroma subsampling (used sparingly, as
these are expensive).
the color vector is basically a YUV center point plus a differential
vector Dyuv pointing along the main interpolation axis (single axis). in
the subsampled blocks, it is instead interpreted as a sort of colorspace
bounding box. the 3-axis blocks are used in places where sharp chroma
gradients risk causing artifacts.
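as a rough illustration of the single-axis case (a sketch only; the real
block size, field widths, and interpolation depth differ, the 4x4 block
and 2-bit indices here are just assumed for the example):

  #include <stdint.h>

  typedef struct { int y, u, v; } YUV;

  /* decode one 4x4 single-axis block: center c, delta d, and 2 bits of
     interpolation weight per pixel picking a point between the two
     endpoints derived from the color vector. */
  static void decode_block_1axis(YUV c, YUV d, const uint8_t idx[16],
                                 YUV out[16])
  {
      YUV a = { c.y - d.y, c.u - d.u, c.v - d.v };  /* endpoint A */
      YUV b = { c.y + d.y, c.u + d.u, c.v + d.v };  /* endpoint B */
      int i;

      for (i = 0; i < 16; i++) {
          int w = idx[i] & 3;   /* interpolation weight, 0..3 */
          out[i].y = a.y + ((b.y - a.y) * w) / 3;
          out[i].u = a.u + ((b.u - a.u) * w) / 3;
          out[i].v = a.v + ((b.v - a.v) * w) / 3;
      }
  }

nothing fancy, but per-pixel decode is just a few integer ops, which is
presumably where the speed advantage over DCT-style designs comes from.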
>> Writing entire initial probability distribution every time doesn't seem
>> reasonable.
> Indeed, that's why we never write initial probabilities.
yep, more sensible to hard code them as some "sane" defaults, and let
adaptation do the rest.
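e.g. for a single binary symbol, the sort of thing I mean looks roughly
like this (a minimal sketch, not Daala's actual probability model; the
scale, adaptation rate, and names are arbitrary):

  #include <stdint.h>

  #define PROB_BITS  12
  #define PROB_ONE   (1 << PROB_BITS)   /* probability scale */
  #define ADAPT_RATE 5                  /* shift; smaller = faster */

  typedef struct {
      uint16_t p1;   /* probability that the next bit is 1, in [0,4096) */
  } BinModel;

  /* hard-coded "sane" default; nothing is transmitted. */
  static void model_init(BinModel *m)
  {
      m->p1 = PROB_ONE / 2;
  }

  /* nudge the probability toward the bit that was actually coded. */
  static void model_update(BinModel *m, int bit)
  {
      if (bit)
          m->p1 += (PROB_ONE - m->p1) >> ADAPT_RATE;
      else
          m->p1 -= m->p1 >> ADAPT_RATE;
  }

model_init() would be called at every frame/slice/datagram boundary,
which matches the "reset to defaults each time" behavior above.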
>> Instead, you can use a parametric distribution (e.g. geometric) and just
>> write the (quantized) parameter.
> We have used parametric distributions for some things, but so far in all
> cases I've been able to beat the parametric distribution by using an
> adaptive one (even with flat initialization). Don't get me wrong -- I'd
> much rather use something parametric with tables stored in ROM. But so
> far it's always been a bit of a loss over adapting.
my experiences agree.
<snip>
not much to say on the rest.