[daala] OT: more side info (Re: OT: working on a video codec, maybe interesting.)

BGB cr88192 at gmail.com
Sat Sep 20 08:34:45 PDT 2014


On 9/8/2014 10:16 PM, BGB wrote:
> sorry if this is OT.
>
> well, this isn't particularly relevant either to Daala, or to any 
> other mainstream codecs, but for my own uses I have developed a custom 
> codec. I sort of wanted to bring it to attention of Xiph in the 
> off-chance it might be relevant to anyone.
>

side note:
but, yeah, a big factor in my use case is decode-speed.

if Daala can give good decode speeds, I am hopeful, more so if it 
supports an alpha channel, HDR, ... and can be made to decode to BC7 and 
BC6H and similar (though YUV output may be sufficient, as it is possible 
to write a faster transcoder from YUV to BCn than when going from RGB).


otherwise, I will probably stop writing on the topic if this is too much 
of an issue.


some past results (on my HW, mostly from memory, quality=subjective):
XviD: pulls off ~110 Mpix/sec decoding;
   quality is ok.
   issues:
     still encumbered;
     no native HDR or alpha-channel support;
     maxing out quality seems to put severe hurt on decode speeds.

Theora:
   was getting around 100 Mpix/sec to YUV;
      direct linking against libtheora (ported to MSVC), with video in
      an AVI container.
   even with maxed-out quality settings, the image quality looks worse
   than XviD;
      both of which seem to look worse than maxed-out MPEG-1.
   no alpha channel or HDR:
      had tried faking alpha via a magic color, but was getting ugly
      artifacts;
      there is a trick of doubling vertical resolution, but this halves
      decode speeds;
      no obvious/good way to shove more data into bitstream without
      breaking decoder.

H.264:
     in the past it was very slow, but got much faster when I got a
     newer video card.
     55 Mpix/sec before, now 180 Mpix/sec, but I suspect this is due to
     HW decoding or similar.
     issues: patent encumbered, ...


custom codecs (plain C, single threaded decoder cases):
M-JPEG (with custom extensions):
     around 90 Mpix/sec (BGRA).
     around 140 Mpix/sec (DXTn).
     around 120 Mpix/sec (BC7).
     pros:
         extended to support alpha, HDR, ...
         can have high image quality (at 80% or 90%).
         (mostly) backwards compatible with normal M-JPEG.
     cons:
         lackluster decode speeds;
         absurd bitrates.

BTIC1C (RPZA based):
     around 190 Mpix/sec (BGRA);
     around 700 Mpix/sec (DXTn);
     around 580 Mpix/sec (BC7).
     pros:
         supports alpha/HDR/...
         decodes at high speeds;
         better size/quality than MJPEG;
     cons:
         inherently lossy (no lossless mode);
         size/quality worse than most other proper video codecs;
         large/complex implementation;
         offline batch encoding is currently fairly slow.

BTIC2C (JPEG-based, highly modified):
     around 80 Mpix/sec (BGRA);
     DXTn / BC7: decoder paths not implemented.
     pros:
         supports alpha/HDR/...
         supports lossless encoding;
         relatively fast encoder;
         more bitrate competitive with standard codecs.
     cons:
         lackluster decode speeds;
         little real advantage over common codecs in the simple case
         (lossy RGB video).


HW: 2009-era AMD Phenom II (quad-core) at 3.4GHz, with PC3-1050 RAM and 
a Radeon 7850.


on Raspberry Pi (700 MHz):
     BTIC1C (generic case, no ARM-specific optimizations):
         around 19 Mpix/sec (BGRA);
         around 45 Mpix/sec (DXTn);
         around 38 Mpix/sec (BC7).
     other cases:
         currently untested.


just to explain how it is I ended up using such an awful design (BTIC1C):

well, in my case, quality/bitrate was less important than speed, and 
with a bit of hackery and extension, I am able to get pretty viable 
video quality at around 1-2bpp, which is acceptable in this case (video 
sequences are typically short animated textures).

this design basically was winning out on benchmarks, even if 
size/quality kind of sucks, and the codec code itself is an awful mess.

most of the encoding time (for batch encoding) is due to a "block 
quantizer", which tries different mutations on a block and sees what the 
cheapest encoding it can get away with is. the faster screen-capture 
case does not use a block-quantizer (and uses a simple lossless form of 
the color-delta coding).
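
to make the "cheapest encoding it can get away with" idea concrete, here 
is a minimal sketch in C. it is not the actual BTIC1C block quantizer: the 
three modes, the 16-pixel single-channel block, the squared-error metric 
and all names (pick_block_mode, max_err, ...) are assumptions made up for 
illustration. it only shows the general shape: try candidate encodings 
from cheapest to most expensive and keep the first one whose error is 
within tolerance.

```c
#include <stdint.h>

/* Hypothetical block modes, ordered from cheapest to most expensive. */
enum block_mode { MODE_FLAT, MODE_TWO_LEVEL, MODE_RAW };

/* Squared error if all 16 pixels are replaced by their (rounded) mean. */
static uint32_t flat_error(const uint8_t px[16], int *mean_out) {
    int sum = 0, i;
    uint32_t err = 0;
    for (i = 0; i < 16; i++) sum += px[i];
    *mean_out = (sum + 8) / 16;
    for (i = 0; i < 16; i++) {
        int d = px[i] - *mean_out;
        err += (uint32_t)(d * d);
    }
    return err;
}

/* Squared error if pixels are split at the mean and each side is
   replaced by that side's own average (a two-color block). */
static uint32_t two_level_error(const uint8_t px[16], int mean) {
    int lo_sum = 0, hi_sum = 0, lo_n = 0, hi_n = 0, lo, hi, i;
    uint32_t err = 0;
    for (i = 0; i < 16; i++) {
        if (px[i] > mean) { hi_sum += px[i]; hi_n++; }
        else              { lo_sum += px[i]; lo_n++; }
    }
    lo = lo_n ? lo_sum / lo_n : mean;
    hi = hi_n ? hi_sum / hi_n : mean;
    for (i = 0; i < 16; i++) {
        int d = px[i] - (px[i] > mean ? hi : lo);
        err += (uint32_t)(d * d);
    }
    return err;
}

/* Return the cheapest mode whose error does not exceed max_err. */
static enum block_mode pick_block_mode(const uint8_t px[16], uint32_t max_err) {
    int mean;
    if (flat_error(px, &mean) <= max_err) return MODE_FLAT;
    if (two_level_error(px, mean) <= max_err) return MODE_TWO_LEVEL;
    return MODE_RAW;
}
```

a real encoder would have many more candidates per block, which is 
presumably where the batch-encoding time goes; raising max_err trades 
quality for cheaper blocks.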


OTOH, BTIC2C is basically a JPEG-like core (with some MPEG-like 
features), but with some changes:
     uses a TLV packaging (similar to BTIC1C, loosely similar to RIFF or
     PNG, but more compact);
     uses bit-packed Huffman and Quantization tables (based on Rice coding);
     may use WHT and RCT or YCoCg instead of DCT and YCbCr;
     modified VLC scheme (Z3V5, extended range and lower bitrate than
     the JPEG Z4V4 scheme, *);
     adds support for interframe deltas and block motion compensation;
     supports lossless coding, as well as alpha and float16 images and
     similar;
     ...
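
as a rough illustration of why a reversible color transform matters for 
the lossless mode, here is a sketch of the well-known YCoCg-R lifting 
transform in C. this is an assumption about the kind of transform meant 
by "RCT or YCoCg" above, not a description of what BTIC2C actually does; 
the function names are hypothetical.

```c
/* Forward YCoCg-R: integer lifting steps, exactly invertible. */
static void rgb_to_ycocg_r(int r, int g, int b, int *y, int *co, int *cg) {
    int t;
    *co = r - b;
    t   = b + (*co >> 1);
    *cg = g - t;
    *y  = t + (*cg >> 1);
}

/* Inverse YCoCg-R: undoes the lifting steps in reverse order.
   The same shifted values are recomputed here as in the forward pass,
   so the round trip is exact whatever rounding the shift performs. */
static void ycocg_r_to_rgb(int y, int co, int cg, int *r, int *g, int *b) {
    int t = y - (cg >> 1);
    *g = cg + t;
    *b = t - (co >> 1);
    *r = *b + co;
}
```

note that Co and Cg need one more bit of range than the RGB inputs, 
which is the usual cost of this kind of lossless transform.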


*: high 3 bits: zero count or escape, low 5: value prefix.
     VLC values are encoded Deflate-like, albeit with a sign-folding
     scheme (0, -1, 1, -2, 2, -3, ...).
     Z=7 escapes to either a longer encoding or the use of packed
     coefficient vectors.
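
the symbol layout and the sign folding described in this note can be 
sketched in C as below. only the 3/5 bit split, the Z=7 escape and the 
folding order (0, -1, 1, -2, 2, ...) come from the text; the struct and 
function names are hypothetical, and the mapping from the 5-bit prefix to 
extra bits is not given above, so it is left out rather than guessed.

```c
#include <stdint.h>

/* Fields of one Z3V5 symbol (hypothetical struct name). */
typedef struct {
    unsigned zrun;      /* high 3 bits: zero-run count, 7 = escape */
    unsigned vprefix;   /* low 5 bits: value prefix */
    int      is_escape; /* nonzero when zrun == 7 */
} z3v5_sym;

/* Split an 8-bit symbol into zero count and value prefix. */
static z3v5_sym z3v5_split(uint8_t sym) {
    z3v5_sym s;
    s.zrun      = (sym >> 5) & 7;
    s.vprefix   = sym & 31;
    s.is_escape = (s.zrun == 7);
    return s;
}

/* Fold a signed value to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4. */
static uint32_t sign_fold(int32_t v) {
    return (v >= 0) ? ((uint32_t)v << 1)
                    : (((uint32_t)(-(v + 1)) << 1) | 1);
}

/* Inverse of sign_fold. */
static int32_t sign_unfold(uint32_t u) {
    return (u & 1) ? -(int32_t)(u >> 1) - 1 : (int32_t)(u >> 1);
}
```

folding keeps small magnitudes of either sign at small unsigned values, 
so they land on the short prefix codes.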

there were some experimental JPEG/BTIC1C and BTIC 2C/1C hybrid modes 
(where frames may be coded in one format or another), but they were 
problematic and offered limited advantage over straight BTIC1C (were 
slower and did relatively little to improve either quality or bitrate).



