[daala] HDR is coming

Sun Aug 16 08:35:28 PDT 2015

On 8/16/2015 5:17 AM, HuBandiT at gmail.com wrote:
> HDR and higher bit-depth seem to be coming:
>
> http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-4.html 
> section "Better Pixels: Best Bang for the Buck?"
>
> http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-1-poynton-watkinson.html
>
>   * industry seems to use 12-14 bits today, consensus seems to be at
>     least 12 bits of luma is needed soon even for consumers; prosumer
>     camcorders (e.g. Sony PXW-X70 - $2000) are doing 10-bit 4:2:2
>     1080p59.94 today, and anything above $2500-3000 seems to be 12 bit
>     or above
>   * looks like 13 bits would be sufficient with a simple log curve,
>     Dolby is proposing 12 bits with their "Perceptual Quantization" curve
>

dunno about "typical" video (nor can I speak for the Daala people, I am 
an outsider), but to mesh well with PC graphics hardware, float16 
(half-float) is a good option. you can also truncate the float16 to 
12-bits while still keeping roughly the same visual fidelity as 8-bit 
LDR. if truncating to 10 bits, there is a more noticeable loss of fidelity.
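
as a rough sketch of the truncation idea (not code from any of my codecs): this assumes the 12-bit form simply drops the low 4 mantissa bits of the half-float bit pattern, leaving sign + 5 exponent bits + 6 mantissa bits. the helper names are made up for illustration.

```python
import struct

def f16_bits(x):
    """IEEE 754 half-float bit pattern of x, as a 16-bit integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bits_f16(b):
    """Inverse of f16_bits: rebuild the half-float value from its bits."""
    return struct.unpack("<e", struct.pack("<H", b))[0]

def truncate_f16(x, keep_bits=12):
    """Keep only the top keep_bits of the half-float pattern.

    Assumption: the dropped bits are the low mantissa bits, so
    keep_bits=12 leaves 6 mantissa bits and keep_bits=10 leaves 4.
    """
    drop = 16 - keep_bits
    return bits_f16((f16_bits(x) >> drop) << drop)

if __name__ == "__main__":
    for v in (0.18, 1.0, 100.0, 1000.0):
        print(v, truncate_f16(v, 12), truncate_f16(v, 10))
```

with 6 mantissa bits left, the step between adjacent values stays under 1/64 of the value; with 4 bits it is under 1/16, which is in line with the 10-bit case looking visibly worse.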

in some of my own video codecs, mostly intended for more special-purpose 
use-cases, I have generally also supported alpha channels and alternate 
layers as well, so for example, a video clip with luminance maps, normal 
and depth maps, and specular maps, could be done.

a typical configuration was:
   RGBA (RGB+Alpha)
   XYZd (Normal + Depth)
   LuRGB (Luminance)
   SpRGBe (Specular color and exponent)

likewise, decode paths were also provided to support decoding directly 
to BC6H and BC7 compressed textures. each layer would be essentially its 
own texture (since the graphics hardware only supports 3 or 4 components 
in images).

this was basically so the contents of the video stream could be rendered 
in a 3D environment.

a generally effective way of doing HDR is basically just encoding the 
float16 data as if it were larger integer components (12 or 16 bits), 
though it is worth noting that this requires supporting a larger range 
for DCT components and similar (vs the traditional +/-32k), which 
affects the VLC coding and storage of intermediate blocks (one can still 
optimize for the traditional range, and escape-code occasional larger 
values).
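
to put numbers on the range issue, here is a small worked calculation. it assumes 8x8 blocks, an orthonormal 2-D DCT (the same scaling JPEG uses), and unsigned samples with no level shift; none of that is stated above, so treat it as an illustration only.

```python
def dct2_dc(block):
    """DC term of an orthonormal 2-D DCT-II over an N x N block.

    For that scaling the DC coefficient is sum(samples) / N.
    """
    n = len(block)
    return sum(sum(row) for row in block) / n

def max_dc(bits, n=8):
    """Largest DC value for a flat block of full-scale samples."""
    peak = (1 << bits) - 1
    return dct2_dc([[peak] * n for _ in range(n)])

if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(bits, max_dc(bits))
```

under these assumptions, 12-bit samples just about fit in the +/-32k range, while 16-bit samples overshoot it by a wide margin, hence the need for escape-coding the larger values.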

for example, one of my codecs (loosely based on JPEG and MPEG-1, 
ignoring all my VQ codecs ATM) had used a VLC scheme resembling a 
tweaked version of the JPEG scheme:
   high 3 bits of symbol: zero count (0-6)
   low 5 bits: value prefix, encodes values of +/- 32k using a scheme 
     similar to Deflate's distance-coding,
     sign is folded into the LSB, forming the pattern: 0, -1, 1, -2, 2, ...
     conversion to/from twos complement can be done with shifts and xor.
escape coded case:
   Z=7 (111)
   low-5 bits encode zero-count, encoded similar to Deflate's run-length 
     count.
   next symbol:
     00-7F: prefix for larger-range symbol
       00-3F: 32-bit range (used)
       40-7F: 64-bit range (currently unused)
     80-FF: used for special commands, and a few bit-packed vector codings.
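
a minimal sketch of the non-escape case described above. the sign folding follows the pattern given (0, -1, 1, -2, 2, ...); the prefix/extra-bits split is only a guess at what "similar to Deflate's distance-coding" could look like (prefixes 0-3 carry no extra bits, then each pair of prefixes adds one more extra bit), and all function names are hypothetical.

```python
def fold_sign(v):
    """Map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... (sign in the LSB).

    Assumes v fits in 32 bits, so v >> 31 is 0 or -1.
    """
    return (v << 1) ^ (v >> 31)

def unfold_sign(u):
    """Inverse of fold_sign."""
    return (u >> 1) ^ -(u & 1)

def split_prefix(u):
    """Split a folded value (0..65535) into (prefix, nbits, extra).

    Assumed Deflate-distance-like layout: 32 prefixes cover 0..65535.
    """
    if u < 4:
        return u, 0, 0
    nbits = u.bit_length() - 2              # number of raw extra bits
    prefix = 2 * nbits + 2 + ((u >> nbits) & 1)
    return prefix, nbits, u & ((1 << nbits) - 1)

def join_prefix(prefix, extra):
    """Inverse of split_prefix."""
    if prefix < 4:
        return prefix
    nbits = (prefix >> 1) - 1
    return ((2 + (prefix & 1)) << nbits) + extra

def encode_coeff(zero_count, v):
    """Build the 8-bit symbol (zero count in high 3 bits, prefix in low 5).

    Returns (symbol, nbits, extra); zero_count 7 is reserved for escapes.
    """
    assert 0 <= zero_count <= 6
    prefix, nbits, extra = split_prefix(fold_sign(v))
    return (zero_count << 5) | prefix, nbits, extra

def decode_coeff(symbol, extra):
    """Recover (zero_count, value) from a symbol and its extra bits."""
    return symbol >> 5, unfold_sign(join_prefix(symbol & 31, extra))
```

with this layout the 32 possible prefixes cover exactly the folded range 0..65535, which matches the +/- 32k figure, but the actual tables may well differ.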

the VLC scheme allowed for commands in the AC coeffs, but in my existing 
codecs none are used (commands were generally at the level of individual 
blocks or macroblocks). DC always used a form similar to that of the 
second escape-coded symbol.

partly as a bit of funkiness, the codec had done motion compensation 
per-plane, so the Y/U/V/A planes could potentially have different motion 
vectors (though, joint vectors may also be used).

typically, in lossy mode, 4:2:0 chroma subsampling was used. the codec 
does also support a lossless mode, though lossless coding tends to 
result in a rather high bitrate, such as 40-100Mbps or more (it disables 
chroma subsampling, and uses the RCT color transform and WHT, instead of 
YCbCr and DCT).
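
for reference, a sketch of a reversible color transform of the kind meant here. this is the JPEG 2000 style RCT, used as an assumption; the exact variant in the codec is not spelled out above.

```python
def rct_forward(r, g, b):
    """JPEG 2000 style reversible color transform (integer, lossless)."""
    y = (r + 2 * g + b) >> 2     # floor of the weighted average
    u = b - g                    # blue difference
    v = r - g                    # red difference
    return y, u, v

def rct_inverse(y, u, v):
    """Exact inverse of rct_forward."""
    g = y - ((u + v) >> 2)       # >> floors, also for negative sums
    r = v + g
    b = u + g
    return r, g, b

if __name__ == "__main__":
    print(rct_forward(200, 100, 50))
    print(rct_inverse(*rct_forward(200, 100, 50)))
```

the inverse is exact because y equals g plus floor((u + v) / 4), so no rounding error survives the round trip, unlike the usual YCbCr conversion.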

I have generally not used arithmetic compression, as it tends to be 
slower than is worthwhile.
though, one can get passable results by gluing together a Huffman and 
bitwise range-coder, but usually it doesn't seem worthwhile to get maybe 
a 5-15% compression increase at the expense of halving the codec speed.

note that generally, static Huffman was used, with Huffman tables sent 
in I-frames, and themselves entropy coded (typically, up to 12 or so 
Huffman tables would be used):
   YDC, YAC1, YAC2, UVDC, UVAC1, UVAC2, ADC, AAC1, AAC2, pYAC1, pUVAC1, 
   pAAC1

how they were used was up to the encoder, but there is currently an 
implementation limit of 16 Huffman tables per-layer (each layer has its 
own set of tables). a combination of MTF and Adaptive-Rice coding was 
used to encode the tables (otherwise, it is a similar scheme to that 
used in Deflate).
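
a rough sketch of how MTF plus an adaptive Rice code can be applied to a list of Huffman code lengths. the alphabet size, the starting parameter k, and the adaptation rule are all assumptions picked for illustration, not the actual bitstream format.

```python
def mtf_encode(values, alphabet_size=17):
    """Move-to-front: replace each value by its index in a recency list."""
    table = list(range(alphabet_size))
    out = []
    for v in values:
        i = table.index(v)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def mtf_decode(indices, alphabet_size=17):
    """Inverse of mtf_encode."""
    table = list(range(alphabet_size))
    out = []
    for i in indices:
        v = table.pop(i)
        out.append(v)
        table.insert(0, v)
    return out

def adapt_k(k, q):
    """Hypothetical adaptation rule: shrink k on q == 0, grow it on q > 1."""
    if q == 0 and k > 0:
        return k - 1
    if q > 1:
        return k + 1
    return k

def rice_encode(values, k=2):
    """Adaptive Rice code; returns a string of '0'/'1' for readability."""
    bits = []
    for v in values:
        q = v >> k
        bits.append("1" * q + "0")               # unary quotient
        if k:
            bits.append(format(v & ((1 << k) - 1), "0{}b".format(k)))
        k = adapt_k(k, q)
    return "".join(bits)

def rice_decode(bits, count, k=2):
    """Inverse of rice_encode for a known number of values."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                                  # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append((q << k) | r)
        k = adapt_k(k, q)
    return out
```

the point of the MTF step is that runs of equal code lengths turn into runs of zeros, which the Rice coder then spends about one bit each on once k has adapted down to 0.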

quantization tables were also encoded similarly.

roughly, (lossy) compression seemed to be somewhere between MPEG and XviD.
decoding speed was fairly close to that of XviD (though, mine was a 
little slower, both were in the general area around 100Mpix/sec on my PC).

much of the decoding time is dominated by YUV->RGB conversion, and 
decoding speed can be a bit faster by decoding to YUV422 or transcoding 
to a compressed-texture format.

still-image / I-frame only coding was generally slightly better than 
JPEG, and for lossless coding sizes seemed pretty similar to those of 
JPEG-XR (both were better than PNG for "natural" images, but worse for 
things like screen-shots or line-art).

though, this codec is used less often in my case than some of my VQ 
codecs, which mostly have a bit of a speed advantage. for example, my VQ 
codecs can do screen-capture at high resolutions without effectively 
killing my PC in the process, and without the crap image quality of MS 
Video 1 (about the only other "official" codec which seemed to do a 
passable job at screen-capture; others either killed the CPU or the HDD).
