[daala] HDR is coming
BGB
cr88192 at gmail.com
Sun Aug 16 08:35:28 PDT 2015
On 8/16/2015 5:17 AM, HuBandiT at gmail.com wrote:
> HDR and higher bit-depth seem to be coming:
>
> http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-4.html
> section "Better Pixels: Best Bang for the Buck?"
>
> http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-1-poynton-watkinson.html
>
> * industry seems to use 12-14 bits today, consensus seems to be at
> least 12 bits of luma is needed soon even for consumers; prosumer
> camcorders (e.g. Sony PXW-X70 - $2000) are doing 10-bit 4:2:2
> 1080p59.94 today, and anything above $2500-3000 seems to be 12 bit
> or above
> * looks like 13 bits would be sufficient with a simple log curve,
> Dolby is proposing 12 bits with their "Perceptual Quantization" curve
>

dunno about "typical" video (nor can I speak for the Daala people; I am
an outsider), but to mesh well with PC graphics hardware, float16
(half-float) is a good option. you can also truncate float16 values to
12 bits while keeping roughly the same visual fidelity as 8-bit LDR;
truncating to 10 bits gives a more noticeable loss of fidelity.
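
a minimal sketch of the truncation idea (in Python for brevity; the post doesn't say how the bits are actually packed). it assumes non-negative values, so the float16 sign bit can be dropped, leaving 5 exponent + 10 mantissa bits. keeping the top 12 of those discards the 3 lowest mantissa bits, and keeping 10 discards 5. `truncate_half` and `expand_half` are hypothetical helper names.

```python
import struct

def half_bits(x: float) -> int:
    """Return the raw IEEE 754 binary16 bit pattern for x."""
    return struct.unpack('<H', struct.pack('<e', x))[0]

def half_from_bits(h: int) -> float:
    """Reinterpret a 16-bit pattern as a binary16 value."""
    return struct.unpack('<e', struct.pack('<H', h))[0]

def truncate_half(x: float, keep_bits: int = 12) -> int:
    """Map a non-negative half-float to a keep_bits-wide code.

    Assumption: x >= 0, so the sign bit (bit 15) is always zero and is
    dropped; the remaining 15 bits (5 exponent + 10 mantissa) are cut
    down by discarding the lowest mantissa bits.
    """
    h = half_bits(x)
    if h & 0x8000:
        raise ValueError("sketch assumes non-negative values")
    return h >> (15 - keep_bits)

def expand_half(code: int, keep_bits: int = 12) -> float:
    """Inverse of truncate_half: refill the dropped mantissa bits with zeros."""
    return half_from_bits(code << (15 - keep_bits))
```

at 12 bits, 7 mantissa bits remain, so the truncation error is always under 2^-7 (about 0.8%) of the value, across the whole exponent range; at 10 bits only 5 mantissa bits remain and the bound loosens to 2^-5.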

in some of my own video codecs, mostly intended for special-purpose
use-cases, I have also generally supported alpha channels and additional
layers, so that, for example, a video clip could carry luminance maps,
normal and depth maps, and specular maps.

a typical configuration was:
  RGBA   (RGB + alpha)
  XYZd   (normal + depth)
  LuRGB  (luminance)
  SpRGBe (specular color and exponent)

decode paths were also provided for decoding directly to BC6H and BC7
compressed textures, with each layer essentially being its own texture
(since graphics hardware only supports 3 or 4 components per image).
this was basically so the contents of the video stream could be
rendered in a 3D environment.

a generally effective way of doing HDR is basically just to encode the
float16 data as if it were larger integer components (12 or 16 bits),
though it is worth noting that this requires supporting a larger range
for DCT coefficients and similar (vs the traditional +/-32k), which
affects the VLC coding and the storage of intermediate blocks (one can
still optimize for the traditional range and escape-code the occasional
larger value).
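
a sketch of what "encoding float16 as integers" means in practice, with hypothetical helper names (`half_as_int`, `dct_dc_8x8`, `needs_escape`). for x >= 0 the binary16 bit pattern increases monotonically with x, so it can be fed to an integer transform as a 15-bit sample. the catch is range: with an orthonormal 8x8 DCT-II (an assumption; the post doesn't say which DCT scaling is used), the DC term of a flat block is 8x the sample value, so near-maximal float16 inputs push coefficients well past +/-32k.

```python
import struct

INT16_MAX = 32767

def half_as_int(x: float) -> int:
    """Reinterpret a half-float as a signed integer sample.

    Non-negative values keep their 15-bit pattern (monotonic in x).
    Negative values map to the negated magnitude bits so ordering is
    preserved across zero (an assumption made for this sketch).
    """
    h = struct.unpack('<H', struct.pack('<e', x))[0]
    mag = h & 0x7FFF
    return -mag if h & 0x8000 else mag

def dct_dc_8x8(block) -> float:
    """DC term of an orthonormal 8x8 DCT-II: (1/sqrt(8))^2 * sum = sum / 8."""
    return sum(sum(row) for row in block) / 8.0

def needs_escape(coeff: float) -> bool:
    """True if a coefficient falls outside the traditional +/-32k range."""
    return abs(round(coeff)) > INT16_MAX
```

a flat block at the largest finite float16 (bit pattern 0x7BFF = 31743) gives a DC of 253944, while a flat 8-bit block at 255 gives only 2040, which is why the wider range mostly shows up as occasional escapes rather than as the common case.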

for example, one of my codecs (loosely based on JPEG and MPEG-1, and
ignoring all my VQ codecs ATM) used a VLC scheme resembling a tweaked
version of the JPEG scheme:
  normal case:
    high 3 bits of symbol: zero count (0-6)
    low 5 bits: value prefix, encoding values of +/-32k using a scheme
      similar to Deflate's distance coding
    the sign is folded into the LSB, giving the pattern 0, -1, 1, -2, 2, ...
      (conversion to/from two's complement can be done with shifts and xor)
  escape-coded case:
    Z=7 (high 3 bits = 111)
    low 5 bits: zero count, encoded similarly to Deflate's run-length
      count
    next symbol:
      00-7F: prefix for a larger-range value
        00-3F: 32-bit range (used)
        40-7F: 64-bit range (currently unused)
      80-FF: special commands, and a few bit-packed vector codings
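
a minimal sketch of the normal-case symbol layout above. the sign folding is the usual zigzag trick, and `value_prefix` / `prefix_value` follow the general shape of Deflate's distance codes (4 literal prefixes, then one more extra bit per pair of prefixes). the exact bucket boundaries in the real codec may differ, all names are hypothetical, and the Z=7 escape path is left out.

```python
def fold_sign(v: int) -> int:
    """Zigzag-fold a signed value: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return (v << 1) ^ (v >> 31)          # valid for -2**31 <= v < 2**31

def unfold_sign(u: int) -> int:
    """Inverse of fold_sign."""
    return (u >> 1) ^ -(u & 1)

def value_prefix(u: int):
    """Split a folded value u (0..65535) into (prefix, extra_bits, n_extra).

    Prefixes 0-3 are literal; above that, the prefix encodes the position
    of the top set bit plus the bit just below it, and the remaining low
    bits are sent raw, as in Deflate's distance codes.
    """
    if u < 4:
        return u, 0, 0
    nb = u.bit_length() - 1              # index of the top set bit (>= 2)
    n_extra = nb - 1
    prefix = 2 * nb + ((u >> n_extra) & 1)
    return prefix, u & ((1 << n_extra) - 1), n_extra

def prefix_value(prefix: int, extra: int) -> int:
    """Rebuild the folded value from its prefix and extra bits."""
    if prefix < 4:
        return prefix
    n_extra = (prefix >> 1) - 1
    return ((2 | (prefix & 1)) << n_extra) + extra

def pack_symbol(zero_run: int, prefix: int) -> int:
    """High 3 bits: zero count (0-6); low 5 bits: value prefix.

    A zero count of 7 is reserved for the escape-coded form (not shown).
    """
    if not (0 <= zero_run <= 6 and 0 <= prefix <= 31):
        raise ValueError("needs the escape-coded form")
    return (zero_run << 5) | prefix
```

with 5 prefix bits, prefix 31 tops out at u = 65535, which is exactly the folded +/-32k range (-32768 folds to 65535); anything larger has to go through the escape form.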

the VLC scheme allowed for commands in the AC coefficients, but none of
my existing codecs use them (commands were generally at the level of
individual blocks or macroblocks). DC always used a form similar to
that of the second escape-coded symbol.

partly as a bit of funkiness, the codec did motion compensation
per-plane, so the Y/U/V/A planes could potentially have different
motion vectors (though joint vectors may also be used).

typically, 4:2:0 chroma subsampling was used in lossy mode. the codec
also supports a lossless mode, though lossless coding tends to result
in a rather high bitrate, such as 40-100Mbps or more (it disables
chroma subsampling, and uses the RCT color transform and WHT instead of
YCbCr and DCT).
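
for reference, the RCT here is presumably the reversible color transform from JPEG 2000 (an assumption; the post only says "RCT"), which uses integer floor arithmetic so it round-trips exactly. the WHT side is sketched as a 2-point integer Haar butterfly via lifting (the S-transform), one common way of building a lossless Walsh-Hadamard transform; how the codec actually makes its WHT lossless isn't stated.

```python
def rct_forward(r, g, b):
    """JPEG 2000-style reversible color transform (integer, lossless)."""
    y = (r + 2 * g + b) >> 2      # >> floors, also for negative sums
    u = b - g
    v = r - g
    return y, u, v

def rct_inverse(y, u, v):
    """Exact inverse: r + 2g + b == u + v + 4g, so g = y - floor((u + v) / 4)."""
    g = y - ((u + v) >> 2)
    return v + g, g, u + g        # (r, g, b)

def s_forward(a, b):
    """2-point integer Haar/WHT butterfly via lifting (S-transform)."""
    h = a - b                     # difference (high-pass)
    l = b + (h >> 1)              # equals floor((a + b) / 2) (low-pass)
    return l, h

def s_inverse(l, h):
    """Undo the lifting steps in reverse order."""
    b = l - (h >> 1)
    return b + h, b
```

note that u and v need one more bit than the input samples (-255..255 for 8-bit RGB).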

I have generally not used arithmetic coding, as it tends to be slower
than is worthwhile. one can get passable results by gluing together
Huffman coding and a bitwise range coder, but it usually doesn't seem
worthwhile to gain maybe 5-15% better compression at the expense of
halving the codec speed.

note that static Huffman coding was generally used, with the Huffman
tables sent in I-frames and themselves entropy coded. typically up to
12 or so Huffman tables would be used:
  YDC, YAC1, YAC2, UVDC, UVAC1, UVAC2, ADC, AAC1, AAC2, pYAC1, pUVAC1,
  pAAC1
how they were used was up to the encoder, but there is currently an
implementation limit of 16 Huffman tables per layer (each layer has its
own set of tables). the tables were encoded using a combination of MTF
and adaptive Rice coding (otherwise, the scheme is similar to the one
used in Deflate). quantization tables were encoded similarly.
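
a sketch of the table-coding idea, assuming each table is sent as a Deflate-style list of code lengths in 0..16 (0 meaning an unused symbol): the lengths are MTF-transformed, so runs of equal lengths become small indices, then Rice-coded with a parameter k that adapts after each value. the adaptation rule in `adapt_k` (shrink k after a zero quotient, grow it after a quotient above 1) is an assumption, not necessarily the codec's rule, and bits are kept as a plain Python list for readability.

```python
ALPHABET = 17  # assumed code lengths 0..16, 0 = symbol unused

def mtf_encode(symbols):
    table = list(range(ALPHABET))
    out = []
    for s in symbols:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))    # move the symbol to the front
    return out

def mtf_decode(indices):
    table = list(range(ALPHABET))
    out = []
    for i in indices:
        s = table.pop(i)
        out.append(s)
        table.insert(0, s)
    return out

def adapt_k(k, q):
    """Assumed adaptation rule for the Rice parameter."""
    if q == 0 and k > 0:
        return k - 1
    if q > 1 and k < 15:
        return k + 1
    return k

def rice_encode(values, k=2):
    bits = []
    for v in values:
        q, r = v >> k, v & ((1 << k) - 1)
        bits += [1] * q + [0]                                  # quotient in unary
        bits += [(r >> i) & 1 for i in range(k - 1, -1, -1)]   # k-bit remainder, MSB first
        k = adapt_k(k, q)
    return bits

def rice_decode(bits, count, k=2):
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:
            q += 1
            pos += 1
        pos += 1                                               # skip the terminating 0
        r = 0
        for _ in range(k):
            r = (r << 1) | bits[pos]
            pos += 1
        out.append((q << k) | r)
        k = adapt_k(k, q)
    return out
```

since the decoder applies the same `adapt_k` after each value, it tracks the encoder's k without any side information.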

roughly, (lossy) compression seemed to be somewhere between MPEG and
XviD. decoding speed was fairly close to that of XviD (mine was a
little slower, but both were in the general area of 100 Mpix/sec on my
PC). much of the decoding time is dominated by YUV->RGB conversion, and
decoding can be a bit faster when decoding to YUV422 or transcoding to
a compressed-texture format.
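
to show where that per-pixel cost comes from, a sketch of a fixed-point YCbCr->RGB conversion for one pixel. it assumes JPEG/JFIF-style full-range BT.601 coefficients scaled by 2^16; the codec's actual matrix and precision aren't given in the post, and the constant names are hypothetical.

```python
# 16.16 fixed-point JFIF (full-range BT.601) coefficients; assumed, not from the post
CR_R = 91881     # round(1.402    * 65536)
CB_G = 22553     # round(0.344136 * 65536)
CR_G = 46802     # round(0.714136 * 65536)
CB_B = 116130    # round(1.772    * 65536)
HALF = 1 << 15   # rounding bias for the final >> 16

def clamp8(v):
    return 0 if v < 0 else 255 if v > 255 else v

def ycbcr_to_rgb(y, cb, cr):
    """Convert one 8-bit YCbCr pixel to 8-bit RGB using integer math only."""
    cb -= 128
    cr -= 128
    r = y + ((CR_R * cr + HALF) >> 16)
    g = y + ((-CB_G * cb - CR_G * cr + HALF) >> 16)
    b = y + ((CB_B * cb + HALF) >> 16)
    return clamp8(r), clamp8(g), clamp8(b)
```

that is four multiplies, several adds and three clamps per output pixel. with 4:2:0 input, the chroma products only need computing once per 2x2 block of luma samples, which is one of the usual ways to cut this cost.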

still-image / I-frame-only coding was generally slightly better than
JPEG, and for lossless coding, sizes seemed pretty similar to those of
JPEG-XR (both were better than PNG for "natural" images, but worse for
things like screenshots or line art).

though, I use this codec less often than some of my VQ codecs, which
mostly have a speed advantage: for example, my VQ codecs can do screen
capture at high resolutions without effectively killing my PC in the
process, and without the crap image quality of MS Video 1 (about the
only other "official" codec which seemed to do a passable job at screen
capture; others either killed the CPU or the HDD).