# [daala] HDR is coming

HuBandiT@gmail.com hubandit at gmail.com
Sun Aug 16 09:59:22 PDT 2015


Let's play with numbers!

* Wikipedia ( https://en.wikipedia.org/wiki/Lux ) lists the brightest
illuminance ("Direct sunlight") at 100,000 lux, and the darkest
("Moonless, overcast night sky (starlight)") at 0.0001 lux;
the ratio of these is 1E9
* log2(1E9) = 29.89, about 30 f-stops/EV/LV values; incidentally also
30 bits, if we wanted to represent this in linear light
* the "Barten Ramp" (ITU-R Report BT.2246), as shown in the Dolby paper
(linked from the articles below), shows the "Minimum Contrast Step
(%)" to be above 10% at a luminance of 0.001 cd/m^2, asymptotically
approaching a bit below 0.4% for increasing luminances; in other
words 0.4% seems to be the finest step anywhere on the curve (at
least up to 10,000 cd/m^2 - but I don't see any reason why it would
get any finer with even higher amounts of light)
* for this 0.4% precision we need log2(ln(1E9)/ln(1.004)) =
12.34184435 bits
* 13 bits gives us an even finer step, 0.2533208% (1E9^(1/2^13) =
1.002532898)
* but this is not a very practical unit, let's try to find something
more practical
* 0.4% precision is 0.005759269 EV (about 1/173 EV)
* with bit shifts and table lookups in mind, let's choose the nearest
power of two, 1/256 EV as our unit
* that brings our approx. 30 EV range - from looking into the Sun to
looking at things on Earth under starlight - to 256*log2(1E9) =
7653.722331 steps, which still nicely fits into 13 bits. yay!
* (adding just three more bits to arrive at 16 bits would bring us
into whatif.xkcd.com territory - see "detonation of a hydrogen bomb
pressed against your eyeball" at https://what-if.xkcd.com/73/ )
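The arithmetic above can be re-checked with a few lines of Python (this is not part of any codec, it just reproduces the numbers):

```python
import math

ratio = 1e9                       # direct sunlight (100,000 lux) vs. starlight (0.0001 lux)
stops = math.log2(ratio)          # ~29.9 f-stops/EV of dynamic range

# bits needed for a uniform 0.4% (1.004x) step over the whole range:
bits_04 = math.log2(math.log(ratio) / math.log(1.004))   # ~12.34

# step size that 13 bits actually buys us over the same range:
step_13 = ratio ** (1 / 2**13)    # ~1.002533, i.e. ~0.253% per step

# with a 1/256 EV unit, the whole range takes:
steps_256 = 256 * math.log2(ratio)   # ~7653.7, still below 2**13 = 8192

print(round(stops, 2), round(bits_04, 2), round(step_13, 6), round(steps_256, 1))
```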

Executive summary:

* eliminate the format as bottleneck
* declare our intensity step units to be 1/256 EV
* anchor values to real-world absolute intensity scale (e.g. numeric
value 0 means 1 Lumen)
* 13 bits should be enough for starters, but even higher is not that
far off; 16 (or even 14) bits should probably be more than enough
for everyone - if those extra bits only add a linear cost in
complexity, then go for it; we routinely use 16 bits of audio
everywhere, with rooms/headphones delivering way less than 96 dB
SNR, because it is guaranteed to be good enough, still cheap and
viable with current technology, and it unifies and simplifies things
across the chain; perhaps it is time for the same with video
* optional tone-mapping enhancement layer for contracting the dynamic
range to current display levels during decode
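As a sketch of what the proposed unit could look like in practice - the anchor point and function names below are made up for illustration; the summary above only fixes the step size (1/256 EV) and that values are anchored to an absolute scale:

```python
import math

# hypothetical anchor: code 0 = starlight level (0.0001 lux); the
# proposal only says the anchor should be absolute, not where it sits
ANCHOR_LUX = 0.0001

def encode(lux: float) -> int:
    """Map absolute illuminance to an integer code in 1/256 EV steps."""
    return round(256 * math.log2(lux / ANCHOR_LUX))

def decode(code: int) -> float:
    """Invert the mapping back to absolute illuminance."""
    return ANCHOR_LUX * 2 ** (code / 256)

print(encode(100_000))   # direct sunlight -> 7654, fits in 13 bits
print(encode(0.0001))    # starlight      -> 0
```

The round-trip error is at most half a step, i.e. well under the 0.4% visibility threshold from the Barten data above.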

Details:

* this will allow calibrated, physically-based, HDR (for most intents
and purposes, full dynamic range) video
* the format dynamic range is sufficient and is well defined, so the
format is no longer the bottleneck due to lack of precision or lack
of definition
* for capture/authoring:
o capture in real life, even on something like a Sony F65 (14
stops of latitude -
http://pro.sony.com/bbsc/ssr/show-highend/resource.solutions.bbsccms-assets-show-highend-F65.shtml#/f65t1_10)
o or for CGI/synthetic footage for composition
o with many current camera offerings the format is the limit -
look at the proliferation of HDR gammas everywhere as a stopgap,
often shoehorned into or simply outright labeled as 709, for
lack of anything better, needing secret handshakes to reverse
the transform at the other end; you can record into RAW, but I
have a hunch many productions only do that to preserve the
exposure latitude available to grade for the final look in post,
and they'd welcome the option to reduce the amount of data
o this would eliminate that bottleneck
* encourage full dynamic range HDR authoring workflow (even outside
the context of the codec); be the first, catalyze the entire
industry to follow, build momentum; break the vicious cycle of no
innovation due to lack of demand due to lack of material due to lack
of format due to lack of innovation
* for display/presentation/consumption:
o same issue as on the capture side: the format is the bottleneck:
displays are already capable of higher intensities and wider
dynamic range than current formats offer - but look at how
sluggish the uptake of deep color and x.v.Color is, even though
these are only evolutionary steps, not revolutionary; the benefit
is not visible/large enough for customers to adopt, and lack of
customers stalls innovation
o multiple approaches possible (considering how displays will
become both brighter and higher in dynamic range)
+ author bakes tone mapping (dynamic range reduction) into
material, to match current dynamic range. present-proof, but
future brighter displays won't benefit, plus less incentive
to innovate due to lack of customer demand due to lack of
content utilizing higher brightness/dynamic range
+ author sends full HDR through, display does full dynamic
range reduction calibrated to its capabilities (max dynamic
range, bit depth, dithering, etc.) - future-proof in theory,
but quality/faithfulness depends on the device implementor, so
the final look loses (some of the) artistic control; although it
provides room for innovation for display manufacturers
+ middle-ground: baseline dynamic range standardized (at
whatever is reasonably current), author sends full dynamic
range HDR through, plus tone mapping enhancement layer to
reduce dynamic range of material to standard display dynamic
range during decode/display. display/user cooperatively
adjust strength of contrast reduction to taste (e.g. at a
minimum on/off, or a gradual scale of full to none).
complete artistic control retained for current (and future
displays) at current dynamic range levels, but allows
graceful enhancement as better displays come along
+ maybe incorporate some sort of explicit room light level
modeling (many current TVs already have ambient light level
sensors - let's utilize them in a meaningful way) to at least
semi-automate the matching, or perceptual matching
# maybe offer the usual - currently in-camera - adjustments
like black gamma, knee, etc., or similar tone mapping
operations for savvy users
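The middle-ground option above could work roughly like this; the function and the meaning of the enhancement-layer value are my own illustration, assuming the intensity codes are in the 1/256 EV log-domain units proposed earlier:

```python
def apply_tone_map(hdr_code: int, tm_offset: int, strength: float) -> int:
    """Blend between full HDR and the author's tone-mapped rendition.

    hdr_code  - pixel intensity in 1/256 EV log units (full dynamic range)
    tm_offset - enhancement-layer value: how far the author's tone mapping
                moves this pixel toward the baseline range (same units)
    strength  - 0.0 = display raw HDR, 1.0 = author's full contrast reduction
    """
    # because the codes are logarithmic, a multiplicative gain in linear
    # light is just an addition here
    return round(hdr_code + strength * tm_offset)
```

A baseline display would run with strength 1.0; as brighter, wider-range displays come along, the display/user can dial it toward 0.0.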

On 2015.08.16. 12:17, HuBandiT at gmail.com wrote:
> HDR and higher bit-depth seem to be coming:
>
> http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-4.html
> section "Better Pixels: Best Bang for the Buck?"
>
> http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-1-poynton-watkinson.html
>
>   * industry seems to use 12-14 bits today, consensus seems to be at
>     least 12 bits of luma is needed soon even for consumers; prosumer
>     camcorders (e.g. Sony PXW-X70 - $2000) are doing 10-bit 4:2:2
>     1080p59.94 today, and anything above $2500-3000 seems to be 12 bit
>     or above
>   * looks like 13 bits would be sufficient with a simple log curve,
>     Dolby is proposing 12 bits with their "Perceptual Quantization" curve
>   * some armchair thinking (just my pebbles to throw into the thinking
>     pool):
>       o log encoding would have the benefit that the traditional color
>         difference signals derived from log encoded RGB components
>         would cancel out (white-balanced) intensity changes
>       o with intensity decoupled from color:
>           + considerably lower color precision could be sufficient
>             (since cosine falloff from object curvature, lens
>             vignetting, primary light shadows no longer leak into
>             chroma, no longer forcing it to have comparable precision
>             to eliminate banding on final result)
>           + maybe replace color differences with more perceptual
>             metrics: some kind of saturation and hue
>               # could allow heavier quantization still or even lower
>                 color precision outright (on the assumption that hue
>                 and saturation changes much less in well-lit,
>                 well-exposed, real life scenes)
>               # think of it like one aspect of reverse Phong shading:
>                 shiny sphere in vacuum under white light only ever has
>                 its own hue - only intensity and saturation change
>                 (cosine falloff: towards black, highlight: towards
>                 white; hue channel is quasi constant, flat; real world
>                 will be messier e.g. hue will be pulled away by light
>                 reflected from surrounding objects - but see below on
>                 illumination/object color decomposition)
>       o once chroma/color precision is lowered, it might make sense to
>         go 4:4:4 all the time and just don't bother with chroma
>         down/upsampling at all
>       o establish the scene/discussion for scene decomposition: e.g.
>         separately coding albedo (object reflectance) and illuminance
>           + the first step could be a separate
>             illuminance/intensity/gain channel, that factors (multiply
>             in linear light = addition in log light) into the final
>             intensity of the output pixels
>           + encoders unwilling to utilize this can leave this channel
>             blank at 0dB/0EV/1x
>           + simplistic encoders could benefit:
>               # dips to/from black could preserve full color in main
>                 channels, and only adjust this channel
>               # crossfades could ramp up/down this channel while
>                 referencing main channels at the two frames at both
>                 ends of the crossfade (weighed prediction in linear
>                 light conceptually)
>           + advanced encoders: separately encoding high amplitude
>             scene illuminance variations from lower amplitude object
>             reflectance/texture might provide coding gains, especially
>             in the case of HDR
>               # scene illuminance: higher amplitude, but less details
>                 (mostly "broad strokes" - different statistics than
>                 main channels)
>               # object reflectance/texture (main channels): smaller
>                 amplitude, but more details
>               # separate prediction/motion compensation for these two
>               # ideally, scene illuminance should carry color as well, to
>                 predict colored lighting (e.g. illuminance in single
>                 off-white or multiple lightsource cases)
>               # use it as hints for HDR tonemapping tool (see still
>                 photography research)
>           + next step could be to add a highlight layer (kinda like
>             blending away the area around the highlight position into
>             some color of the light source, whether it's transform
>             coding or some kind of shape based parametric modelling),
>             there exists machine color vision research in these directions
>           + doesn't need to be perfect (it's just prediction after
>             all) or even cover many cases - just go for some
>             low-hanging fruits, enough to spark industry
>             discussion/experimentation
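Two identities underpin several of the quoted ideas - log color differences cancelling white-balanced intensity changes, and the gain channel being additive in log light. Both can be checked numerically (the values are arbitrary):

```python
import math

log2 = math.log2

# 1) color differences of log-encoded RGB are invariant to a
#    (white-balanced) intensity change: scaling R and G by the same
#    gain k cancels out of the difference
R, G, k = 0.8, 0.25, 3.7
assert abs((log2(k * R) - log2(k * G)) - (log2(R) - log2(G))) < 1e-12

# 2) a multiplicative gain in linear light is an additive channel in
#    log light, so a crossfade can ramp one gain channel instead of
#    re-coding the whole picture
pixel, gain = 0.5, 0.125
assert abs(log2(pixel * gain) - (log2(pixel) + log2(gain))) < 1e-12

print("both identities hold")
```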
