[daala] HDR is coming

HuBandiT at gmail.com
Sun Aug 16 09:59:22 PDT 2015


Let's play with numbers!

  * Wikipedia ( https://en.wikipedia.org/wiki/Lux ) lists the brightest
    illuminance ("Direct sunlight") at 100,000 lux; the darkest
    ("Moonless, overcast night sky (starlight)") at 0.0001 lux; the
    ratio of these is 1E9
  * log2(1E9) = 29.89, about 30 f-stops/EV/LV values; incidentally also
    30 bits, if we wanted to represent this in linear light
  * the "Barten Ramp" (ITU-R Report BT.2246), as shown in the Dolby
    paper (linked from the articles below), shows the "Minimum Contrast
    Step (%)" to be above 10% at a luminance of 0.001 cd/m^2,
    asymptotically approaching a bit below 0.4% for increasing
    luminances; in other words 0.4% seems to be the finest step anywhere
    on the curve (at least up to 10,000 cd/m^2 - and I don't see any
    reason why it would get any finer with even higher amounts of light)
  * for this 0.4% precision we need log2(ln(1E9)/ln(1.004)) =
    12.34184435 bits
  * 13 bits gives us an even finer step, about 0.2533% precision
    (1E9^(1/2^13) = 1.002533)
  * but this is not a very practical unit, let's try to find something
    more practical
  * 0.4% precision is 0.005759269 EV (about 1/173 EV)
  * with bit shifts and table lookups in mind, let's choose the nearest
    power of two, 1/256 EV as our unit
  * that brings our approx. 30 EV range - from looking into the Sun
    down to looking at things on Earth under starlight - to
    256*log2(1E9) = 7653.722331 steps, which still nicely fits into 13
    bits. yay!
  * (adding just three more bits to arrive at 16 bits would bring us
    into whatif.xkcd.com territory - see "detonation of a hydrogen bomb
    pressed against your eyeball" at https://what-if.xkcd.com/73/ )
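
As a sanity check, the arithmetic above can be re-derived in a few
lines of Python (my sketch, not part of any proposal):

```python
import math

# direct sunlight / starlight, in lux -> the 1E9 ratio from the list
dynamic_range = 100_000 / 0.0001

stops = math.log2(dynamic_range)              # ~29.9 EV (~30 bits linear)
steps_04 = math.log(1e9) / math.log(1.004)    # number of 0.4% steps
bits_04 = math.log2(steps_04)                 # ~12.34 bits
step_ev = math.log2(1.004)                    # 0.4% in EV, ~1/173 EV
codes_256 = 256 * math.log2(1e9)              # ~7654 codes at 1/256 EV

print(round(stops, 2), round(bits_04, 2),
      round(step_ev, 6), round(codes_256, 1))
```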

Executive summary:

  * eliminate the format as the bottleneck
  * declare our intensity step unit to be 1/256 EV
  * anchor values to a real-world absolute intensity scale (e.g.
    numeric value 0 means 1 lumen)
  * 13 bits should be enough for starters, but even higher is not that
    far off; 16 (or even 14) bits should probably be more than enough
    for everyone - if those extra bits cost only a linear increase in
    complexity, then go for it; we routinely use 16 bits of audio
    everywhere, in rooms and headphones with way less than 96 dB SNR,
    because it is guaranteed to be good enough, still cheap and viable
    with current technology, and it unifies and simplifies things
    across the chain; perhaps it is time for the same with video
  * optional tone-mapping enhancement layer for contracting dynamic
    range to current levels during decode
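
A minimal sketch of such a 1/256 EV code, assuming a hypothetical
anchor where linear value 1.0 maps to code 0 (the names
`encode`/`decode` and the anchor choice are mine, for illustration
only):

```python
import math

STEPS_PER_EV = 256      # 1/256 EV per code, as proposed above
ANCHOR = 1.0            # hypothetical anchor: linear 1.0 -> code 0

def encode(linear):
    """Linear light -> integer code in 1/256 EV steps (log2 fixed point)."""
    return round(STEPS_PER_EV * math.log2(linear / ANCHOR))

def decode(code):
    """Integer code -> linear light."""
    return ANCHOR * 2.0 ** (code / STEPS_PER_EV)

# the full 1E9:1 range spans ~7654 codes, well within 13 bits:
print(encode(1e9))
```

Quantization error is at most half a step, i.e. well under the 0.4%
threshold discussed above.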


  * this will allow calibrated, physically-based HDR (for most intents
    and purposes, full dynamic range) video
  * the format dynamic range is sufficient and is well defined, so the
    format is no longer the bottleneck due to lack of precision or lack
    of definition
  * for capture/authoring:
      o capture in real life, even on something like a Sony F65 (14
        stops of latitude)
      o or for CGI/synthetic footage for composition
      o with many current camera offerings the format is the limit -
        look at the proliferation of HDR gammas everywhere as a
        stopgap, often shoehorned into or simply outright labeled as
        709 for lack of anything better, needing secret handshakes to
        reverse the transform at the other end; you can record into
        RAW, but I have a hunch many productions only do that to
        preserve the exposure latitude available to grade for final
        look in post, and they'd welcome the option to reduce the
        amount of data
      o this would eliminate that bottleneck
  * encourage full dynamic range HDR authoring workflow (even outside
    the context of the codec); be the first, catalyze the entire
    industry to follow, build momentum; break the vicious cycle of no
    innovation due to lack of demand due to lack of material due to lack
    of format due to lack of innovation
  * for display/presentation/consumption:
      o same issue as on the capture side: the format is the
        bottleneck: displays are already capable of higher intensities
        and wider dynamic range than current formats offer - but look
        at how sluggish uptake of deep color and x.v.Color is, even
        though these are only evolutionary steps, not revolutionary;
        the benefit is not visible/large enough for customers to
        adopt, and the lack of customers stalls innovation
      o multiple approaches possible (considering how displays will
        become both brighter and higher dynamic range)
          + author bakes tone mapping (dynamic range reduction) into
            material, to match current dynamic range. present-proof, but
            future brighter displays won't benefit, plus less incentive
            to innovate due to lack of customer demand due to lack of
            content utilizing higher brightness/dynamic range
          + author sends full HDR through, display does full dynamic
            range reduction calibrated to its capabilities (max dynamic
            range, bit depth, dithering, etc.) - future-proof in
            theory, but quality/faithfulness depends on the device
            implementor, so the final look loses (some of) the
            artistic control; although it provides room for innovation
            for display manufacturers
          + middle-ground: baseline dynamic range standardized (at
            whatever is reasonably current), author sends full dynamic
            range HDR through, plus tone mapping enhancement layer to
            reduce dynamic range of material to standard display dynamic
            range during decode/display. display/user cooperatively
            adjust strength of contrast reduction to taste (e.g. at a
            minimum on/off, or a gradual scale of full to none).
            complete artistic control retained for current (and future
            displays) at current dynamic range levels, but allows
            graceful enhancement as better displays come along
          + maybe incorporate some sort of explicit room light level
            modeling (many current TVs already have ambient light
            sensors - let's utilize them in a meaningful way) to at
            least semi-automate the matching, or perceptual matching
              # maybe offer the usual - currently in-camera - tone
                mapping operations (black gamma, knee adjustment,
                etc.) for savvy users
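
The middle-ground variant could be as simple as a per-pixel log-domain
offset scaled by a user-selected strength; a toy sketch (the function
name and the representation of the enhancement layer as an EV offset
in 1/256 EV codes are my assumptions, not a worked-out design):

```python
def display_code(hdr_code, tm_offset, strength):
    """Blend a full-range HDR sample toward the baseline tone mapping.

    hdr_code:  intensity in 1/256 EV units, full dynamic range
    tm_offset: enhancement-layer correction toward the baseline
               display range (same units, typically negative for
               highlights)
    strength:  0.0 = full HDR passed through,
               1.0 = fully tone-mapped baseline
    """
    return hdr_code + round(strength * tm_offset)

# a bright highlight pulled down by 512 codes (2 EV) at full strength:
print(display_code(7000, -512, 1.0))   # fully tone-mapped
print(display_code(7000, -512, 0.5))   # halfway, to taste
print(display_code(7000, -512, 0.0))   # untouched HDR
```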

On 2015.08.16. 12:17, HuBandiT at gmail.com wrote:
> HDR and higher bit-depth seem to be coming:
> http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-4.html 
> section "Better Pixels: Best Bang for the Buck?"
> http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-1-poynton-watkinson.html
>   * industry seems to use 12-14 bits today, consensus seems to be at
>     least 12 bits of luma is needed soon even for consumers; prosumer
>     camcorders (e.g. Sony PXW-X70 - $2000) are doing 10-bit 4:2:2
>     1080p59.94 today, and anything above $2500-3000 seems to be 12 bit
>     or above
>   * looks like 13 bits would be sufficient with a simple log curve,
>     Dolby is proposing 12 bits with their "Perceptual Quantization" curve
>   * some armchair thinking (just my pebbles to throw into the thinking
>     pool):
>       o log encoding would have the benefit that the traditional color
>         difference signals derived from log encoded RGB components
>         would eliminate (white-balanced) intensity changes (e.g.
>         shadows, fades/dips to-from black) from color channels
>       o with intensity decoupled from color:
>           + considerably lower color precision could be sufficient
>             (since cosine falloff from object curvature, lens
>             vignetting, and primary light shadows no longer leak
>             into chroma, no longer forcing it to have comparable
>             precision to eliminate banding on the final result)
>           + maybe replace color differences with more perceptual
>             metrics: some kind of saturation and hue
>               # could allow heavier quantization still or even lower
>                 color precision outright (on the assumption that hue
>                 and saturation changes much less in well-lit,
>                 well-exposed, real life scenes)
>               # think of it like one aspect of reverse Phong shading:
>                 a shiny sphere in vacuum under white light only ever
>                 has its own hue - only intensity and saturation
>                 change (cosine falloff: towards black, highlight:
>                 towards white; the hue channel is quasi-constant,
>                 flat; the real world will be messier, e.g. hue will
>                 be pulled away by light reflected from surrounding
>                 objects - but see below on illumination/object color
>                 decomposition)
>       o once chroma/color precision is lowered, it might make sense
>         to go 4:4:4 all the time and not bother with chroma
>         down/upsampling at all
>       o establish the scene/discussion for scene decomposition: e.g.
>         separately coding albedo (object reflectance) and illuminance
>           + the first step could be a separate
>             illuminance/intensity/gain channel, that factors (multiply
>             in linear light = addition in log light) into the final
>             intensity of the output pixels
>           + encoders unwilling to utilize this can leave this channel
>             blank at 0dB/0EV/1x
>           + simplistic encoders could benefit:
>               # dips to/from black could preserve full color in main
>                 channels, and only adjust this channel
>               # crossfades could ramp this channel up/down while
>                 referencing the main channels of the two frames at
>                 both ends of the crossfade (conceptually, weighted
>                 prediction in linear light)
>           + advanced encoders: separately encoding high-amplitude
>             scene illuminance variations from lower-amplitude object
>             reflectance/texture might provide coding gains,
>             especially in the case of HDR
>               # scene illuminance: higher amplitude, but less details
>                 (mostly "broad strokes" - different statistics than
>                 main channels)
>               # object reflectance/texture (main channels): smaller
>                 amplitude, but more details
>               # separate prediction/motion compensation for these two
>               # ideally, scene illuminance should carry color as
>                 well, to predict colored lighting (e.g. illuminance
>                 in single off-white or multiple light source cases)
>               # use it as hints for HDR tonemapping tool (see still
>                 photography research)
>           + next step could be to add a highlight layer (kinda like
>             specular highlights in reverse Phong shading - gradually
>             blending away the area around the highlight position into
>             some color of the light source, whether it's transform
>             coding or some kind of shape based parametric modelling),
>             there exists machine color vision research in these directions
>           + doesn't need to be perfect (it's just prediction after
>             all) or even cover many cases - just go for some
>             low-hanging fruits, enough to spark industry
>             discussion/experimentation
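
To illustrate the gain-channel idea from the quoted text: since
multiplication in linear light is addition in log light, a fade only
needs to touch the gain channel while the main channels stay
untouched. A toy Python sketch, reusing the hypothetical 1/256 EV log
code from above:

```python
import math

STEPS_PER_EV = 256  # same hypothetical 1/256 EV log code as proposed

def fade_gain(t):
    """Gain-channel code for a fade weight t in (0, 1].

    Multiplying by t in linear light is adding log2(t) EV in log
    light, i.e. 256*log2(t) codes.
    """
    return round(STEPS_PER_EV * math.log2(t))

# a dip to 25% brightness is just -2 EV = -512 codes on the gain
# channel; the main (color) channels are left alone:
print(fade_gain(0.25))
print(fade_gain(1.0))   # neutral gain: 0 codes
```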
