<html>
<head>
<meta content="text/html; charset=iso-8859-2"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">[OMG!!! Ponies!!!]<br>
<br>
Let's play with numbers!<br>
<ul>
<li>Wikipedia ( <a class="moz-txt-link-freetext" href="https://en.wikipedia.org/wiki/Lux">https://en.wikipedia.org/wiki/Lux</a> ) lists the
brightest illuminance ("Direct sunlight") at 100,000 Lux and the
darkest ("Moonless, overcast night sky (starlight)") at 0.0001
Lux; the ratio of these is 1E9</li>
<li>log2(1E9) = 29.90, about 30 f-stops/EV/LV values;
incidentally also 30 bits, if we wanted to represent this in
linear light</li>
<li>the "Barten Ramp" (ITU-R Report BT.2246) as shown in the
Dolby paper (linked from the articles below) shows the
"Minimum Contrast Step (%)" to be above 10% at a luminance of
0.001 cd/m^2, asymptotically approaching a bit below 0.4%
as luminance increases; in other words, 0.4% seems to be the
finest step anywhere on the curve (at least up to 10,000
cd/m^2 - but I don't see any reason why it would get any finer
with even higher amounts of light)<br>
</li>
<li>for this 0.4% precision we need log2(ln(1E9)/ln(1.004)) =
12.34184435 bits</li>
<li>13 bits gives us an even finer step, 0.2532898%
(1E9^(1/2^13) = 1.002532898)</li>
<li>but this is not a very practical unit, let's try to find
something more practical<br>
</li>
<li>0.4% precision is 0.005759269 EV (about 1/173 EV)</li>
<li>with bit shifts and table lookups in mind, let's choose the
next finer power-of-two fraction, 1/256 EV, as our unit<br>
</li>
<li>that turns our approx. 30 EV range, from looking into the Sun
to looking at things on Earth under starlight, into
256*log2(1E9) = 7653.722331 steps, which still fits nicely into
13 bits. yay!</li>
<li>(adding just three more bits to arrive at 16 bits would
bring us into whatif.xkcd.com territory - see "detonation of a
hydrogen bomb pressed against your eyeball" at
<a class="moz-txt-link-freetext" href="https://what-if.xkcd.com/73/">https://what-if.xkcd.com/73/</a> )<br>
</li>
</ul>
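<p>The arithmetic above can be re-run as a short Python sketch. The constant and variable names are mine, chosen only for illustration; the input figures are the ones quoted in the list (the Wikipedia Lux values and the 0.4% step read off the Barten ramp).</p>
<pre>
```python
import math

BRIGHTEST_LUX = 100_000.0  # "Direct sunlight" figure quoted above
DARKEST_LUX = 0.0001       # "Moonless, overcast night sky" figure quoted above
FINEST_STEP = 0.004        # 0.4% minimum contrast step from the Barten ramp

ratio = BRIGHTEST_LUX / DARKEST_LUX

# Dynamic range in stops (EV), and therefore bits in linear light.
range_ev = math.log2(ratio)

# Number of 0.4% multiplicative steps needed to span the ratio, and the
# number of bits needed to index that many steps.
steps_needed = math.log(ratio) / math.log(1 + FINEST_STEP)
bits_needed = math.log2(steps_needed)

# Relative step size if exactly 2**13 log steps cover the whole range.
step_13_bits = ratio ** (1 / 2 ** 13) - 1

# The 0.4% step expressed in EV, and the proposed 1/256 EV unit.
step_ev = math.log2(1 + FINEST_STEP)
codes_at_256 = 256 * range_ev
step_256 = 2 ** (1 / 256) - 1

print(f"range: {range_ev:.3f} EV")
print(f"bits for 0.4% steps: {bits_needed:.4f}")
print(f"step at 13 bits: {step_13_bits:.7%}")
print(f"0.4% in EV: {step_ev:.9f} (about 1/{1 / step_ev:.0f} EV)")
print(f"codes at 1/256 EV: {codes_at_256:.3f}")
print(f"relative step of 1/256 EV: {step_256:.4%}")
```
</pre>
<p>Under these assumptions a 1/256 EV step works out to roughly 0.27%, so it is finer than the 0.4% target while the whole range still fits into 13 bits.</p>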
Executive summary:<br>
<ul>
<li>eliminate the format as bottleneck</li>
<li>declare our intensity step units to be 1/256 EV</li>
<li>anchor values to real-world absolute intensity scale (e.g.
numeric value 0 means 1 lux)</li>
<li>13 bits should be enough for starters, but going higher is
not much of a stretch either; 16 (or even 14) bits should
probably be more than enough for everyone. If those extra bits
only scale complexity linearly, then go for it: we routinely
use 16-bit audio everywhere, in rooms and on headphones with
way less than 96 dB SNR, because it is guaranteed to be good
enough, is still cheap and viable with current technology, and
unifies and simplifies things across the chain; perhaps it is
time for the same with video<br>
</li>
<li>optional tone-mapping enhancement layer for reducing the
dynamic range to current levels during decode</li>
</ul>
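<p>A minimal sketch of what "1/256 EV units anchored to an absolute scale" could look like in Python. The function names, the signed integer representation and the choice of anchor (code 0 standing for one unit of whatever absolute scale is picked) are my assumptions for illustration, not a worked-out format.</p>
<pre>
```python
import math

STEPS_PER_EV = 256  # proposed unit: one code value is 1/256 EV
ANCHOR = 1.0        # hypothetical anchor: code 0 is one unit of absolute light


def encode(light):
    """Map an absolute linear-light value (must be positive) to a signed code."""
    return round(STEPS_PER_EV * math.log2(light / ANCHOR))


def decode(code):
    """Map a signed code back to absolute linear light."""
    return ANCHOR * 2.0 ** (code / STEPS_PER_EV)


if __name__ == "__main__":
    for light in (0.0001, 1.0, 2.0, 100_000.0):
        code = encode(light)
        print(light, code, decode(code))
```
</pre>
<p>With this anchor the 100,000 end of the range lands on code 4252 and the 0.0001 end on code -3402, a span of 7654 codes, which is within the 8192 values of a 13-bit word. Doubling the light always adds exactly 256 to the code.</p>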
<p>Details:<br>
</p>
<ul>
<li>this will allow calibrated, physically-based, HDR (for most
intents and purposes, full dynamic range) video</li>
<li>the format dynamic range is sufficient and is well defined,
so the format is no longer the bottleneck due to lack of
precision or lack of definition</li>
<li>for capture/authoring:<br>
</li>
<ul>
<li>capture in real life, even on something like a Sony F65
(14 stops of latitude -
<a class="moz-txt-link-freetext" href="http://pro.sony.com/bbsc/ssr/show-highend/resource.solutions.bbsccms-assets-show-highend-F65.shtml#/f65t1_10">http://pro.sony.com/bbsc/ssr/show-highend/resource.solutions.bbsccms-assets-show-highend-F65.shtml#/f65t1_10</a>)</li>
<li>or for CGI/synthetic footage for composition</li>
<li>with many current camera offerings the format is the limit
- look at the proliferation of HDR gammas everywhere as a
stopgap, often shoehorned into or simply outright labeled as
709 for lack of anything better, needing secret handshakes to
reverse the transform at the other end; you can record into
RAW, but I have a hunch many productions only do that to
preserve the exposure latitude available to grade for the final
look in post, and they'd welcome the option to reduce the
amount of data<br>
</li>
<li>this would eliminate that bottleneck<br>
</li>
</ul>
<li>encourage full dynamic range HDR authoring workflow (even
outside the context of the codec); be the first, catalyze the
entire industry to follow, build momentum; break the vicious
cycle of no innovation due to lack of demand due to lack of
material due to lack of format due to lack of innovation<br>
</li>
<li>for display/presentation/consumption:</li>
<ul>
<li>same issue as on the capture side: the format is the
bottleneck. Displays are already capable of higher intensities
and wider dynamic range than current formats offer - but look
at how sluggish the uptake of deep color and x.v.Color is, even
though these are only evolutionary steps, not revolutionary;
the benefit is not visible/large enough for customers to adopt,
and the lack of customers stalls innovation<br>
</li>
<li>multiple approaches possible (considering how displays will
become both brighter and higher dynamic range)<br>
</li>
<ul>
<li>author bakes tone mapping (dynamic range reduction) into
material, to match current dynamic range. present-proof,
but future brighter displays won't benefit, plus less
incentive to innovate due to lack of customer demand due
to lack of content utilizing higher brightness/dynamic
range<br>
</li>
<li>author sends full HDR through, display does full dynamic
range reduction calibrated to its capabilities (max dynamic
range, bit depth, dithering, etc.) - future-proof in
theory, but quality/faithfulness depends on the device
implementor, so the author loses (some) artistic control
over the final look; although it provides room for
innovation for display manufacturers<br>
</li>
<li>middle ground: baseline dynamic range standardized (at
whatever is reasonably current), author sends full dynamic
range HDR through, plus a tone mapping enhancement layer to
reduce the dynamic range of the material to the standard
display dynamic range during decode/display. display/user
cooperatively adjust the strength of contrast reduction to
taste (e.g. at a minimum on/off, or a gradual scale from
full to none). complete artistic control retained for
current (and future) displays at current dynamic range
levels, but allows graceful enhancement as better displays
come along</li>
<li>maybe incorporate some sort of explicit room light level
modeling (many current TVs already have ambient light
level sensors - let's utilize it in a meaningful way) to
at least semi-automate the matching or perceptual matching<br>
</li>
<ul>
<li>maybe offer savvy users the usual tone mapping
operations that currently live in-camera (black gamma,
knee adjustment, etc.), or similar ones<br>
</li>
</ul>
</ul>
</ul>
</ul>
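<p>One way to picture the middle-ground approach is the sketch below. It assumes (my assumption, nothing here defines how the enhancement layer is coded) that the layer carries a per-pixel offset in the same 1/256 EV log units as the picture, and that the display or user picks a strength between 0.0 (untouched HDR) and 1.0 (the author's full tone-mapped look). The function name and the sample values are hypothetical.</p>
<pre>
```python
def apply_tone_layer(hdr_codes, layer_offsets, strength):
    """Blend between full HDR and the authored tone-mapped look.

    hdr_codes     -- per-pixel log-light codes (1/256 EV units)
    layer_offsets -- per-pixel offsets, in the same units, that take each
                     HDR code to its authored tone-mapped value
    strength      -- 0.0 leaves the HDR codes untouched, 1.0 applies the
                     authored look in full
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return [code + round(strength * offset)
            for code, offset in zip(hdr_codes, layer_offsets)]


if __name__ == "__main__":
    hdr = [4252, 0, -3402]        # very bright, mid, very dark pixel
    layer = [-2000, 0, 1500]      # pull highlights down, lift shadows
    for strength in (0.0, 0.5, 1.0):
        print(strength, apply_tone_layer(hdr, layer, strength))
```
</pre>
<p>Because the codes are logarithmic, scaling the offset scales the contrast reduction in stops, so an on/off switch and a gradual slider are the same operation with different strength values.</p>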
On 2015.08.16 at 12:17, <a class="moz-txt-link-abbreviated" href="mailto:HuBandiT@gmail.com">HuBandiT@gmail.com</a> wrote:<br>
</div>
<blockquote cite="mid:55D06342.7050207@gmail.com" type="cite">
HDR and higher bit-depth seem to be coming:<br>
<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-4.html">http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-4.html</a>
section "Better Pixels: Best Bang for the Buck?"<br>
<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-1-poynton-watkinson.html">http://www.dvinfo.net/article/misc/science_n_technology/hpa-tech-retreat-2014-day-1-poynton-watkinson.html</a><br>
<ul>
<li>industry seems to use 12-14 bits today, and the consensus
seems to be that at least 12 bits of luma will be needed soon,
even for consumers; prosumer camcorders (e.g. Sony PXW-X70 -
$2000) are doing 10-bit 4:2:2 1080p59.94 today, and anything
above $2500-3000 seems to be 12 bit or above<br>
</li>
<li>looks like 13 bits would be sufficient with a simple log
curve, Dolby is proposing 12 bits with their "Perceptual
Quantization" curve</li>
<li>some armchair thinking (just my pebbles to throw into the
thinking pool):</li>
<ul>
<li>log encoding would have the benefit that the traditional
color difference signals, when derived from log-encoded RGB
components, would keep (white-balanced) intensity changes
(e.g. shadows, fades/dips to/from black) out of the color
channels</li>
<li>with intensity decoupled from color:</li>
<ul>
<li>considerably lower color precision could be sufficient
(since cosine falloff from object curvature, lens
vignetting, primary light shadows no longer leak into
chroma, no longer forcing it to have comparable precision
to eliminate banding on final result)<br>
</li>
</ul>
<ul>
<li>maybe replace color differences with more perceptual
metrics: some kind of saturation and hue</li>
<ul>
<li>could allow heavier quantization still, or even lower
color precision outright (on the assumption that hue and
saturation change much less in well-lit, well-exposed,
real-life scenes)<br>
</li>
<li>think of it like one aspect of reverse Phong shading: a
shiny sphere in vacuum under white light only ever has
its own hue - only intensity and saturation change
(cosine falloff: towards black, highlight: towards
white); the hue channel is quasi constant, flat. The real
world will be messier, e.g. hue will be pulled away by
light reflected from surrounding objects - but see below
on illumination/object color decomposition<br>
</li>
</ul>
</ul>
<li>once chroma/color precision is lowered, it might make
sense to go 4:4:4 all the time and not bother with
chroma down/upsampling at all</li>
<li>set the stage for a discussion of scene decomposition:
e.g. separately coding albedo (object reflectance) and
illuminance</li>
<ul>
<li>the first step could be a separate
illuminance/intensity/gain channel, that factors (multiply
in linear light = addition in log light) into the final
intensity of the output pixels</li>
<li>encoders unwilling to utilize this can leave this
channel blank at 0dB/0EV/1x<br>
</li>
<li>simplistic encoders could benefit:</li>
<ul>
<li>dips to/from black could preserve full color in main
channels, and only adjust this channel</li>
<li>crossfades could ramp this channel up/down while
referencing the main channels at the two frames at both ends
of the crossfade (conceptually, weighted prediction in
linear light)<br>
</li>
</ul>
<li>advanced encoders: encoding high-amplitude scene
illuminance variations separately from lower-amplitude object
reflectance/texture might provide coding gains, especially
in the case of HDR<br>
</li>
<ul>
<li>scene illuminance: higher amplitude, but less details
(mostly "broad strokes" - different statistics than main
channels)<br>
</li>
<li>object reflectance/texture (main channels): smaller
amplitude, but more details</li>
<li>separate prediction/motion compensation for these two<br>
</li>
<li>ideally, scene illuminance should be color as well, to
predict coloured lighting (e.g. illuminance in single
off-white or multiple lightsource cases)</li>
<li>use it as hints for HDR tonemapping tool (see still
photography research)<br>
</li>
</ul>
<li>next step could be to add a highlight layer (kinda like
specular highlights in reverse Phong shading - gradually
blending the area around the highlight position into
some color of the light source, whether by transform
coding or some kind of shape-based parametric modelling);
machine color vision research exists in these
directions<br>
</li>
<li>doesn't need to be perfect (it's just prediction after
all) or even cover many cases - just go for some
low-hanging fruit, enough to spark industry
discussion/experimentation<br>
</li>
</ul>
</ul>
</ul>
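<p>The two log-domain claims in the list above (intensity changes drop out of color differences, and a gain channel is a plain addition) can be checked with a small Python sketch. The Rec. 709 luma weights applied to log-encoded components, the function names and the sample pixel are my choices for illustration; any weights that sum to 1 would behave the same way.</p>
<pre>
```python
import math

LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)  # Rec. 709 weights, used on log values


def log_encode(rgb):
    """Per-channel log2 of linear-light RGB (components must be positive)."""
    return [math.log2(c) for c in rgb]


def color_differences(log_rgb):
    """Luma and unscaled B-Y / R-Y differences computed on log-encoded RGB."""
    r, g, b = log_rgb
    wr, wg, wb = LUMA_WEIGHTS
    y = wr * r + wg * g + wb * b
    return y, b - y, r - y


def apply_gain(log_rgb, gain_ev):
    """Gain channel: adding EV in log light equals multiplying linear light."""
    return [c + gain_ev for c in log_rgb]


if __name__ == "__main__":
    surface = [0.8, 0.4, 0.2]               # some lit surface, linear light
    shadowed = [c * 0.25 for c in surface]  # same surface, two stops darker
    print(color_differences(log_encode(surface)))
    print(color_differences(log_encode(shadowed)))
    print(apply_gain(log_encode(surface), -2.0))
    print(log_encode(shadowed))
```
</pre>
<p>In this sketch the two-stop shadow moves only the luma value; both color differences stay the same up to floating-point rounding. A dip to black or a crossfade could therefore be expressed by ramping the gain alone, which is the "leave it at 0 EV if unused" channel described above.</p>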
<br>
<br>
<br>
</blockquote>
<br>
</body>
</html>