<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#ffffff">

    On 24/03/2011 at 22:14, Timothy B. Terriberry wrote:

    <blockquote cite="mid:4D8BB440.9070805@xiph.org" type="cite">

      <blockquote type="cite">

        <pre wrap="">I have a few questions on the specification.

* Just to understand:

- What is the purpose of super blocks? Is it to save space when

recording coded block flags (7.3)?

</pre>

      </blockquote>

      <pre wrap="">

That is basically their only purpose, though they're also the level at 

which the Hilbert traversal that makes up "coded order" is defined.

</pre>

      <blockquote type="cite">

        <pre wrap="">- What is the advantage of using the coded order? Raster order is often

easier to work with (especially for finding neighbors in 7.8.1, for

example). Is it to simplify the correspondence between blocks, macro

blocks and super blocks?

</pre>

      </blockquote>

      <pre wrap="">

The theory was supposed to be something like: every block is an 

immediate neighbor of the preceding block in "coded order", except when 

you reach the end of a super-block row (i.e., the traversal has a high 

fractal dimension). This was actually an idea I'd had before I ever even 

heard of VP3, but I rejected it for being too complex to be worth it 

(although I was thinking of a traversal of the entire image, rather than 

restricting it to four rows at a time... that turns out to be a lot more 

complicated, though, especially once you start trying to handle 

non-square sizes where the number of blocks is not a power of two).

Theory aside, coded order mostly adds a lot of bookkeeping overhead to 

the code, and makes some steps of encoder optimization _really_ _hard_. 

One of my biggest pet peeves with the format is that DC prediction is 

not done in coded order. I doubt it really has a big influence on 

compression performance: even in raster order, everything except the 

last block in a row is an immediate neighbor of its predecessor, though 

it's always a horizontal neighbor. It might be appreciably worse if 

there's strong vertical but no horizontal correlation... though it will 

perform better if there's strong horizontal but no vertical correlation. 

On average I expect it's pretty close to a wash.

On2 dropped the Hilbert curve idea in later VPx formats. We could have 

done so ourselves when moving from VP3 to Theora, but back in the days 

when those decisions were made, trivial lossless VP3-&gt;Theora transcoding 

(basically, just fixing up a few header bits) was seen as a desirable 

feature, both for easy access to content and for IPR safety reasons.

</pre>

    </blockquote>
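    <p lang="en-US"><small>The adjacency property described above can be checked with a short sketch. It builds a generic Hilbert curve over the 4x4 blocks of one super block and compares it with raster order. The function names hilbert_d2xy, hilbert_order and all_steps_adjacent are hypothetical, and the curve orientation is an assumption that may differ from the figure in the specification.</small></p>
    <pre wrap="">
```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) in an n-by-n grid.

    n must be a power of two. This is the textbook iterative construction,
    not code taken from the Theora specification.
    """
    x = y = 0
    t = d
    s = 1
    while s != n:
        rx = (t // 2) % 2
        ry = (t ^ rx) % 2
        if ry == 0:
            # Rotate or flip the quadrant so that sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n):
    """Return the (x, y) block positions in traversal order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def all_steps_adjacent(order):
    """True if every position is a horizontal or vertical neighbor of the previous one."""
    return all(abs(ax - bx) + abs(ay - by) == 1
               for (ax, ay), (bx, by) in zip(order, order[1:]))


blocks = hilbert_order(4)  # the 16 blocks of one 4x4 super block
raster = [(x, y) for y in range(4) for x in range(4)]

print(all_steps_adjacent(blocks))  # every step moves to a neighboring block
print(all_steps_adjacent(raster))  # fails at each row end, which jumps back to x = 0
```
</pre>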

    <pre class="western" lang="en-US">Thank you for all these explanations.

</pre>

    <blockquote cite="mid:4D8BB440.9070805@xiph.org" type="cite">

      <pre wrap=""></pre>

      <blockquote type="cite">

        <pre wrap="">“Each component can take on integer values from −31 . . . 31, inclusive, at

half-pixel resolution, i.e. −15.5 . . . 15.5 pixels in the luma plane.

For each sub-sampled axis in the chroma planes, the corresponding motion

vector component is interpreted as being at quarter-pixel resolution,

i.e. −7.75 . . . 7.75 pixels. The

</pre>

      </blockquote>

      <pre wrap="">

I agree the wording here is not very clear.

</pre>

    </blockquote>

    <style type="text/css">pre.cjk { font-family: "DejaVu Sans",monospace; }p { margin-bottom: 0.21cm; }</style>

    <small><span lang="en-US">I think this explanation is clear enough;

        I understood it immediately. Still, the text could perhaps be

        improved. I suggest:<br>

        <br>

      </span><span lang="en-US">"Each component can take on integer

        values from −31 . . . 31, inclusive. The motion vector

        component is interpreted as being at half-pixel resolution,

        i.e. −15.5 . . . 15.5 pixels, except that, for each

        sub-sampled axis in the chroma planes, it is interpreted as

        being at quarter-pixel resolution, i.e. −7.75 . . . 7.75

        pixels."<br>

        <br>

      </span>

      I think the issue is in the algorithm: it does not do what the

      text says, because the decoded integer value is used directly.<br>

      Perhaps 7.9.4 is missing something like this:<br>

      2.(d).(vi). <br>

         B.<br>

          if the x axis of the plane that contains bi is sub-sampled

      then dx = 4.0<br>

          else dx = 2.0<br>

          if the y axis</small><small> of the plane that contains bi </small><small>

      is sub-sampled then dy = 4.0<br>

          else dy = 2.0<br>

      <br>

          C. assign MVX the value<br>

          floor(abs(MVECTS[bi]x /

      dx)) * sign(MVECTS[bi]x)</small>

    <p style="margin-bottom: 0cm;" lang="en-US"><small> </small></p>

    <p style="margin-bottom: 0cm;" lang="en-US"><small>     D. assign

        MVY the value<br>

            floor(abs(MVECTS[bi]y /

        dy)) * sign(MVECTS[bi]y)</small></p>

    <p style="margin-bottom: 0cm;" lang="en-US"><small>    E. assign

        MVX2 the value<br>

            ceil(abs(MVECTS[bi]x /

        dx)) * sign(MVECTS[bi]x)</small></p>

    <small><br>

          F. assign MVY2 the value</small><small><br>

          ceil(abs(MVECTS[bi]y /

      dy)) * sign(MVECTS[bi]y)</small><br>

    (...)<br>
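    <p lang="en-US"><small>As a rough illustration of the steps proposed above, the sketch below splits one decoded component into the two whole-pixel offsets (floor and ceil of the magnitude, with the sign restored). The function name split_mv_component and its subsampled flag are hypothetical; this mirrors the proposal in this message and is not taken from the specification or from any decoder.</small></p>
    <pre wrap="">
```python
import math


def split_mv_component(mv, subsampled):
    """Split one decoded motion vector component into two whole-pixel offsets.

    mv         -- decoded integer component, expected in the range -31..31
    subsampled -- True if this axis of the plane is sub-sampled, so the value
                  is in quarter-pixel units; otherwise it is in half-pixel units
    """
    divisor = 4 if subsampled else 2
    sign = 1 if mv >= 0 else -1
    magnitude = abs(mv) / divisor
    # The first offset rounds toward zero, the second away from zero.
    # They coincide whenever the component lands on a whole pixel.
    first = math.floor(magnitude) * sign
    second = math.ceil(magnitude) * sign
    return first, second


print(split_mv_component(-31, False))  # half-pixel units: -15.5 pixels
print(split_mv_component(31, True))    # quarter-pixel units: 7.75 pixels
```
</pre>
    <p lang="en-US"><small>When the two offsets differ, the component points between whole pixels, which is the case the proposed MVX/MVX2 and MVY/MVY2 pairs are meant to capture.</small></p>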

    <style type="text/css">p { margin-bottom: 0.21cm; }</style>

    <p style="margin-bottom: 0cm;" lang="en-US"><small>For me, the

        confusion comes from the discrepancy between the text and the

        algorithm. The text alone is understandable (even if it is not

        perfect).</small></p>

    <br>

    <small>Thank you for all the explanations.<br>

      J.F.</small><br>

    <br>

    <br>

  </body>

</html>