[Theora-dev] Problems with Theora DirectShow filters

Wed Sep 15 23:20:00 PDT 2004

Thanks for the information... looks like i need to do a lot more reading ! 
:)

Cheers,

Zen.

----- Original Message ----- 
From: "Timothy B. Terriberry" <tterribe at vt.edu>
To: "illiminable" <ogg at illiminable.com>
Cc: <theora-dev at xiph.org>
Sent: Thursday, September 16, 2004 10:04 AM
Subject: Re: [Theora-dev] Problems with Theora DirectShow filters

> illiminable wrote:
>> Most of that went over my head ! I'm pretty much just winging it here :-P
>
> As I said, see Chapter 4 of the spec. It is quite explicit.
>
> http://v2v.cc/~j/Theora_I_spec.pdf
>
>> I'll go on a google mission !
>>
>> These are the two references i'm using for YUV/RGB wrt DirectShow.
>>
>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/YUVFormats.asp 
>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directshow/htm/uncompressedrgbvideosubtypes.asp
>
>
>> Yeah... i've seen this part... the sampling positions of the various DS 
>> formats are shown on the links above.
>>
>> I'll have to look a bit more closely !
>
> The first document contains some good information. However, some nitpicks:
>
> The color spaces should properly be referred to as Y'CbCr, not YUV. When 
> people discuss YUV, they often invert the meanings of U and V, or use some 
> other definition of chroma entirely, and it is best to avoid the 
> confusion. The meaning of Y'CbCr is explicit.
>
> I find introducing the notion of "studio RGB" somewhat confusing, and 
> unnecessary. One issue that _is_ important, however, is the nominal range 
> of the Y'CbCr values. The document describes the values for video, but 
> does not seem to mention that the full range 0..255 is often used for 
> still images (e.g., in JPEG). Theora only supports the ITU-R BT.601 
> ranges, Y': 16..235, Cb,Cr: -112..112 (stored as 16..240).
>
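As a rough illustration of the range issue above, the sketch below rescales full-range (0..255) sample values, as used for still images, into the BT.601 nominal ranges. The function names are hypothetical, and the linear scale factors 219/255 and 224/255 are an assumption about how one would map between the two ranges; they are not taken from the Theora spec.

```python
def full_to_video_luma(y_full):
    """Map full-range luma (0..255) to the BT.601 nominal range 16..235."""
    return int(16 + y_full * 219 / 255 + 0.5)


def full_to_video_chroma(c_full):
    """Map full-range chroma (0..255, centered on 128) to 16..240.

    Relative to the 128 offset this is the signed range -112..112.
    """
    return int(128 + (c_full - 128) * 224 / 255 + 0.5)
```

Feeding video-range data to code that expects full range (or the reverse) typically shows up as washed-out or crushed blacks and whites, which is why the distinction matters.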
> The document seems to make the common mistake of assuming there is only 
> one set of 4:2:2 sampling locations. JFIF (the standard for encapsulating 
> JPEG in a file) is quite explicit about introducing a half-pixel 
> horizontal shift in the chroma sampling locations for this format (I do 
> not believe MPEG1, H.261, or H.263---the other video standards that use 
> JPEG sampling locations in 4:2:0 mode---support 4:2:2 data). Theora uses 
> JPEG-style 4:2:2, as this makes converting to and from JPEG-style 4:2:0 
> more convenient, and the same operations can be applied along the other 
> axis to convert back and forth between 4:4:4 and 4:2:2. (Theora inherited 
> JPEG-style 4:2:0 from VP3.)
>
> For the upsampling from 4:2:0 to 4:2:2 or 4:2:2 to 4:4:4, the document 
> claims that introducing the required half-pixel phase shift is more 
> computationally burdensome than the method they describe, which simply 
> ignores the problem and claims it "doesn't look that bad". If one is going 
> to use an anti-aliasing filter to do the upsampling, this is true, though 
> the increase is not that large. It still requires processing at least 4 
> samples, but the filter is no longer symmetric, and must be applied to 
> both even and odd sample locations, instead of just the odd ones.
>
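To make the even/odd point concrete, here is a small sketch that computes where each output (luma-resolution) column falls relative to the chroma samples when going from 4:2:2 to 4:4:4. The function names and the "mpeg2"/"jpeg" labels are hypothetical; the siting conventions assumed are chroma co-sited with even luma columns for MPEG2-style data, and chroma midway between each pair of luma columns for JPEG-style data.

```python
def chroma_coord(p, siting):
    """Return the position of luma column p in units of chroma samples.

    siting="mpeg2": chroma sample i is co-sited with luma column 2*i.
    siting="jpeg":  chroma sample i lies midway between luma columns
                    2*i and 2*i + 1 (the half-pixel shift).
    """
    if siting == "mpeg2":
        return p / 2
    if siting == "jpeg":
        return (p - 0.5) / 2
    raise ValueError("unknown siting: " + siting)


def needs_filtering(p, siting):
    """True if luma column p does not land exactly on a chroma sample."""
    return chroma_coord(p, siting) % 1 != 0


for siting in ("mpeg2", "jpeg"):
    print(siting, [chroma_coord(p, siting) for p in range(4)])
```

With MPEG2-style siting the even columns land exactly on a chroma sample and only the odd columns (at a symmetric offset of 0.5) need filtering. With JPEG-style siting every column is offset by a quarter of a chroma sample to one side or the other, so the filter is asymmetric and runs on every output column.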
> But if one is concerned mostly about speed, then in the JPEG-style 
> sampling case, using a box filter is not that bad (translation: each 
> chroma value is simply duplicated). Upsampling for the MPEG2 case can 
> never be as simple.
>
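A minimal sketch of the box-filter case, assuming one row of chroma samples held in a plain list (the function name is hypothetical):

```python
def upsample_box_2x(chroma_row):
    """Double the horizontal chroma resolution by repeating each sample."""
    out = []
    for c in chroma_row:
        out.extend((c, c))
    return out
```

With JPEG-style siting each chroma sample sits midway between the two output columns it is copied to, so the duplication is centered correctly. With MPEG2-style siting the same duplication would shift the chroma by a quarter of a chroma sample.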
> If one _is_ going to apply an alias-correcting filter to upsampling, one 
> can do better than the Catmull-Rom interpolation suggested in that 
> article. Mitchell and Netravali 
> (http://portal.acm.org/citation.cfm?id=378514) investigated an entire 
> family of bicubic interpolating filters:
>
> k(x)=(1/6)*{(12-9B-6C)|x|^3+(-18+12B+6C)|x|^2+(6-2B),             |x|<1,
>             (-B-6C)|x|^3+(6B+30C)|x|^2+(-12B-48C)|x|+(8B+24C), 1<=|x|<2,
>             0                                                  2<=|x| }
>
> The values B and C are parameters. (B,C)=(1,0) corresponds to the 
> traditional cubic B-spline. (0,C) is the 1-parameter family of splines 
> which exactly interpolate their sample locations ("cardinal cubics"), and 
> (0,1/2) corresponds to the Catmull-Rom spline. They performed tests with human 
> observers and found a region of the parameter space which provides a good 
> trade-off between blurring, ringing, and anisotropic artifacts, centered 
> around the values (1/3,1/3).
>
> This yields the standard Mitchell filter:
> k(x)=(1/6)*{7|x|^3-12|x|^2+16/3,               |x|<1
>             -(7/3)|x|^3+12|x|^2-20|x|+32/3, 1<=|x|<2
>             0                               2<=|x|}
>
> This is what is used for upsampling in the experimental encoder_example, 
> at 
> http://svn.xiph.org/experimental/derf/theora-exp/examples/encoder_example.c
>
>
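For readers who want to experiment with the filter family quoted above, here is a minimal sketch of the kernel and of a 2x horizontal upsampler for JPEG-style chroma siting. This is not the code from encoder_example.c: the function names are hypothetical, the row is a plain list, and repeating the edge samples is an assumption made to keep the example short.

```python
def bicubic_kernel(x, b, c):
    """Mitchell-Netravali bicubic kernel k(x) for parameters (B, C)."""
    x = abs(x)
    if x < 1:
        return ((12 - 9 * b - 6 * c) * x ** 3
                + (-18 + 12 * b + 6 * c) * x ** 2
                + (6 - 2 * b)) / 6
    if x < 2:
        return ((-b - 6 * c) * x ** 3
                + (6 * b + 30 * c) * x ** 2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0


def upsample_taps(phase, b=1 / 3, c=1 / 3):
    """Weights for the 4 samples at offsets -1, 0, 1, 2 around an
    output located `phase` (0 <= phase < 1) past the sample at offset 0."""
    return [bicubic_kernel(phase - n, b, c) for n in (-1, 0, 1, 2)]


def upsample_jpeg_2x(row, b=1 / 3, c=1 / 3):
    """Upsample one chroma row 2x for JPEG-style siting.

    Output columns 2*i and 2*i + 1 sit at chroma coordinates i - 0.25
    and i + 0.25. Edge samples are repeated (an assumption).
    """
    def at(j):
        return row[min(max(j, 0), len(row) - 1)]

    out = []
    for i in range(len(row)):
        # (base sample, phase past it) for the even and odd output column
        for base, phase in ((i - 1, 0.75), (i, 0.25)):
            taps = upsample_taps(phase, b, c)
            out.append(sum(w * at(base + n)
                           for w, n in zip(taps, (-1, 0, 1, 2))))
    return out
```

The defaults (1/3, 1/3) give the Mitchell filter; passing b=0, c=0.5 gives Catmull-Rom instead. The two tap sets are mirror images of each other, which is the asymmetric-filter-on-every-column situation described earlier in the message.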