[Theora-dev] Problems with Theora DirectShow filters

Wed Sep 15 19:04:41 PDT 2004

illiminable wrote:
> Most of that went over my head ! I'm pretty much just winging it here :-P

As I said, see Chapter 4 of the spec. It is quite explicit.

http://v2v.cc/~j/Theora_I_spec.pdf

> I'll go on a google mission !
> 
> These are the two references i'm using for YUv/RGB wrt directshow.
> 
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/YUVFormats.asp 
> 
> 
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/directshow/htm/uncompressedrgbvideosubtypes.asp 

> Yeah... i've seen this part... the sampling positions of the various DS 
> formats are shown on the links above.
> 
> I'll have to look a bit more closely !

The first document contains some good information. However, some nitpicks:

The color spaces should properly be referred to as Y'CbCr, not YUV. When 
people discuss YUV, they often invert the meanings of U and V, or use 
some other definition of chroma entirely, and it is best to avoid the 
confusion. The meaning of Y'CbCr is explicit.

I find introducing the notion of "studio RGB" somewhat confusing, and 
unnecessary. One issue that _is_ important, however, is the nominal 
range of the Y'CbCr values. The document describes the values for video, 
but does not seem to mention that the full range 0..255 is often used 
for still images (e.g., in JPEG). Theora only supports the ITU-R BT.601 
ranges, Y': 16..235, Cb,Cr: -112..112 (stored as 16..240, i.e., offset by 128).
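
To make the range difference concrete, here is a minimal sketch of 
rescaling full-range (JPEG-style) 8-bit values into the BT.601 nominal 
ranges. The function names are made up for illustration and are not part 
of libtheora, and the round-to-nearest choice is my own assumption.

```c
#include <assert.h>

/* Map full-range luma (0..255) onto the BT.601 nominal range 16..235. */
static int luma_full_to_video(int y)
{
    assert(y >= 0 && y <= 255);
    /* 219 = 235 - 16 code values; +127 rounds to nearest. */
    return 16 + (y * 219 + 127) / 255;
}

/* Map full-range chroma (0..255, zero at 128) onto 16..240, keeping
   128 fixed so that neutral gray stays neutral. */
static int chroma_full_to_video(int c)
{
    int d;
    assert(c >= 0 && c <= 255);
    d = c - 128; /* signed chroma, -128..127 */
    /* 224 = 240 - 16 code values; round to nearest, away from zero. */
    return 128 + (d * 224 + (d >= 0 ? 127 : -127)) / 255;
}
```

For example, luma_full_to_video(255) gives 235, and 
chroma_full_to_video(128) stays at 128.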

The document seems to make the common mistake of assuming there is only 
one set of 4:2:2 sampling locations. JFIF (the standard for 
encapsulating JPEG in a file) is quite explicit about introducing a 
half-pixel horizontal shift in the chroma sampling locations for this 
format (I do not believe MPEG1, H.261, or H.263---the other video 
standards that use JPEG sampling locations in 4:2:0 mode---support 4:2:2 
data). Theora uses JPEG-style 4:2:2 because it makes converting to and 
from JPEG-style 4:2:0 more convenient: only the vertical direction needs 
resampling, and the same operations, applied horizontally, convert 
between 4:2:2 and 4:4:4. (Theora inherited JPEG-style 4:2:0 from VP3.)
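
The half-pixel shift is easy to state in code. This is only a sketch of 
the two horizontal siting conventions for 2:1 subsampled chroma; the 
function names are hypothetical, and positions are measured in luma 
pixel units with luma pixel n at x = n.

```c
#include <assert.h>

/* JPEG/JFIF-style siting: chroma sample i sits midway between luma
   pixels 2i and 2i+1. */
static double chroma_x_jpeg(int i)
{
    assert(i >= 0);
    return 2.0 * i + 0.5;
}

/* MPEG2-style siting: chroma sample i is co-sited with luma pixel 2i. */
static double chroma_x_mpeg2(int i)
{
    assert(i >= 0);
    return 2.0 * i;
}
```

The two conventions differ by exactly half a luma pixel at every sample, 
which is the shift the MSDN document glosses over.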

For the upsampling from 4:2:0 to 4:2:2 or 4:2:2 to 4:4:4, the document 
claims that introducing the required half-pixel phase shift is more 
computationally burdensome than the method they describe, which simply 
ignores the problem and claims it "doesn't look that bad". If one is 
going to use an anti-aliasing filter to do the upsampling, this is true, 
though the increase is not that large. It still requires processing at 
least 4 samples, but the filter is no longer symmetric, and must be 
applied to both even and odd sample locations, instead of just the odd ones.

But if one is concerned mostly about speed, then in the JPEG-style 
sampling case, using a box filter is not that bad (translation: each 
chroma value is simply duplicated). Upsampling for the MPEG2 case can 
never be as simple, since duplicating co-sited samples shifts the chroma 
by half a luma pixel.
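
A minimal sketch of that box filter for one row of a chroma plane, going 
from 4:2:2 to 4:4:4. The function name and the row-at-a-time interface 
are assumptions for illustration, not anything taken from the Theora 
sources or from DirectShow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Box-filter 2x horizontal chroma upsampling: every source sample is
   written twice. With JPEG-style siting each chroma sample sits midway
   between the two luma pixels it covers, so duplication adds no net
   phase shift. dst must have room for 2 * src_width samples. */
static void chroma_upsample_box_row(uint8_t *dst, const uint8_t *src,
                                    size_t src_width)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < src_width; i++) {
        dst[2 * i]     = src[i];
        dst[2 * i + 1] = src[i];
    }
}
```

Applied to the rows {10, 20, 30} this produces {10, 10, 20, 20, 30, 30}.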

If one _is_ going to apply an alias-correcting filter to upsampling, one 
can do better than the Catmull-Rom interpolation suggested in that 
article. Mitchell and Netravali 
(http://portal.acm.org/citation.cfm?id=378514) investigated an entire 
family of bicubic interpolating filters:

k(x)=(1/6)*{(12-9B-6C)|x|^3+(-18+12B+6C)|x|^2+(6-2B),             |x|<1,
             (-B-6C)|x|^3+(6B+30C)|x|^2+(-12B-48C)|x|+(8B+24C), 1<=|x|<2,
             0,                                                 2<=|x| }

The values B and C are parameters. (1,0) corresponds to the traditional 
cubic B-spline. (0,C) is the 1-parameter family of splines which exactly 
interpolate their sample locations ("cardinal cubics"), and (0,1/2) 
corresponds to the Catmull-Rom spline. They performed tests with human 
observers and found a region of the parameter space which provides a 
good trade-off between blurring, ringing, and anisotropic artifacts, 
centered around the values (1/3,1/3).

This yields the standard Mitchell filter:
k(x)=(1/6)*{7|x|^3-12|x|^2+16/3,               |x|<1
             -(7/3)|x|^3+12|x|^2-20|x|+32/3, 1<=|x|<2
             0                               2<=|x|}

This is what is used for upsampling in the experimental encoder_example, 
at 
http://svn.xiph.org/experimental/derf/theora-exp/examples/encoder_example.c
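
For reference, here is a minimal sketch of the (B,C) kernel family in C, 
together with a helper that evaluates the four filter taps for a given 
sub-sample phase. This is not the code from encoder_example.c; the names 
bc_kernel and bc_taps are hypothetical, and a real implementation would 
precompute the taps in fixed point.

```c
#include <assert.h>
#include <math.h>

/* Mitchell-Netravali cubic k(x) for parameters (B, C). */
static double bc_kernel(double x, double B, double C)
{
    x = fabs(x);
    if (x < 1.0) {
        return ((12.0 - 9.0 * B - 6.0 * C) * x * x * x
              + (-18.0 + 12.0 * B + 6.0 * C) * x * x
              + (6.0 - 2.0 * B)) / 6.0;
    }
    if (x < 2.0) {
        return ((-B - 6.0 * C) * x * x * x
              + (6.0 * B + 30.0 * C) * x * x
              + (-12.0 * B - 48.0 * C) * x
              + (8.0 * B + 24.0 * C)) / 6.0;
    }
    return 0.0;
}

/* Weights for the four source samples at offsets -1, 0, +1, +2 around
   an output position lying `phase` (0 <= phase < 1) to the right of
   source sample 0. */
static void bc_taps(double taps[4], double phase, double B, double C)
{
    int i;
    assert(phase >= 0.0 && phase < 1.0);
    for (i = 0; i < 4; i++) {
        /* Signed distance from the output position to sample i-1. */
        taps[i] = bc_kernel(phase - (double)(i - 1), B, C);
    }
}
```

With JPEG-style siting, the two luma positions covered by chroma sample 
i lie at i-0.25 and i+0.25 in chroma sample units, so 2x upsampling 
needs the taps for phase 0.75 (measured from sample i-1) and phase 0.25 
(measured from sample i), e.g. bc_taps(t, 0.25, 1.0/3.0, 1.0/3.0). The 
two tap sets are mirror images of each other, which is the asymmetric 
filter applied at both even and odd locations described earlier.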