[Theora-dev] Problems with Theora DirectShow filters
Timothy B. Terriberry
tterribe at vt.edu
Wed Sep 15 19:04:41 PDT 2004
> Most of that went over my head ! I'm pretty much just winging it here :-P
As I said, see Chapter 4 of the spec. It is quite explicit.
> I'll go on a google mission !
> These are the two references i'm using for YUv/RGB wrt directshow.
> Yeah... i've seen this part... the sampling positions of the various DS
> formats are shown on the links above.
> I'll have to look a bit more closely !
The first document contains some good information. However, some nitpicks:
The color spaces should properly be referred to as Y'CbCr, not YUV. When
people discuss YUV, they often invert the meanings of U and V, or use
some other definition of chroma entirely, and it is best to avoid the
confusion. The meaning of Y'CbCr is explicit.
I find introducing the notion of "studio RGB" somewhat confusing, and
unnecessary. One issue that _is_ important, however, is the nominal
range of the Y'CbCr values. The document describes the values for video,
but does not seem to mention that the full range 0..255 is often used
for still images (e.g., in JPEG). Theora only supports the ITU-R BT.601
ranges, Y': 16..235, Cb,Cr: -112..112 (stored with an offset of 128, i.e., 16..240).
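To make the nominal ranges concrete, here is a minimal sketch of converting one 8-bit Y'CbCr triple in those ranges to 8-bit R'G'B'. It assumes the usual BT.601 luma coefficients (0.299, 0.587, 0.114) and chroma stored with an offset of 128; the function name is made up for illustration and is not taken from any Theora or DirectShow API.

```python
def ycbcr601_to_rgb(y, cb, cr):
    """Convert one 8-bit video-range Y'CbCr triple to 8-bit R'G'B'.

    Assumes Y' in 16..235 and Cb, Cr in 16..240 (i.e. -112..112 around 128).
    """
    # Remove the offsets and normalise: Y' to 0..1, Cb/Cr to -0.5..0.5.
    yn = (y - 16) / 219.0
    pb = (cb - 128) / 224.0
    pr = (cr - 128) / 224.0

    # BT.601 luma coefficients (assumed; green is whatever is left over).
    kr, kb = 0.299, 0.114
    kg = 1.0 - kr - kb

    r = yn + 2.0 * (1.0 - kr) * pr
    b = yn + 2.0 * (1.0 - kb) * pb
    g = (yn - kr * r - kb * b) / kg

    # Scale to 0..255 and clamp, since out-of-range inputs can overshoot.
    return tuple(min(255, max(0, round(255.0 * v))) for v in (r, g, b))
```

Feeding full-range (JPEG-style) data through a conversion like this one would crush blacks and clip whites, which is why the range in use matters.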
The document seems to make the common mistake of assuming there is only
one set of 4:2:2 sampling locations. JFIF (the standard for
encapsulating JPEG in a file) is quite explicit about introducing a
half-pixel horizontal shift in the chroma sampling locations for this
format (I do not believe MPEG1, H.261, or H.263---the other video
standards that use JPEG sampling locations in 4:2:0 mode---support 4:2:2
data). Theora uses JPEG-style 4:2:2, as this makes converting back and
forth between the JPEG-style 4:2:0 sampling more convenient, and the
same operations can be used in the other direction to convert back and
forth between 4:4:4 and 4:2:2. (Theora inherited JPEG-style 4:2:0 from VP3.)
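As a rough illustration of the two conventions, the sketch below lists where the chroma samples of one row sit horizontally, measured in luma pixels. The function name and the "jpeg"/"mpeg2" labels are my own shorthand, and it assumes MPEG2-style chroma is co-sited with the even luma samples.

```python
def chroma_positions(num_chroma, style):
    """Horizontal chroma sample centres, in luma pixel units, for 2:1
    horizontal subsampling (4:2:2, or the horizontal axis of 4:2:0)."""
    if style == "jpeg":
        offset = 0.5   # centred between the two luma samples it covers
    elif style == "mpeg2":
        offset = 0.0   # co-sited with the even luma sample
    else:
        raise ValueError("style must be 'jpeg' or 'mpeg2'")
    return [2 * i + offset for i in range(num_chroma)]
```

For three chroma samples, "jpeg" gives positions 0.5, 2.5, 4.5 and "mpeg2" gives 0, 2, 4; the half-pixel difference is the shift discussed above.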
For the upsampling from 4:2:0 to 4:2:2 or 4:2:2 to 4:4:4, the document
claims that introducing the required half-pixel phase shift is more
computationally burdensome than the method they describe, which simply
ignores the problem and claims it "doesn't look that bad". If one is
going to use an anti-aliasing filter to do the upsampling, this is true,
though the increase is not that large. It still requires processing at
least 4 samples, but the filter is no longer symmetric, and must be
applied to both even and odd sample locations, instead of just the odd ones.
But if one is concerned mostly about speed, then in the JPEG-style
sampling case, using a box filter is not that bad (translation: each
chroma value is simply duplicated). Upsampling for the MPEG2 case can
never be as simple.
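A minimal sketch of that box filter for one row of chroma (the function name is hypothetical):

```python
def upsample_box_2x(chroma_row):
    """Double the horizontal resolution of a chroma row by repeating
    each sample once (box filter / nearest neighbour)."""
    out = []
    for c in chroma_row:
        out.extend((c, c))
    return out
```

With JPEG-style siting each chroma sample lies midway between the two luma pixels it covers, so duplicating it introduces no net shift. With MPEG2-style siting the same duplication would displace the chroma by half a luma pixel.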
If one _is_ going to apply an alias-correcting filter to upsampling, one
can do better than the Catmull-Rom interpolation suggested in that
article. Mitchell and Netravali
(http://portal.acm.org/citation.cfm?id=378514) investigated an entire
family of bicubic interpolating filters:
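             { (12 - 9B - 6C)|x|^3 + (-18 + 12B + 6C)|x|^2 + (6 - 2B)           |x|<1
k(x) = 1/6 * { (-B - 6C)|x|^3 + (6B + 30C)|x|^2 + (-12B - 48C)|x| + (8B + 24C)  1<=|x|<2
             { 0                                                                2<=|x|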
The values B and C are parameters. (1,0) corresponds to the traditional
cubic spline. (0,C) is the 1-parameter family of splines which exactly
interpolate their sample locations ("cardinal cubics"), and (0,1/2)
corresponds to the Catmull-Rom spline. They performed tests with human
observers and found a region of the parameter space which provides a
good trade-off between blurring, ringing, and anisotropic artifacts,
centered around the values (1/3,1/3).
This yields the standard Mitchell filter:
              { 21|x|^3 - 36|x|^2 + 16              |x|<1
k(x) = 1/18 * { -7|x|^3 + 36|x|^2 - 60|x| + 32      1<=|x|<2
              { 0                                   2<=|x|
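For anyone who wants to experiment, here is a small sketch of the (B,C) family as published by Mitchell and Netravali, plus a helper that evaluates the four filter weights for an output sample lying between two input samples. The function names are mine; this is not the code from encoder_example.

```python
def mitchell_netravali(x, b=1/3, c=1/3):
    """Evaluate the Mitchell-Netravali (B,C) cubic at x.
    The defaults B = C = 1/3 give the standard Mitchell filter."""
    x = abs(x)
    if x < 1:
        return ((12 - 9*b - 6*c) * x**3
                + (-18 + 12*b + 6*c) * x**2
                + (6 - 2*b)) / 6
    if x < 2:
        return ((-b - 6*c) * x**3
                + (6*b + 30*c) * x**2
                + (-12*b - 48*c) * x
                + (8*b + 24*c)) / 6
    return 0.0


def taps(phase, b=1/3, c=1/3):
    """Weights for the input samples at offsets -1, 0, 1, 2, for an output
    located `phase` (0 <= phase < 1) input-sample spacings past sample 0."""
    return [mitchell_netravali(phase - n, b, c) for n in (-1, 0, 1, 2)]
```

For 2x upsampling with JPEG-style siting, the output samples fall a quarter of a chroma-sample spacing to either side of each input sample, so the two phases needed are taps(0.25) and taps(0.75). They are mirror images of each other, and neither is symmetric on its own, which is why the filter has to be applied at both even and odd output locations.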
This is what is used for upsampling in the experimental encoder_example,