[ogg-dev] OggYUV
John Koleszar
jkoleszar at on2.com
Wed Nov 9 06:13:03 PST 2005
Arc wrote:
>I disagree with this, most decoders using OggStream are unlikely to be using
>FourCC, or at least the ones I care most about, and this places a complexity
>
I don't think this is true. Most data sources are going to have some
fourcc associated with them. If it's a piece of hardware, it's going to
work on Windows, and there will be a fourcc to describe its data
format, assuming it's a relatively open piece of hardware. And since the
more open hardware is the most likely to have Linux support, you're
likely to be getting data described by a common fourcc. On the other
end, if you're working with a player that supports many codecs (e.g.,
mplayer), it's going to understand many of the standard fourccs already.
>burden on all implementations which use OggYUV such that they *MUST* have a
>table of FourCC -> format mappings, whereas software which already supports
>
>
Not true. I'm proposing tagging the data with the fourcc if you know
what it is. You can leave it blank and let the extra data fields
describe it if you don't know or don't want to use the fourcc. If you do
know the fourcc, you fill in the fourcc field AND the data layout
fields. Then applications that don't know anything about the fourcc can
still work on the data, and applications that do understand fourcc have a
much easier time dealing with it. Otherwise, an application has to
inspect the 30 some-odd parameters I identified earlier to see if it's a
stream it already understands.
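To make that concrete, something along these lines is what I have in mind
(just a sketch; the names and exact fields are made up, not a header
proposal):

#include <stdint.h>

/* Illustrative only: an optional fourcc hint carried alongside the
 * explicit layout fields.  Zero means "no hint, go read the layout". */
typedef struct {
    uint32_t fourcc;           /* e.g. the tag for YV12, or 0 if unknown */
    uint32_t display_width, display_height;
    uint32_t stored_width, stored_height;
    /* ... subsampling and per-channel storage fields, as listed
     *     further down in this mail ... */
} oggyuv_header_sketch;

An importer that knows its buffers are YV12 fills in both the fourcc and
the layout fields; one that doesn't just leaves the fourcc at zero.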
To use a car analogy, it's like trying to sell a car by listing 30
parameters including cylinders, displacement, wheelbase, wheel diameter,
number of wheels, number of doors, height, headroom, legroom, etc, but
not the model name, when the guy buying it (eg, a player application)
only cares whether it's a pickup truck or not. Yes, you can look at all
the parameters and figure out that it's a truck being described, but
it's a lot of work when you only want to park it in your garage (copy it
to video memory, for instance). By listing the model along with all the
data, you support the manufacturer (data importer) who knows that the
vehicle's a truck but lists its parameters anyway, the mechanics
(plugins) who don't care what model of vehicle it is but need all its
parameters, and the guy who parks it in his garage (the player). You also
support the hobby builder who doesn't have an official model name. He
can build a car that everyone can work on without actually naming it.
>these mappings and be able to quickly see
>if a OggYUV stream is directly mappable to a raw YUV FourCC codec.
>
>
Robust applications that support only the common formats will have to
parse untagged headers to determine if the format is really supported or
not, but friendly applications that know they are outputting data in a
standard format should tag the data as being standard.
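For instance (one common convention for building the tag, not something the
spec would have to mandate), the writer side can be as simple as:

#include <stdint.h>

/* Pack four characters into a fourcc value, least significant byte
 * first, so a writer that knows it is emitting I420 can say so. */
#define MAKE_FOURCC(a, b, c, d) \
    ((uint32_t)(uint8_t)(a)        | ((uint32_t)(uint8_t)(b) << 8) | \
     ((uint32_t)(uint8_t)(c) << 16) | ((uint32_t)(uint8_t)(d) << 24))

/* header.fourcc = MAKE_FOURCC('I', '4', '2', '0'); */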
>Also, as you pointed out, many FourCC implementations are ambiguously defined
>and are thus inadequate on their own.
>
Yes, many of them are ambiguous and not well understood. However, the
ones that are widely used (YV12, I420, YUY2, UYVY, YVYU are the ones I
use most) ARE well understood, and images formatted in that way will be
common payloads somewhere in the OggStream chain between the original
data source and the video card.
>>Displayed Width&Height
>>Stored Width&Height
>>Aspect Ratio (Fractional)
>>
>>
>
>Aspect ratio is what makes pixels potentially non-square, and since we're not
>encoding in blocks as most compressed codecs do, what purpose would having a
>different displayed/stored width/height serve?
>
>
Many of the YUV formats only work on image sizes that are a multiple of
some common number (2, 4). YUV 4:2:0 formats can only store images with
an even number of pixels in both directions. If you have an odd-sized
image, you can leave the border pixels undefined or extend them, but you
need to specify that only w-1 pixels contain valid data.
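In other words (a quick sketch, with hypothetical field names), the stored
size gets rounded up and the displayed size says how much of it is real:

#include <stdint.h>

/* 4:2:0 needs even dimensions, so an odd-sized image rounds its stored
 * size up; only display_w x display_h pixels carry valid data. */
static void stored_size_420(uint32_t display_w, uint32_t display_h,
                            uint32_t *stored_w, uint32_t *stored_h)
{
    *stored_w = (display_w + 1) & ~(uint32_t)1;  /* round up to even */
    *stored_h = (display_h + 1) & ~(uint32_t)1;
}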
>This isn't what colorspace means, from what I've seen at least.. Theora
>implements ITU 601 and CIE 709 colorspaces, which apparently tell the decoder or
>converter how to properly map YUV values to RGB. It's not YUV vs RGB, but
>rather one of those fields unique to YUV video.
>
>Correct me if I'm wrong, or if "Colorspace" is ambiguous.
>
>
I'm not a color expert. But as far as I can tell, color is described by
a triple. (RGB is linear, R'G'B' is nonlinear, ITU 601 and CIE 709 are
others) The link Timothy sent yesterday is good. I'm trying to grok it
now. In any case, this field is an enumeration, and we just need to
identify the proper values. I think we're basically in agreement here.
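Something like this is all I have in mind for that field (the names are
invented; the actual list of values is exactly what we'd have to agree on):

/* Illustrative enumeration only; the real set of values is the open
 * question.  The two named entries are the ones Theora already uses. */
typedef enum {
    OGGYUV_CS_UNSPECIFIED = 0,
    OGGYUV_CS_ITU_601,
    OGGYUV_CS_CIE_709
} oggyuv_colorspace_sketch;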
>>// Subsampling data
>>U Channel X Sample Rate (Fractional)
>>U Channel Y Sample Rate (Fractional)
>>U Channel X Sample Offset (Fractional)
>>U Channel Y Sample Offset (Fractional)
>>V Channel X Sample Rate (Fractional)
>>V Channel Y Sample Rate (Fractional)
>>V Channel X Sample Offset (Fractional)
>>V Channel Y Sample Offset (Fractional)
>>
>>
>
>I'm unsure what you're trying to do here. Implement 4:4:4 vs 4:2:2 vs 4:2:0?
>What is the offset, why is it fractional?
>
>
Yes. The sample rate tells you 4:4:4 vs 4:2:2 vs 4:2:0 vs 4:1:1 etc.
These don't have to be a lot of bits. The offset tells you where the
sample was taken, since some chroma samples are taken at the same place
as the luma, and others are taken halfway in between, and various
combinations thereof. I'd guess 2 bits for each of these is probably
sufficient (if you stick to a four pixel macropixel).
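As a sketch (assuming the rates are expressed as fractions of the luma
sampling rate), the common layouts come out like this:

/* Chroma sample rates relative to luma: {x_num/x_den, y_num/y_den}.
 * The offsets would be a second pair of small fractions per channel. */
struct subsample_sketch { int x_num, x_den, y_num, y_den; };

static const struct subsample_sketch CHROMA_444 = { 1, 1, 1, 1 };  /* 4:4:4 */
static const struct subsample_sketch CHROMA_422 = { 1, 2, 1, 1 };  /* 4:2:2 */
static const struct subsample_sketch CHROMA_420 = { 1, 2, 1, 2 };  /* 4:2:0 */
static const struct subsample_sketch CHROMA_411 = { 1, 4, 1, 1 };  /* 4:1:1 */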
>All the common formats (ignoring some of the older FourCC which are rarely used)
>implement a two-line system with no more than four pixels in each "block".
>Thus, we can implement this very simply. Y-U-V is always provided in that
>order (whether planar or packed), so what we must encode is on which luma pixels
>chroma data is provided for.
>
>
YUV is absolutely NOT always in that order. YV12 (likely Theora's
storage format unless it diverged from VP3) actually stores the V plane
before the U plane in memory. YVYU is a fairly common packed format that
stores V first.
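To illustrate, here are the plane start offsets for an 8-bit 4:2:0 frame of
stored size w x h; I420 and YV12 differ only in which chroma plane comes
first:

#include <stddef.h>
#include <stdint.h>

static void plane_offsets_420(uint32_t w, uint32_t h, int is_yv12,
                              size_t *y_off, size_t *u_off, size_t *v_off)
{
    size_t luma   = (size_t)w * h;             /* Y plane size       */
    size_t chroma = (size_t)(w / 2) * (h / 2); /* each chroma plane  */

    *y_off = 0;
    if (is_yv12) {            /* YV12: Y, then V, then U */
        *v_off = luma;
        *u_off = luma + chroma;
    } else {                  /* I420: Y, then U, then V */
        *u_off = luma;
        *v_off = luma + chroma;
    }
}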
>>// Storage data
>>A Channel Bits Per Sample
>>A Channel Field 0 Offset (in bits)
>>A Channel Field 1 Offset (in bits)
>>A Channel X Stride (in bits)
>>A Channel Y Stride (in bits?)
>>Y/R Channel Bits Per Sample
>>Y/R Channel Field 0 Offset (in bits)
>>Y/R Channel Field 1 Offset (in bits)
>>Y/R Channel X Stride (in bits)
>>Y/R Channel Y Stride (in bits?)
>>U/G Channel Bits Per Sample
>>U/G Channel Field 0 Offset (in bits)
>>U/G Channel Field 1 Offset (in bits)
>>U/G Channel X Stride (in bits)
>>U/G Channel Y Stride (in bits?)
>>V/B Channel Bits Per Sample
>>V/B Channel Field 0 Offset (in bits)
>>V/B Channel Field 1 Offset (in bits)
>>V/B Channel X Stride (in bits)
>>V/B Channel Y Stride (in bits?)
>>
>>
>
>I'm unsure what any of this is or why it's nessesary. Please explain?
>
>
I think this is what's necessary to fully describe an arbitrary four-channel
buffer of an optionally interlaced image (as long as it only has
two fields). You actually need data like this in the OggRGB format too.
Right now, there isn't enough there to tell the channel order, so you'd have
to mandate something. There are different channel orderings: BGRA, ABGR,
RGBA, ARGB, RGB, BGR, etc. To be pedantic, I think there are also different
RGB colorspaces, though I don't think they're generally used in computer
video. You need a Y stride because some RGB formats aren't tightly packed
(e.g., rows aligned to a four-byte boundary), and it needs to be signed
because some images are stored top-down, others bottom-up. Having an offset
and a stride handles this well, because it's how it's done in software
(pointer + stride). You need separate values for each channel, since the
channels can be stored in any order. The X stride is needed because in
the packed formats, the stride between luma samples can differ
from the stride between chroma samples.
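Here's a sketch of the addressing model I mean, in byte units for
readability (the draft above counts in bits), with made-up struct names:

#include <stddef.h>
#include <stdint.h>

struct chan_sketch {
    size_t    offset;    /* where the channel starts in the buffer       */
    ptrdiff_t x_stride;  /* bytes between horizontally adjacent samples  */
    ptrdiff_t y_stride;  /* bytes between rows; negative means bottom-up */
};

static uint8_t *sample_ptr(uint8_t *buf, const struct chan_sketch *c,
                           int x, int y)
{
    return buf + (ptrdiff_t)c->offset + y * c->y_stride + x * c->x_stride;
}

/* For example, bottom-up BGR24 with rows padded to 4 bytes is
 *   row  = (3 * w + 3) & ~3;
 *   blue = { (size_t)(h - 1) * row, 3, -(ptrdiff_t)row };
 * and packed YUY2 (Y0 U Y1 V) is
 *   y = { 0, 2, 2 * w };  u = { 1, 4, 2 * w };  v = { 3, 4, 2 * w };  */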
>Compare http://wiki.xiph.org/OggRGB to http://wiki.xiph.org/OggYUV - yes, they
>are similar, but YUV is much more complex, and I see no reason to join them.
>Or, if you prefer, think of them as one codec with two identifiers which change
>the fields around in the header/etc.
>
>
A fully defined RGB image is just as complex as a YUV one, except for
the subsampling.
>Load up the current draft at http://wiki.xiph.org/OggYUV
>
>
I'll take a look at it, but I'm not ready to talk bits when the fields
are still up in the air.