[Theora-dev] Re: [ogg-dev] OggYUV

Tue Nov 8 09:04:43 PST 2005

On Tue, Nov 08, 2005 at 09:36:52PM +0800, illiminable wrote:
> 
> Then there's YUY2 which is interleaved Y0 U0 Y1 V0 Y2 U1 Y3 V1, and YVYU 
> (Y0 V0 Y1 U0 Y2 V1 Y3 U1), and UYVY (U0 Y0 V0 Y1 U0 Y2 V0 Y3)... and then 
> there's AYUV, which has a 4th alpha channel.

We will only be doing planar encoding in [A]YUV order: no other plane order, and 
no packed layouts.  You're right, there are simply too many different 
possibilities, and supporting them all makes the software too complex.
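
To make the packed vs. planar distinction concrete, here is a minimal sketch of 
splitting one packed YUY2 row (the Y0 U0 Y1 V0 order quoted above) into separate 
Y, U and V planes.  The function name is made up for illustration; nothing here 
is part of any OggYUV draft.

```python
def yuy2_to_planar(packed: bytes):
    """Split packed YUY2 bytes (Y0 U0 Y1 V0 ...) into Y, U, V planes."""
    if len(packed) % 4 != 0:
        raise ValueError("YUY2 data must be a multiple of 4 bytes")
    y = packed[0::2]  # every even byte is a luma sample
    u = packed[1::4]  # U follows the first luma of each pixel pair
    v = packed[3::4]  # V follows the second luma of each pixel pair
    return y, u, v


# Two pixel pairs: Y=10..13, U=100,101, V=200,201
row = bytes([10, 100, 11, 200, 12, 101, 13, 201])
y_plane, u_plane, v_plane = yuy2_to_planar(row)
```

Each packed variant (YVYU, UYVY, ...) would need its own set of offsets, which 
is the complexity a planar-only format avoids.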

> Then there's the issue of where the samples lie on a grid in relation to 
> the pixels centre, do the samples centre over the pixels in the horizontal 
> or vertical direction, or do they fall at the mid point between 2 pixel 
> centres.

Yes, that needs to be noted, too.  I've seen common implementations which do 
both, and the subsampled chroma -> RGB mapping differs significantly between 
the two.
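
As a rough illustration of why the siting matters, the sketch below upsamples a 
single 2:1 subsampled chroma row to full width under both conventions.  It 
assumes plain linear interpolation with edge clamping, which is only one of 
several possible filters, and the function and parameter names are hypothetical.

```python
def upsample_chroma_row(chroma, offset):
    """Linearly interpolate a 2:1 subsampled chroma row to full width.

    offset is where chroma sample 0 sits, in luma pixel units:
    0.0 for co-sited with the pixel centre, 0.5 for midway between two pixels.
    """
    n = len(chroma)
    out = []
    for x in range(2 * n):
        c = (x - offset) / 2.0       # position in chroma sample units
        c = min(max(c, 0.0), n - 1)  # clamp at the row edges
        i = int(c)
        j = min(i + 1, n - 1)
        frac = c - i
        out.append((1 - frac) * chroma[i] + frac * chroma[j])
    return out


co_sited = upsample_chroma_row([0, 100], 0.0)
midpoint = upsample_chroma_row([0, 100], 0.5)
```

The same two stored chroma values produce different full-resolution rows 
depending on the assumed siting, so a reader that guesses wrong shifts the 
colour relative to the luma.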

> And then there's the colour spaces (which i don't know all the details of, 
> but i'm sure derf or rillian can tell you all about it).

That's one of the fields we're currently lacking on the wiki, and one which I 
don't understand either.

> If you have a bits per channel field in RGB, what about RGB24, 3 channels, 
> 8 bits each, but padded into 32 bits. RGB555, 15 bits, padded to 16.

Or RGB565, giving green an extra bit because the human eye is more sensitive to 
shades of green than of red or blue.  But yes, RGB doubles the issue, which is 
why we need a separate codec for it.
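
For reference, a small sketch of how 8-bit RGB ends up in the two 16-bit layouts 
mentioned here.  It assumes red sits in the high bits and that RGB555 leaves the 
top bit as padding; other bit orders exist, and the helper names are made up.

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit R, G, B into 16 bits: 5 bits red, 6 green, 5 blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def pack_rgb555(r, g, b):
    """Pack 8-bit R, G, B into 15 bits, leaving the top bit as padding."""
    return ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3)


def unpack_rgb565(value):
    """Return the raw 5/6/5-bit channel values of a packed RGB565 word."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, value & 0x1F


white_565 = pack_rgb565(255, 255, 255)
white_555 = pack_rgb555(255, 255, 255)
```

A single "bits per channel" field cannot describe either layout without also 
recording the padding and the per-channel widths.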

> There's thousands of invalid possibilities, and only 15-20 or less valid 
> ones... only really 3-5 commonly used.
> 
> If someone wants to go crazy and design a franken-yuv format for some 
> bizarre reason, then they can easily make another stream format... but you 
> can pretty much count the ones people actually care about and that are used 
> in 90% of cases on one hand, YV12 (4:2:0), YUY2(4:2:2),  RGB24, ARGB and 
> maybe RGB555.

There are also 4:4:4, 4:1:1, RGB32, 16 bits per channel, and many other common 
ones, especially in professional video.

This is primarily an interchange format: something that the Theora codec can 
output for the media player to receive, that a webcam can send to Theora to 
encode, or that raw video can be stored in so that it can be encoded with a new 
codec under test while reliably keeping A/V sync.

Media players don't have to support every format, nor does any video codec.  If 
a video codec (e.g., DV) can only output 4:1:1 and the media player only takes 
4:4:4, then an intermediary plugin will be needed to do the conversion.

Some media frameworks already have functions for these, so they'll just take 
whatever format is being output and do the conversion themselves before 
sending to the media player.
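
A minimal sketch of what such a conversion step does for the 4:1:1 -> 4:4:4 
case, assuming the simplest possible approach (each chroma sample is repeated 
for the four pixels it covers).  A real plugin would likely filter instead, and 
the function name is hypothetical.

```python
def chroma_411_to_444(chroma_row, width):
    """Expand one 4:1:1 chroma row to full width by sample replication."""
    needed = (width + 3) // 4  # one chroma sample per four luma pixels
    if len(chroma_row) < needed:
        raise ValueError("chroma row too short for the given width")
    return [chroma_row[x // 4] for x in range(width)]


full_row = chroma_411_to_444([10, 20], 8)
```

The luma plane passes through unchanged; only the U and V planes need this 
treatment, row by row.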

So what I propose for OggYUV is to cover the capabilities of Ogg video codecs: 
everything Theora is capable of, and perhaps a bit more that we've seen from 
other codecs.  4:4:4, as I recall, is supported by the Theora spec (even if the 
current implementation doesn't support it).

> Also, on another issue, i already find the method of codec identification 
> pretty ad hoc... i think having ident fields that are only 3 or 4 bytes is 
> a very bad idea.

Talk to Monty about this, it's part of the design for Ogg.  It's what we've done 
to date, and as long as you're working in a strategy where codecs are asked if 
they support something, or provide some information similar to mime magic, it 
works fine.

If you're suggesting that OggPCM and OggYUV use "RawPCM" and "RawYUV", or 
something similar for an identifier, to allow future codecs to begin with PCM* 
or YUV*, that makes some sense, but I currently feel that the three letters are 
sufficient and allow 3rd party codecs to use a prefix if PCM/YUV/RGB is in their 
name.

It's become a pseudo-standard that the first byte of page 0 is a header ID byte, 
and the variable-length codec identification magic follows.  In the OggStream 
code that I'm working on, 8 bytes are used to identify a codec from the plugin 
to the application, with the first 7 of those usable such that the null-padding 
will null-terminate the string.
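
Roughly, the two pieces described above look like the sketch below.  The helper 
names are made up and this is not the actual OggStream code; the only thing 
taken from real streams is that a Theora first packet begins with the byte 0x80 
followed by "theora", and a Vorbis one with 0x01 followed by "vorbis".

```python
def split_first_packet(packet: bytes, magic_len: int):
    """Split the first packet into its header ID byte and codec magic."""
    if len(packet) < 1 + magic_len:
        raise ValueError("packet too short for header byte plus magic")
    return packet[0], packet[1:1 + magic_len]


def make_codec_id(name: str) -> bytes:
    """Build an 8-byte, null-padded codec ID; at most 7 bytes are usable."""
    raw = name.encode("ascii")
    if len(raw) > 7:
        raise ValueError("codec name must fit in 7 bytes")
    return raw.ljust(8, b"\x00")  # padding guarantees a terminating null


header_id, magic = split_first_packet(b"\x80theora\x03\x02", 6)
codec_id = make_codec_id(magic.decode("ascii"))
```

Because the magic is variable length, the caller has to know (or try) the 
length for each codec, which is why asking each codec plugin in turn works.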

The web search API, to find the name and plugin for an unknown codec, sends the 
entire contents of packet 0 to the application via HTTP.

-- 

The recognition of individual possibility,
 to allow each to be what she and he can be,
  rests inherently upon the availability of knowledge;
 The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
 by Eben Moglen, General Counsel of the Free Software Foundation