[Theora-dev] & [ogg-dev] OggYUV

Tue Nov 8 10:24:37 PST 2005

On Tue, Nov 08, 2005 at 12:36:44PM -0500, John Koleszar wrote:
> If you want cameras to source this format someday, it'd be wise to 
> support the packed formats. A lot of cameras, DV in particular, use YUY2.

Not the camera, only the application which goes between the camera and 
OggStream.  Plus, changing between packed and planar is easy; my reason for 
wanting to avoid packed is that there are simply too many different ways to do it.
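
To illustrate how small the packed/planar conversion is, here is a rough 
sketch for one packed layout, YUY2 (byte order Y0 U Y1 V).  The function 
names are made up for this example and are not part of any Ogg library; 
other packed layouts would need their own byte offsets, which is exactly 
the "too many ways" problem.

```python
def yuy2_to_planar(packed: bytes, width: int, height: int):
    """Split a YUY2 buffer (Y0 U Y1 V ...) into separate Y, U and V planes."""
    if width % 2:
        raise ValueError("YUY2 needs an even width")
    if len(packed) != width * height * 2:
        raise ValueError("buffer size does not match width x height")
    y = packed[0::2]  # every even byte is a luma sample
    u = packed[1::4]  # bytes 1, 5, 9, ... are the U (Cb) samples
    v = packed[3::4]  # bytes 3, 7, 11, ... are the V (Cr) samples
    return y, u, v


def planar_to_yuy2(y: bytes, u: bytes, v: bytes) -> bytes:
    """Inverse of yuy2_to_planar: interleave the planes back into YUY2."""
    out = bytearray(len(y) * 2)
    out[0::2] = y
    out[1::4] = u
    out[3::4] = v
    return bytes(out)
```

The result is planar 4:2:2: the U and V planes each hold half as many 
samples per row as the Y plane.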

> >Media players don't have to support every format, nor does any video 
> >codec.  If a video codec (ie, DV) can only output 4:1:1 and the media 
> >player only takes 4:4:4 then an intermediary plugin will be needed to do 
> >the conversion.
> >
> Agreed.. But the problem becomes how to send data to the color 
> conversion plugin. If you don't specify a way to put other formats into 
> this stream, your plugin has to do the work of extracting the data from 
> the original stream and performing a color conversion.

Um, other formats? I don't grok your statement.  This is what I'm talking about:

[Theora] -> Theora plugin -> [OggYUV:420] -> Colorspace Plugin -> [OggYUV:444] -> Media Player

The current design of OggStream lets you build a chain, if needed, to decode.  
The Colorspace plugin only needs to understand OggYUV with the chroma formats it 
needs to convert, Theora only needs to be able to export to the file's native 
format, and the media player only needs to be able to import YUV444.

A more intelligent Theora plugin is, thus, not necessary.
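
As a sketch of the chain idea (this is not OggStream's actual code; the 
plugin table and the build_chain name are invented for illustration), a 
framework could find the chain with a plain breadth-first search over 
"input format -> output format" pairs:

```python
from collections import deque

# Hypothetical plugin table: name -> (input format, output format).
PLUGINS = {
    "theora_decoder": ("Theora", "OggYUV:420"),
    "colorspace_420_to_444": ("OggYUV:420", "OggYUV:444"),
    "colorspace_422_to_444": ("OggYUV:422", "OggYUV:444"),
}


def build_chain(source, wanted, plugins=PLUGINS):
    """Return the shortest list of plugin names turning source into wanted."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == wanted:
            return chain
        for name, (fmt_in, fmt_out) in plugins.items():
            if fmt_in == fmt and fmt_out not in seen:
                seen.add(fmt_out)
                queue.append((fmt_out, chain + [name]))
    return None  # no combination of plugins produces the wanted format
```

Each plugin only declares what it takes in and what it puts out; none of 
them needs to know about the rest of the chain.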

> I think the design is cleaner if you have a stream extraction plugin 
> just extract the data from whatever the source is 
> (decodec/http/whatever) and put it into an ogg stream, then you let your 
> plugin negotiation find the "one true xxx->theora/yv12" converter. 
> Greatly simplifies writing extraction plugins if you don't have to 
> understand anything beyond bytes and buffers.

Our terminology is different, but I think you just argued for the same thing 
I've been promoting. :-)

OggStream in its current form is very, VERY simple.  Ogg packets in, Ogg 
packets out.  Ogg packets are convenient because they're just buffers of 
timestamped data and libogg2 supplies a nice set of bitpacking/unpacking and 
bytepacking/unpacking functions to use on them.  libogg2 also provides a very 
nice zerocopy memory management for these buffers, which is why I'm using it.

OggFile (or your favorite media framework, network protocol handler, etc) takes 
the data from whatever format it's in, whether that be an Ogg file, RTP, 
Quicktime, etc and extracts the packets from it, sending them to OggStream for 
decoding and setting up the plugin chain using information provided by 
OggStream.

I divide functionality along these lines.  In my experience, establishing 
"layers" that separate what the different pieces do makes implementing, 
optimising, and debugging simpler, and makes the whole more flexible.

> >So what I propose for OggYUV is to cover the capabilities of Ogg video 
> >codecs, everything Theora is capable of and perhaps a bit more that we've 
> >seen from other codecs.  4:4:4, as I recall, is supported by the Theora 
> >spec (even if the current implementation doesn't).  
> > 
> >
> I can't speak for what Theora can support today, but the VP3 source it 
> derived from supported UYVY, YVYU, YUY2, and RGB24/32 source data as well.

I think much of that was cleaned up and extended during its conversion to 
Theora.  I believe additional chroma sampling methods were offered, at least in 
the specification for a header value...

So what exactly is lacking from the current YUV spec, and what is overkill?  It 
sounds like everyone is basically on the same page: we shouldn't try to support 
everything /possible/, but we should try to support everything /common/.

I think 5, as illum. suggested, is too few.  Perhaps we pick the 7 most popular 
YUV chroma subsampling formats (ignoring RGB for now, I still think it's best 
left separate), have an 8th option for "extended" (for later minor versions, 
defined with an additional header field which doesn't exist now), have the bit 
for whether it's interlaced or not, have 15 possibilities for packing (with 16th 
being extended) which, as I've read, commonly includes funky packed bit values..

so something like this for these three fields:
  1 Interlaced
  3 Chroma Subsampling
  4 Bit Packing

Chroma Subsampling (these are my suggestions used for example only)
  0 4:4:4
  1 4:1:1
  2 4:2:2 doubled
  3 4:2:2 blended
  4 4:2:0 doubled
  5 4:2:0 blended
  6 <something else, maybe 4:1:0>
  7 Extended

Bit Packing (again, only examples)
  0 Planar Y8 UV8
  1 Planar Y16 UV8
  2 Planar Y16 UV16
  3 Planar Y24 UV12
  4 Packed Y8 UV8
  5 Packed Y16 UV8
  6 Packed Y16 UV16
  7 Packed Y12 UV10
  8 <something else>
  ...
 15 Extended

Some of the combinations between these will be weird, but again, nothing is 
required to support any specific format.  When acceptability is being 
determined, page0 is sent to the codec, and later to the web if nothing local is 
found, and the colorspace/packing/etc is all in page0, so a codec or converter 
plugin (which are the same in OggStream's eyes) can give a certain answer.

A plugin can look at page0 and know "I only support planar 4:4:4, 4:2:2, 4:2:0, 
or packed 4:4:4, which this is one of, so yeah, I can handle it."
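
A rough sketch of that acceptability check follows.  The page0 fields are 
modelled as a simple (layout, chroma) pair and all names are invented for 
the example; a real plugin would read them out of the actual header.

```python
# Hypothetical capability set for one plugin: (layout, chroma) pairs it accepts.
SUPPORTED = {
    ("planar", "4:4:4"),
    ("planar", "4:2:2"),
    ("planar", "4:2:0"),
    ("packed", "4:4:4"),
}


def can_handle(page0, supported=SUPPORTED):
    """Give a definite yes/no from the header alone."""
    return (page0["layout"], page0["chroma"]) in supported


def first_accepting(page0, plugins):
    """Offer page0 to each local plugin in turn; None means nothing local fits."""
    for name, accepts in plugins:
        if accepts(page0):
            return name
    return None
```

A None result is the point where the framework would fall back to looking 
for a plugin elsewhere.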

I suggest the above because it results in these three fields being byte aligned.  
If put last, they do not need to be byte aligned, and it may even be beneficial 
to leave one padding bit at the end for a future minor revision to use if we 
don't need all of it (ie, if we want only 7 bit packing formats).
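
For illustration, the 1 + 3 + 4 bit layout fits in a single byte.  The 
bit order below (interlaced flag in the top bit, chroma in the next three, 
packing in the low four) is my assumption for the sketch; nothing above 
fixes it, and the function names are made up.

```python
def pack_format_byte(interlaced, chroma, packing):
    """Pack the three fields into one byte: I CCC PPPP (assumed bit order)."""
    if not 0 <= chroma <= 7:
        raise ValueError("chroma subsampling code must fit in 3 bits")
    if not 0 <= packing <= 15:
        raise ValueError("bit packing code must fit in 4 bits")
    return (int(bool(interlaced)) << 7) | (chroma << 4) | packing


def unpack_format_byte(value):
    """Return (interlaced, chroma, packing) from a byte made by pack_format_byte."""
    return bool((value >> 7) & 1), (value >> 4) & 0x7, value & 0xF
```

With the example tables, interlaced 4:2:0 doubled (code 4) in planar Y8 UV8 
(code 0) would be the byte 0xC0 under this assumed ordering.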

We may want a bit that says nothing more than "packed or planar", then a separate 
few bits for how many bits per sample luminance and chroma each need.  I 
feel that, at the very least, 8:8, 16:8, and 16:16 are needed.  I've seen weird 
12:10 and 24:12 formats used by other codecs, which is why I included them in 
the example table above.
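
A sketch of that alternative, assuming one packed/planar bit followed by a 
2-bit depth code.  The field widths, the code values and the names are all 
invented here purely as an example:

```python
# Hypothetical 2-bit depth codes: (luma bits, chroma bits) per sample.
# Code 3 is deliberately left out so it can mean "extended".
DEPTHS = {0: (8, 8), 1: (16, 8), 2: (16, 16)}


def decode_layout(bits):
    """Decode a 3-bit value: top bit = packed flag, low two bits = depth code."""
    packed = bool((bits >> 2) & 1)
    depth = DEPTHS.get(bits & 0x3)  # None means extended or unknown
    return packed, depth
```

The odd 12:10 and 24:12 depths would then live behind the extended code 
rather than taking up values in the common table.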

-- 

The recognition of individual possibility,
 to allow each to be what she and he can be,
  rests inherently upon the availability of knowledge;
 The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
 by Eben Moglen, General Counsel of the Free Software Foundation