[ogg-dev] OggYUV

Arc arc at Xiph.org
Tue Nov 8 23:57:08 PST 2005

> If you want to limit the allowable values for the FourCC field, I don't have 
> an issue with that, but I think it's useful for decoders to be able to tell 
> easily whether or not they support the format (since most decoders will 
> operate on the well defined formats) and useful for encoders, since most data
> sources are described by a fourcc (the exception being application that
> actually generate images, rather than extract/transcode, I suppose)

I disagree with this: most decoders using OggStream are unlikely to be using 
FourCC, or at least the ones I care most about are not, and this places a 
complexity burden on every implementation which uses OggYUV, since each *MUST* 
carry a table of FourCC -> format mappings.  Software which already supports 
FourCC, on the other hand, already has such a table and can quickly check 
whether an OggYUV stream maps directly to a raw YUV FourCC codec.
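To illustrate: software that already speaks FourCC typically carries a lookup table like the one below, so checking whether an OggYUV stream maps onto a raw FourCC format is a single table probe.  This is a hedged sketch — the per-format fields here (planar flag, chroma subsampling factors) are a simplification for illustration, not the actual OggYUV header layout:

```python
# Illustrative FourCC -> raw-YUV-layout table.  The FourCCs are real,
# well-known ones; the field set is a simplification, not the real
# OggYUV header fields.
FOURCC_FORMATS = {
    "I420": {"planar": True,  "sub_x": 2, "sub_y": 2},  # planar 4:2:0
    "YV12": {"planar": True,  "sub_x": 2, "sub_y": 2},  # planar 4:2:0, U/V planes swapped
    "YUY2": {"planar": False, "sub_x": 2, "sub_y": 1},  # packed 4:2:2
    "UYVY": {"planar": False, "sub_x": 2, "sub_y": 1},  # packed 4:2:2, bytes reordered
}

def maps_to_fourcc(planar, sub_x, sub_y):
    """Return the first FourCC whose layout matches, or None."""
    for fourcc, fmt in FOURCC_FORMATS.items():
        if (fmt["planar"], fmt["sub_x"], fmt["sub_y"]) == (planar, sub_x, sub_y):
            return fourcc
    return None
```

The point being: this table lives in the FourCC-aware software, not in every OggYUV implementation.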

Also, as you pointed out, many FourCC implementations are ambiguously defined 
and are thus inadequate on their own.  No.  Backwards compatibility with this 
obsolete codec-identification system should be provided by software which 
actually uses the older system, not forced on all implementations of the newer 
system.  Shorthanding fields saves only a few bytes in the stream header (we're 
not even talking about the data packet header) and adds mandatory complexity.  

I'll address the other elements of your draft line by line:

> Displayed Width&Height
> Stored Width&Height
> Aspect Ratio (Fractional)

Aspect ratio is what makes pixels potentially non-square, and since we're not 
encoding in blocks as most compressed codecs do, what purpose would separate 
displayed and stored width/height serve?
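That is, the fractional aspect ratio alone recovers the display geometry from the stored one.  A quick sketch (the 10:11 pixel aspect ratio for 4:3 NTSC material at 704 active pixels is the standard textbook example):

```python
from fractions import Fraction

def display_width(stored_width, aspect_num, aspect_den):
    """Scale the stored width by the fractional pixel aspect ratio
    to get the width the frame should be displayed at."""
    return stored_width * Fraction(aspect_num, aspect_den)

# 704x480 NTSC with 10:11 pixels displays 640 wide, i.e. exactly
# a 4:3 picture against its 480-line height.
w = display_width(704, 10, 11)   # -> Fraction(640, 1)
```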

I implemented 24-bit fields for width/height/aspect_num/aspect_den, just as 
Theora does.  Honestly, I don't foresee anyone producing video more than 65535 
pixels wide in the next, oh, 50 years: even current high-definition video only 
gets up toward 4096 wide, and the bandwidth for such ultra-super-high-definition 
video would certainly surpass anything we'll have in the near future.  But heck, 
might as well use the same sizes as Theora, right?

Who will cry over 2 wasted bytes in a raw video codec header? :-)
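For concreteness, a 24-bit header field is just three bytes.  A sketch of writing and reading one — big-endian byte order here is my assumption for illustration, not something the draft pins down:

```python
def pack24(value):
    """Pack an unsigned integer into 3 big-endian bytes (max 2**24 - 1)."""
    assert 0 <= value < 1 << 24
    return bytes([(value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF])

def unpack24(data):
    """Read a 24-bit big-endian unsigned integer from 3 bytes."""
    return (data[0] << 16) | (data[1] << 8) | data[2]

# Round-trips any width up to 16777215 -- far beyond 4096-wide HD.
assert unpack24(pack24(4096)) == 4096
```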

> Colorspace (enum, R'G'B', Y'CbCr, JPEG (not sure proper name), etc)

This isn't what colorspace means, from what I've seen at least.  Theora 
implements the ITU-R BT.601 and BT.709 colorspaces, which tell the decoder or 
converter how to properly map YUV values to RGB.  It's not YUV vs. RGB, but 
rather one of those fields unique to YUV video.

Correct me if I'm wrong, or if "Colorspace" is ambiguous.

I provided an 8-bit field for this, just as Theora does, though we'll likely not 
use more than half of this space in the near future.
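For reference, this is the kind of mapping the colorspace field selects between.  A sketch of the full-range ("JPEG") Rec. 601 Y'CbCr -> R'G'B' transform using the usual published coefficients; the 709 variant differs only in the constants:

```python
def ycbcr601_to_rgb(y, cb, cr):
    """Full-range Rec. 601 Y'CbCr (0..255) to R'G'B' (0..255).
    Standard coefficients: Cr drives red, Cb drives blue, and
    green gets a weighted subtraction of both."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    def clamp(v):
        return max(0, min(255, round(v)))
    return clamp(r), clamp(g), clamp(b)

# Neutral chroma (Cb = Cr = 128) maps straight to gray levels.
assert ycbcr601_to_rgb(128, 128, 128) == (128, 128, 128)
```

A decoder that ignores this field still produces a picture, just with slightly wrong colors — which is exactly why it belongs in the header rather than in a FourCC table.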

> // Subsampling data
> U Channel X Sample Rate (Fractional)
> U Channel Y Sample Rate (Fractional)
> U Channel X Sample Offset (Fractional)
> U Channel Y Sample Offset (Fractional)
> V Channel X Sample Rate (Fractional)
> V Channel Y Sample Rate (Fractional)
> V Channel X Sample Offset (Fractional)
> V Channel Y Sample Offset (Fractional)

I'm unsure what you're trying to do here.  Implement 4:4:4 vs. 4:2:2 vs. 4:2:0?  
What is the offset, and why is it fractional?

All the common formats (ignoring some of the older FourCCs, which are rarely 
used) use a two-line system with no more than four pixels in each "block".  
Thus, we can implement this very simply.  Y-U-V is always provided in that 
order (whether planar or packed), so what we must encode is which luma pixels 
chroma data is provided for.

An 8-pixel (4x2) block can have 1, 2, 4, or 8 chroma samples, so this should be 
our first 2-bit field; then (only applicable for 2 or 4 samples) we can stagger 
chroma in both x and y, then we can split chroma in both x and y, resulting in 
the following table of valid possibilities (* = doesn't matter):

00**00: UV -- -- --
        -- -- -- --

00**10: U- -- V- --
        -- -- -- --

00**01: U- -- -- --
        V- -- -- --

00**11: U- -- -- --
        -- -- V- --

010000: Impossible

011000: UV -- UV --
        -- -- -- --

010100: UV -- -- --
        UV -- -- --

011100: UV -- -- --
        -- -- UV --

011010: U- V- U- V-
        -- -- -- --

011001: U- -- U- --
        V- -- V- -- 

011011: U- -- U- --
        -- V- -- V-

010110: U- -- V- --
        U- -- V- --

010101: Impossible

010111: Impossible

011110: U- -- V- --
        V- -- U- --

011101: Impossible 

011111: Impossible 

100000: UV -- UV --
        UV -- UV --

101000: Impossible

100100: Impossible

101100: UV -- UV --
        -- UV -- UV

100010: U- V- U- V-
        U- V- U- V-

100001: Impossible

100011: Impossible

10**11: U- V- U- V-
        V- U- V- U-

11****: UV UV UV UV
        UV UV UV UV

(All the "Impossible" entries are duplicates: since all these bits do is shift, 
shifting often yields the same layout regardless of the arrangement.)

I'm not claiming this mapping is complete, but there are fewer than 20 entries, 
and (if I didn't make mistakes) it's all done using simple bit shifts which can 
be generically programmed.  
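A sketch of how such a field could be unpacked, assuming the bit order implied by the table above (reading left to right: two count bits, then stagger-x, stagger-y, split-x, split-y).  The name mapping for the familiar codes in the comments is my own reading of the table, so treat all of it as illustrative:

```python
def decode_subsampling(code):
    """Unpack the proposed 6-bit chroma-layout field: the top two
    bits give log2(chroma samples per 4x2 luma block), followed by
    stagger-x, stagger-y, split-x, split-y flags."""
    return {
        "samples":   1 << ((code >> 4) & 3),  # 1, 2, 4 or 8 per block
        "stagger_x": (code >> 3) & 1,
        "stagger_y": (code >> 2) & 1,
        "split_x":   (code >> 1) & 1,
        "split_y":   code & 1,
    }

# A few familiar layouts, per my reading of the table:
#   0b110000 -> 8 samples/block         -> 4:4:4
#   0b100000 -> 4 samples/block         -> 4:2:2
#   0b011000 -> 2 samples, staggered in x, top row only -> 4:2:0
```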

Since there are fewer than 20, it may make sense to simply define these sets on 
their own, giving the 6 bits over to "data format" and making these part of the 
spec.  As far as mapping to other codecs goes, it really doesn't matter, since 
there'll probably be a table which says "map to this arrangement".

Condensing this list to index numbers would require only 5 bits for a maximum of 
32 entries, gaining another format bit for something else.  In the current spec 
draft, the top bit is used as an interlaced flag and the second for whether the 
data is packed or not; this may not be sufficient.

> // Storage data
> A Channel Bits Per Sample
> A Channel Field 0 Offset (in bits)
> A Channel Field 1 Offset (in bits)
> A Channel X Stride (in bits)
> A Channel Y Stride (in bits?)
> Y/R Channel Bits Per Sample
> Y/R Channel Field 0 Offset (in bits)
> Y/R Channel Field 1 Offset (in bits)
> Y/R Channel X Stride (in bits)
> Y/R Channel Y Stride (in bits?)
> U/G Channel Bits Per Sample
> U/G Channel Field 0 Offset (in bits)
> U/G Channel Field 1 Offset (in bits)
> U/G Channel X Stride (in bits)
> U/G Channel Y Stride (in bits?)
> V/B Channel Bits Per Sample
> V/B Channel Field 0 Offset (in bits)
> V/B Channel Field 1 Offset (in bits)
> V/B Channel X Stride (in bits)
> V/B Channel Y Stride (in bits?)

I'm unsure what any of this is or why it's necessary.  Please explain?

> I'm still not convinced that RGB and YUV can't (shouldn't) be combined,
> since RGB is so similar to a 4:4:4 YUV format.

Compare http://wiki.xiph.org/OggRGB to http://wiki.xiph.org/OggYUV - yes, they 
are similar, but YUV is much more complex, and I see no reason to join them.  
Or, if you prefer, think of them as one codec with two identifiers which change 
the fields around in the header/etc.

Load up the current draft at http://wiki.xiph.org/OggYUV 

The recognition of individual possibility,
 to allow each to be what she and he can be,
  rests inherently upon the availability of knowledge;
 The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
 by Eben Moglen, General Counsel of the Free Software Foundation
