[vorbis-dev] Ogg as container format

Monty xiphmont at xiph.org
Fri Sep 28 09:53:53 PDT 2001



On Thu, Sep 27, 2001 at 06:22:44PM -0700, Kevin Marks wrote:
> At 8:34 PM +1000 9/26/01, Michael Smith wrote:
> >rm goal is for ogg to be a generic media container format in
> >the same way as riff (avi/wav), qt, and so on are. Except better,
> >hopefully ;-)
> 
> OK, I'll bite. Better how? What is bad about QT? How is Ogg better?

We'll get to this below.

> IFF-like formats have stood up very well over time because of the 
> future compatibility built-in (the behaviour for unknown chunks is 
> well-defined).

IFF-like formats that include more than one media type cannot be
streamed as is: the media types are stored sequentially, one after
another.  Quicktime, at least at one time, was also like this.  I've
not checked since just prior to QT 4.0ish.

The behavior of unknown types in Ogg is also well defined.

> QT's structure was picked up by the MPEG4 committee because of this robustness.

It was chosen because of Apple's lobbying, mindshare, relative lack of
licensing restrictions (only one entity has the patents) and because
so much software already exists to support it.  

QuickTime, frankly, has more technical baggage than any other
container format you can think of, and it has patent issues (just
fewer than MPEG's own system streams).  There's no technically
compelling reason to use Quicktime.

> QT defines the structure of a particular movie independently of the 
> data - you can gain enough information to seek anywhere in the file 
> by reading this movie header from the front. 

Quicktime until this year couldn't do VBR formats at all.  'Enough
info in the header' simply means 'everything is the same size' and
that's a liability, not a feature.  If you stick indexes in the
header, the encoding must be two pass, also a liability if mandatory.

Again, I'm not impressed.

> In fact, this header can 
> be completely independent of the media data, which is how QT is able 
> to import so many other formats.

..and yet there's no reason to do it this way.  That header has
*nothing to do* with being able to import other formats.

It also cannot really be streamed.  It has to be broken up and sent in
multiple simultaneous, parallel streams with the sender seeking madly
through the file to continue just-in-time delivery of the multiple
media types.  Again, this is the way things were 1996-ish.  It may be
different today.

Quicktime was not intended for streaming use when it was invented.

>  From what I can see of Ogg, everything is down in the stream 
> structure, and the lacing values used for packet framing will 
> introduce a lot of overhead for packets bigger than 1024 bytes.

No, framing/paging is a constant 0.5%-1% overhead for large packets.
The lacing/framing is designed the way it is for a reason (roughly
constant overhead regardless of packet payload size).
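
As a rough sanity check of that figure, here is a minimal Python
sketch.  It assumes the framing rules from the Ogg spec (255-byte
segments, one lacing value per segment, a 27-byte fixed page header,
at most 255 segments per page) and, as a simplification, that every
page is packed full.  The function names are made up for illustration
and are not libogg API.

```python
import math

SEGMENT = 255        # max bytes described by one lacing value
PAGE_HEADER = 27     # fixed part of a page header, before the segment table
MAX_SEGMENTS = 255   # max lacing values per page

def lacing_count(packet_len):
    """Lacing values needed for one packet (a value < 255 ends the packet)."""
    return packet_len // SEGMENT + 1

def framing_overhead(packet_len, n_packets):
    """Framing bytes divided by payload bytes, assuming every page is full."""
    total_lacing = lacing_count(packet_len) * n_packets
    pages = math.ceil(total_lacing / MAX_SEGMENTS)
    overhead = pages * PAGE_HEADER + total_lacing
    return overhead / (packet_len * n_packets)

# Print the overhead for a few large packet sizes (1000 packets each).
for size in (4096, 65536, 120000):
    print(size, f"{framing_overhead(size, 1000):.4%}")
```

Under those assumptions large packets land a little under half a
percent.  Pages flushed before they are full push the figure up, since
each extra page costs another 27 bytes of header.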

There's nothing Quicktime does that Ogg cannot.  The difference is
that Ogg is doing it all at rev 0.
 
> What is the point of making packets and pages independent, and having 
> two parallel framing structures going on at once, with the 
> concomitant problem of having to slice and dice the whole time? 

'Packets' are not a framing structure.  Only paging provides
capture/framing.  If you look at the way things are set up, you'll
notice that there's no duplicated functionality and that packets and
pages are wholly orthogonal, asynchronous concepts in Ogg.

Pages are a way of freezing packets of arbitrary size into a stream.
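
A minimal Python sketch of that idea, assuming the lacing rule from the
Ogg spec (each packet becomes a run of 255s ended by one value below
255) and the 255-segment limit per page.  It only builds the segment
tables; the rest of the page header (capture pattern, granule position,
serial number, CRC) is ignored, and real muxers usually flush pages
well before they are full.  The function names are hypothetical.

```python
def lacing_values(packet_len):
    """Lacing values for one packet: 255 per full segment, then the remainder.

    A packet whose length is an exact multiple of 255 ends with a 0,
    so the end of the packet is always marked by a value below 255.
    """
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]

def paginate(packet_lens, max_segments=255):
    """Split the lacing values of a packet sequence into page segment tables.

    Packets are laid end to end; a page boundary may fall in the middle
    of a packet, in which case the packet continues on the next page.
    """
    lacing = []
    for n in packet_lens:
        lacing.extend(lacing_values(n))
    return [lacing[i:i + max_segments]
            for i in range(0, len(lacing), max_segments)]

# One 70000-byte packet spans two pages; three small packets share one.
print([len(table) for table in paginate([70000])])
print(paginate([40, 50, 45]))
```

The packet boundaries live entirely in the lacing values, so a page
can carry many small packets or a slice of one large packet without
any second framing layer.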

You've obviously read the spec, you just need to think a little more
about it.

> You're going to have big trouble getting DV or uncompressed video 
> into this structure. 

Bull.  Both were very much on my mind when I designed all this.

> Dv frames are 120000 bytes for NTSC and 144000 
> for PAL. They are all the same size. To put these in Ogg you need 471 
> & 565 lacing values per frame, and you need to add up these bytes to 
> get the constant length.

One would never put an entire frame in one packet.  One *could*, but
that would be silly.  Think about why.  Others may feel free to chime in.

As for 471/565 lacing values, that's less than half a percent overhead
for potentially very fine-grained packetization (hint: if you're doing
things right, each packet is about 200-500 bytes and the overhead is
*still the same*).  Doing it any other way would kill us in overhead
for *small* packets (like low bitrate audio) where packets are only
40-50 bytes.
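
The 471/565 figures and the small-packet case can be checked with a
short Python sketch.  The lacing rule is the one from the Ogg spec; the
fixed 4-byte length prefix is only a hypothetical alternative picked
for comparison, and page headers are left out of the count.

```python
def lacing_count(packet_len):
    """Lacing values for one packet: one per 255 bytes, plus the terminator."""
    return packet_len // 255 + 1

LENGTH_PREFIX = 4  # hypothetical fixed-size length field, for comparison only

cases = (
    ("DV NTSC frame", 120000),
    ("DV PAL frame", 144000),
    ("low-bitrate audio packet", 45),
)

for name, size in cases:
    lacing = lacing_count(size)
    print(f"{name}: {size} bytes, {lacing} lacing values "
          f"({lacing / size:.2%}), fixed prefix {LENGTH_PREFIX / size:.2%}")
```

For the 45-byte packet the lacing cost is a single byte, where a
4-byte length field would cost four.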

Monty



