[vorbis-dev] Ogg as container format

Fri Sep 28 01:33:29 PDT 2001

On Thursday, September 27, 2001, at 06:22 , Kevin Marks wrote:

> OK, I'll bite. Better how? What is bad about QT? How is Ogg better?

I'd like to see some discussion on this too. You raise some good 
questions. I don't know qt well enough to say why ogg is better, but 
I'll try to address what might be different.

> IFF-like formats have stood up very well over time because of the 
> future compatibility built-in (the behaviour for unknown chunks is 
> well-defined).
> [...]
> From what I can see of Ogg, everything is down in the stream structure, 
> and the lacing values used for packet framing will introduce a lot of 
> overhead for packets bigger than 1024 bytes.

The overhead is a constant percentage for large packets. Framing.html 
says the idea with the lacing values is to reduce the overhead for small 
packet sizes while still allowing large ones. Assuming you want that 
flexibility, ogg has less overhead than chunk-based (IFF) schemes for 
small packets, and more for large ones. I would agree it's tuned for 
things in the 500-1500 byte range, which is the typical size of vorbis 
packets.
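
As a rough check on that trade-off, here is a minimal sketch. It assumes 
the segment rule from framing.html (a packet is cut into 255-byte 
segments, one lacing byte each, ending with a value below 255) and an 
8-byte IFF-style chunk header (4-byte ID plus 4-byte length). It counts 
only the lacing bytes and ignores the fixed page header fields. The 
names lacing_values, CHUNK_HEADER and report are made up for 
illustration.

```python
SEGMENT = 255  # largest value a single lacing byte can hold

def lacing_values(packet_len):
    """Lacing bytes for one packet: one per full 255-byte segment,
    plus a final value below 255 (possibly 0) that ends the packet."""
    return packet_len // SEGMENT + 1

# Assumption: an IFF-style chunk header is a 4-byte ID plus a 4-byte length.
CHUNK_HEADER = 8

def report(sizes):
    """Print lacing overhead next to the assumed chunk-header overhead."""
    for size in sizes:
        n = lacing_values(size)
        print(f"{size:>7} bytes: {n:>3} lacing bytes ({100.0 * n / size:.2f}%), "
              f"chunk header {CHUNK_HEADER} bytes "
              f"({100.0 * CHUNK_HEADER / size:.2f}%)")

report([50, 500, 1500, 120000, 144000])
```

Under these assumptions the lacing bytes cost less than the chunk header 
for packets up to 1784 bytes and more from 2040 bytes on, and for large 
packets the ratio settles near 1/255, roughly 0.39%.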

You're correct that there's no analog to chunk-copying rules in ogg. 
There is a concept of 'mappings' that additionally specify restrictions 
on what bitstream types are supported and what the interleaving rules 
are, but so far we have little experience with anything besides 
degenerate (single-datatype) streams. The integrity of each logical 
bitstream is preserved, though, and I've mostly been able to convince 
myself that's all that's needed.

> What is the point of making packets and pages independent, and having 
> two parallel framing structures going on at once, with the concomitant 
> problem of having to slice and dice the whole time? You're going to 
> have big trouble getting DV or uncompressed video into this structure. 
> Dv frames are 120000 bytes for NTSC and 144000 for PAL. They are all 
> the same size. To put these in Ogg you need 471 & 565 lacing values per 
> frame, and you need to add up these bytes to get the constant length.

This is a really good point, and I think illustrates the real difference 
in the two approaches. I understand from your description that the seek 
information in qt is stored as some kind of intelligently coded table: 
in the case of a constant-bitrate codec that table compresses very well. 
The problem is that well-compressed media data varies wildly in 
bitrate, and so you want to be able to support that, which means you 
need more overhead. Ogg was designed with streaming in mind from the 
beginning, so it made a lot more sense to distribute the packet length 
information throughout the stream.

How does qt handle the streaming case, anyway? Is this what 'hinting' is 
all about?

(Note that there's no reason for each frame to be a single packet. Of 
course the lacing overhead is the same no matter how you break things 
down, but I'd put each frame into multiple packets, e.g. one per 
superblock in DV or one per IDAT/JDAT slice in mng. More robust against 
data loss.)
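
To put a number on "the same no matter how you break things down", a 
small sketch, again assuming the 255-byte segment rule from 
framing.html. The equal-sized split is purely hypothetical (it is not 
the real DV superblock layout), and lacing_for_split is an illustrative 
name.

```python
def lacing_values(packet_len):
    """One lacing byte per full 255-byte segment, plus a terminating value."""
    return packet_len // 255 + 1

def lacing_for_split(frame_len, n_packets):
    """Total lacing bytes when a frame is cut into n_packets packets
    of (nearly) equal size. Hypothetical split, for illustration only."""
    base, extra = divmod(frame_len, n_packets)
    sizes = [base + 1] * extra + [base] * (n_packets - extra)
    return sum(lacing_values(s) for s in sizes)

frame = 120000  # NTSC DV frame size quoted above
for n in (1, 10, 50):
    total = lacing_for_split(frame, n)
    print(f"{n:>2} packets: {total} lacing bytes ({100.0 * total / frame:.2f}%)")
```

Each extra packet adds at most one terminating lacing value, so the 
total moves from 471 to 480 to 500 bytes here, which is close enough to 
"the same" at this frame size.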

I don't understand the reasoning behind the two-level framing as well, 
but I think the idea is again flexibility. It lets the framing layer 
optimize the amount of overhead used for error checking and interleave 
regardless of what a codec might choose as a natural packet size. You 
could even use fixed-size pages for absolute seeking in the 
constant-bitrate case, though of course we haven't relied on this.

Certainly we should understand the qt file/stream format better. 
I'm happy you've been contributing lately.

Cheers,
  -r
