[vorbis-dev] Ogg as container format
rillian
rillian at telus.net
Fri Sep 28 01:33:29 PDT 2001
On Thursday, September 27, 2001, at 06:22, Kevin Marks wrote:
> OK, I'll bite. Better how? What is bad about QT? How is Ogg better?
I'd like to see some discussion on this too. You raise some good
questions. I don't know qt well enough to say why ogg is better, but
I'll try to address what might be different.
> IFF-like formats have stood up very well over time because of the
> future compatibility built-in (the behaviour for unknown chunks is
> well-defined).
> [...]
> From what I can see of Ogg, everything is down in the stream structure,
> and the lacing values used for packet framing will introduce a lot of
> overhead for packets bigger than 1024 bytes.
The overhead is a constant percentage for large packets (one lacing byte
per 255 bytes of data, about 0.4%). Framing.html says the idea with the
lacing values is to reduce the overhead for small packet sizes while still
allowing large ones. Assuming you want that flexibility, ogg has less
overhead than chunk-based (IFF) schemes for small packets, and more for
large ones; against an 8-byte chunk header the crossover is around 2 KB.
I would agree it's tuned for things in the 500-1500 byte range, which is
the typical size of vorbis packets.
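To put numbers on the small-vs-large tradeoff, here is a rough sketch. It assumes the chunk scheme spends a fixed 8 bytes per packet (4-byte ID plus 4-byte length, as in IFF) and ignores both the Ogg page header and IFF pad bytes, so it only compares the per-packet framing cost.

```python
# Per-packet framing overhead: Ogg lacing vs. an IFF-style chunk header.
# Assumption: the chunk scheme spends a fixed 8 bytes per packet
# (4-byte ID + 4-byte length); Ogg page headers and IFF pad bytes are ignored.

IFF_CHUNK_HEADER = 8


def ogg_lacing_bytes(packet_len: int) -> int:
    """Lacing values needed for one packet: one 255 per full 255-byte
    segment, plus a final value below 255 (possibly 0) ending the packet."""
    return packet_len // 255 + 1


def first_size_where_ogg_costs_more(header: int = IFF_CHUNK_HEADER) -> int:
    """Smallest packet length whose lacing exceeds the fixed chunk header."""
    n = 0
    while ogg_lacing_bytes(n) <= header:
        n += 1
    return n


if __name__ == "__main__":
    for size in (100, 500, 1500, 2040, 8192, 120_000):
        ogg = ogg_lacing_bytes(size)
        print(f"{size:6d} B packet: lacing {ogg:3d} B ({100 * ogg / size:.2f}%), "
              f"chunk header {IFF_CHUNK_HEADER} B ({100 * IFF_CHUNK_HEADER / size:.2f}%)")
    print("Ogg costs more from", first_size_where_ogg_costs_more(), "bytes up")
```

Below about 1.8 KB the lacing is cheaper than the fixed header; from 2040 bytes up it costs more, and it settles toward 1/255 of the packet size while the chunk header's share keeps shrinking.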
You're correct that there's no analog to chunk-copying rules in ogg.
There is a concept of 'mappings' that additionally specify restrictions
on what bitstream types are supported and what the interleaving rules
are, but so far we have little experience with anything besides
degenerate (single-datatype) streams. The integrity of each logical
bitstream is preserved, though, and I've mostly been able to convince
myself that's all that's needed.
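A simplified sketch of what "the integrity of each logical bitstream is preserved" buys a reader. Pages are reduced to the three fields used here (serial number, per-stream page sequence number, opaque body). Real page parsing, CRCs and packet reassembly are left out, and the parallel to IFF's skip-unknown-chunks rule is only an analogy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Page:
    """Simplified Ogg page: only the fields this sketch needs."""
    serial: int    # bitstream serial number (identifies the logical bitstream)
    sequence: int  # page sequence number within that logical bitstream
    body: bytes    # page payload, treated as opaque bytes here


def demux(pages, wanted):
    """Pull the wanted logical bitstreams out of an interleaved physical stream.

    Pages from other serials are skipped without being interpreted, so this
    reader does not need to understand a stream type in order to ignore it.
    """
    by_serial = defaultdict(list)
    for page in pages:
        if page.serial in wanted:
            by_serial[page.serial].append(page)

    streams = {}
    for serial, group in by_serial.items():
        # Each logical bitstream must arrive intact and in order.
        for prev, cur in zip(group, group[1:]):
            if cur.sequence != prev.sequence + 1:
                raise ValueError(f"stream {serial:#x}: page missing before seq {cur.sequence}")
        streams[serial] = b"".join(p.body for p in group)
    return streams


if __name__ == "__main__":
    physical = [Page(0xA, 0, b"aa"), Page(0xB, 0, b"vv"),
                Page(0xA, 1, b"bb"), Page(0xB, 1, b"ww")]
    print(demux(physical, wanted={0xA}))  # {10: b'aabb'}
```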
> What is the point of making packets and pages independent, and having
> two parallel framing structures going on at once, with the concomitant
> problem of having to slice and dice the whole time? You're going to
> have big trouble getting DV or uncompressed video into this structure.
> Dv frames are 120000 bytes for NTSC and 144000 for PAL. They are all
> the same size. To put these in Ogg you need 471 & 565 lacing values per
> frame, and you need to add up these bytes to get the constant length.
This is a really good point, and I think illustrates the real difference
in the two approaches. I understand from your description that the seek
information in qt is stored as some kind of intelligently coded table:
in the case of a constant-bitrate codec that table compresses very well.
The problem with that is that well-compressed media data varies wildly in
bitrate, and so you want to be able to support that, which means you
need more overhead. Ogg was designed with streaming in mind from the
beginning, so it made a lot more sense to distribute the packet length
information throughout the stream.
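A toy model of the tradeoff. It is loosely based on QuickTime's sample-size table, where a single nonzero size field means every sample has that size and no per-sample entries are stored. The real format also has chunk-offset and sample-to-chunk tables; here all samples are assumed to sit back to back in one chunk.

```python
from itertools import accumulate


class SampleSizeTable:
    """Toy sample-size table, loosely modeled on QuickTime's: if every sample
    has the same size, store that single value; otherwise store one size per
    sample. Assumes all samples sit back to back in one chunk."""

    def __init__(self, sizes):
        sizes = list(sizes)
        self.count = len(sizes)
        if sizes and len(set(sizes)) == 1:
            self.constant = sizes[0]   # constant-bitrate case: one number
            self.sizes = []
            self._offsets = []
        else:
            self.constant = 0          # variable case: full table
            self.sizes = sizes
            self._offsets = [0, *accumulate(sizes)]  # prefix sums

    def stored_entries(self):
        return 1 if self.constant else len(self.sizes)

    def offset(self, k):
        """Byte offset of sample k from the start of the chunk."""
        if not 0 <= k < self.count:
            raise IndexError(k)
        return k * self.constant if self.constant else self._offsets[k]


if __name__ == "__main__":
    dv = SampleSizeTable([120_000] * 300)           # 300 NTSC DV frames
    print(dv.stored_entries(), dv.offset(10))        # 1 1200000
    vbr = SampleSizeTable([412, 388, 1020, 97])      # made-up VBR packet sizes
    print(vbr.stored_entries(), vbr.offset(3))       # 4 1820
```

The constant case collapses to one number, but the variable case needs an entry per sample, and that table has to exist before playback can seek. Ogg has no such table and carries the same length information in each page's lacing values instead.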
How does qt handle the streaming case, anyway? Is this what 'hinting' is
all about?
(Note that there's no reason for each frame to be a single packet. Of
course the lacing overhead is about the same no matter how you break things
down (each extra packet costs at most one more lacing value), but I'd
put each frame into multiple packets, e.g. one per
superblock in DV or one per IDAT/JDAT slice in mng. More robust against
data loss.)
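A minimal sketch of lacing a DV frame, both as one packet and split into smaller packets. The split unit here is the DIF sequence (150 DIF blocks of 80 bytes = 12000 bytes; 10 per NTSC frame, 12 per PAL frame). That is an assumed illustration of "smaller pieces", not the superblock split suggested above.

```python
def lace(packet_len):
    """Ogg lacing values for one packet: a 255 for every full 255-byte
    segment, then one final value below 255 (possibly 0)."""
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]


def unlace(values):
    """Rebuild packet lengths by summing lacing values; any value below 255
    ends a packet. A trailing run of 255s would mean the last packet
    continues on the next page, and is not returned here."""
    lengths, current = [], 0
    for v in values:
        current += v
        if v < 255:
            lengths.append(current)
            current = 0
    return lengths


DIF_SEQUENCE = 12_000  # 150 DIF blocks x 80 bytes (assumed DV layout)

if __name__ == "__main__":
    for name, frame in (("NTSC", 120_000), ("PAL", 144_000)):
        whole = lace(frame)
        pieces = [lace(DIF_SEQUENCE) for _ in range(frame // DIF_SEQUENCE)]
        split = [v for p in pieces for v in p]
        assert unlace(whole) == [frame]
        assert unlace(split) == [DIF_SEQUENCE] * len(pieces)
        # A page's segment table holds at most 255 lacing values, so either
        # layout spans several pages.
        print(name, "one packet:", len(whole), "lacing values;",
              len(pieces), "packets:", len(split), "lacing values")
```

This reproduces the 471 and 565 lacing values per frame quoted above. Splitting into DIF sequences costs 480 and 576 instead: one extra byte per additional packet.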
I don't understand the reasoning behind the two-level framing as well,
but I think the idea is again flexibility. It lets the framing layer
optimize the amount of overhead used for error checking and interleave
regardless of what a codec might choose as a natural packet size. You
could even use fixed-size pages for absolute seeking in the
constant-bitrate case, though of course we haven't relied on this.
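A hypothetical sketch of what absolute seeking with fixed-size pages could look like. It assumes a constant-bitrate payload, every page exactly `page_size` bytes including its header, and a known `start_offset` of stream headers before the first media page. None of this is something Ogg guarantees. The example page uses the 27-byte fixed Ogg page header plus a 16-entry segment table of 255s.

```python
def fixed_page_offset(t_seconds, bytes_per_second, page_size, page_header,
                      start_offset=0):
    """Byte offset of the fixed-size page holding media time t.

    Hypothetical scheme: constant-bitrate payload, every page exactly
    page_size bytes including a page_header-byte header, and start_offset
    bytes of stream headers before the first media page.
    """
    payload_per_page = page_size - page_header
    media_byte = int(t_seconds * bytes_per_second)
    page_index = media_byte // payload_per_page
    return start_offset + page_index * page_size


if __name__ == "__main__":
    # 16 lacing values of 255: 27-byte fixed header + 16-byte segment table,
    # 4080 payload bytes, 4123 bytes per page. 3,600,000 B/s is PAL DV
    # (144000 bytes x 25 frames per second).
    print(fixed_page_offset(10, 3_600_000, page_size=4123, page_header=43))
```

The seek is one multiply and divide with no search, but only as long as every page really is the same size.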
Certainly we should understand the qt file/stream format better.
I'm happy you've been contributing lately.
Cheers,
-r