[vorbis-dev] Ogg as container format

Sat Sep 29 01:16:28 PDT 2001

on 9/29/01 2:53 AM, Monty at xiphmont at xiph.org wrote:

> IFF-like formats including more than one media type cannot be streamed
> as is.  All the media types are sequential.  Quicktime, at least at
> one time, was also like this.
QuickTime 1.0 interleaved video and audio in the file.  Nothing
in the IFF specs prevents you from doing that.  Granted, each video
frame is stored contiguously, but there can be advantages to that.

Electronic Arts released the IFF spec into the public domain in 1985,
before QuickTime existed, making it a Free file format as well.

> Quicktime until this year couldn't do VBR formats at all.
Please don't confuse the capabilities of the QuickTime software with
those of the file format.  It's my understanding that the file format
has always been able to handle VBR, but the software to handle VBR
sound compression hasn't been written.

> It also cannot really be streamed.  It has to be broken up and sent in
> multiple simultaneous, parallel streams with the sender seeking madly
> through the file to continue just-in-time delivery of the multiple
> media types.  Again, this is the way things were 1996-ish.  It may be
> different today.
Back when QuickTime 1.0 came out in 1991 or so, one target market was
multimedia on CD-ROM, which back then meant 1x CD-ROM drives.  Seeking
was really slow.  I can assure you that any QuickTime movie authored in
that period does not seek during normal playback.  It may seek during
the initial opening of the movie, but once time-critical playback
starts, there are no seeks.

> Quicktime was not intended for streaming use when it was invented.
Remember, QuickTime is older than HTTP.   I can't fault them for not
predicting the future.

> There's nothing Quicktime does that Ogg cannot.  The difference is
> that Ogg is doing it all at rev 0.
QuickTime and other formats like it are very good at editing without
moving lots of data around.  The indirection that you complain about
saves lots of time, especially when it comes to video editing.

Right now, if I want to insert 5 seconds of sound 20 seconds into a
20 minute piece using Ogg Vorbis, I have to copy many megabytes
of data just to hear what it sounds like.  If I did it wrong and have
to do it again, I copy again.  With the QuickTime file format, I change
the indexes around and it will play with the new sound in the right
spot.  Once I'm satisfied with how it sounds, then I do the copy to
arrange everything for streaming.
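
To make the "change the indexes around" idea concrete, here is a minimal
sketch in Python.  It is not the actual QuickTime file layout; the
Segment type, the insert_at helper, and the file names are all
hypothetical, chosen only to show that an insert can be expressed by
rewriting a short list of references while the media bytes stay put.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    source: str      # hypothetical name of the file holding the samples
    start: float     # offset into that source, in seconds
    duration: float  # length of the referenced span, in seconds


def insert_at(edit_list, at, new):
    """Return a new edit list with `new` spliced in at time `at` (seconds).

    Only the list of references changes; no media data is copied.
    """
    result = []
    elapsed = 0.0
    inserted = False
    for seg in edit_list:
        end = elapsed + seg.duration
        if not inserted and elapsed <= at < end:
            head = at - elapsed
            if head > 0:
                # Part of the original segment that plays before the insert.
                result.append(Segment(seg.source, seg.start, head))
            result.append(new)
            # Remainder of the original segment, now playing after the insert.
            result.append(
                Segment(seg.source, seg.start + head, seg.duration - head)
            )
            inserted = True
        else:
            result.append(seg)
        elapsed = end
    if not inserted:
        result.append(new)  # insertion point is at or past the end
    return result


# A 20 minute piece, with 5 seconds of new sound inserted 20 seconds in.
movie = [Segment("piece.aiff", 0.0, 1200.0)]
edited = insert_at(movie, 20.0, Segment("insert.aiff", 0.0, 5.0))
```

The edited list has three entries, and undoing the edit is just a matter
of going back to the old list.  Flattening it into one contiguous file
for streaming is a separate, final step.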

But I get the impression that Ogg was never intended for editing;
it seems to be intended as a streaming delivery format.

Another cool thing that QuickTime can do is re-use the media.  Since
small children learn through repetition, many children's songs have
quite a bit of it.  Take an extreme example like "99 Bottles of Beer":
I can encode it so that there is one copy of the chorus rather than
99.  They all sound the same, but children may not care.
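
A small Python sketch of that kind of reuse, assuming a format where the
play order refers to stored chunks by name.  The chunk names and byte
sizes are invented for illustration and do not come from any real file.

```python
# Each distinct chunk of media is stored once.  Sizes are made-up numbers.
chunks = {"intro": 40_000, "chorus": 180_000}

# The play order is a list of references, so repeating a chunk costs
# one more list entry rather than one more copy of the audio.
play_order = ["intro"] + ["chorus"] * 99


def stored_bytes(chunks):
    """Media bytes on disk when every chunk is stored exactly once."""
    return sum(chunks.values())


def flattened_bytes(chunks, play_order):
    """Media bytes needed if every play-order entry were a separate copy."""
    return sum(chunks[name] for name in play_order)


by_reference = stored_bytes(chunks)
by_copy = flattened_bytes(chunks, play_order)
```

With these invented sizes, the by-reference layout stores 220,000 bytes
of media where a flattened copy would need 17,860,000.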

I'm guessing that isn't Ogg's target market either.

> One would never put an entire frame in one packet.  One *could*, but
> that would be silly.  Think about why.  Others may feel free to chime in.
I think there is a great reason for keeping entire frames contiguous in
the file format: hardware acceleration.  Disk controllers do DMA very
well these days.  They understand things like "put this big chunk of
bytes over there."  One could even conceive that the DMA goes directly
to the DV decoder on the PCI or other bus, completely bypassing main
memory.

By breaking the video frame up, you now require either that the disk
controller or DV decoder understand the Ogg format, or you add two trips
over the main memory bus, one in from the disk controller, one out to the
hardware decoder.  The main memory bus is often the bottleneck in today's
computers.
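
To put rough numbers on how a frame gets broken up, here is a sketch of
Ogg's page segmentation rule in Python: a packet is cut into segments of
at most 255 bytes, and one page carries at most 255 segments.  The
120,000 byte frame size is an assumption (a typical figure for one NTSC
DV frame), as is the idea that the frame is submitted as a single
packet; the function names are made up for this example.

```python
MAX_SEGMENT = 255            # largest lacing value in a page's segment table
MAX_SEGMENTS_PER_PAGE = 255  # a page's segment table has at most 255 entries


def lacing_values(packet_len):
    """Segment sizes for one packet: 255s, then one final value below 255.

    A packet whose length is an exact multiple of 255 ends with a 0 entry.
    """
    full, rest = divmod(packet_len, MAX_SEGMENT)
    return [MAX_SEGMENT] * full + [rest]


def pages_needed(packet_len):
    """Minimum number of pages for one packet that starts on a fresh page."""
    segments = len(lacing_values(packet_len))
    return -(-segments // MAX_SEGMENTS_PER_PAGE)  # ceiling division


DV_FRAME_BYTES = 120_000  # assumed size of one NTSC DV frame

segments = len(lacing_values(DV_FRAME_BYTES))
pages = pages_needed(DV_FRAME_BYTES)
```

Under these assumptions the frame needs 471 segment entries spread over
at least 2 pages, so a page header lands in the middle of the frame
data and something has to stitch the payload back together before a
hardware decoder can consume it.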

While decoding the Ogg format in hardware is possible, it's going to have
to be successful beyond your wildest dreams before this type of
hardware support appears.

Oops, the above example isn't a streaming example so it may not apply.
Think about what's going on in the gigabit ethernet world.  The hardware
designers are looking to offload tasks from the main processor because
it's the bottleneck.  I could see the hardware supporting things like
"put the next N bytes of the TCP payload over there," and the card checks
the headers to make sure it's not spoofed, and makes sure the sequence
numbers increase properly.  It can even send the ACKs.

As an aside, the current implementation of Ogg in libogg does WAY too
much memory copying.  I know Segher has complained about this also, and
has threatened to write libogg2, but just hasn't gotten enough free time
to do it.

It comes down to this.  If your world is HTTP streaming, either live or
with very little interactive editing other than switching between streams,
then Ogg has advantages over QuickTime.  Once you step out of that world,
then QuickTime has advantages over Ogg.

There is probably a lot the Ogg Vorbis team can learn from QuickTime and
other existing formats if it wants to move beyond streaming.  There are
probably things that got tried and failed; it would be good to hear about
some of those so that the Ogg Vorbis team doesn't repeat history.

Steve
