[vorbis] Re: [livid-ovd] putting the video into ogg multimedia

Ralph Giles giles at snow.ashlu.bc.ca
Wed Jun 14 19:01:55 PDT 2000



On Tue, 13 Jun 2000, Monty wrote:

>> From what I've read of the 
>> documentation, there doesn't seem to be any codec-enumeration system. 

> That's due to lack of appropriate documentation. API docs are currently
> taking precedence. The headers for each logical stream appear at the head of
> the physical transport stream, and the headers provide enough magic
> to identify the codec used by the logical stream. Each logical stream is
> enumerated within the physical stream. 

Ok, that part I figured out.

>> Do we just interleave the packets and let the player guess based on
>> the headers which codec to try? I suppose that works so long as the
>> spec for the multimedia files has a well-defined list of what is to be
>> supported. 

> The short answer is yes, although 'guess' is the wrong word. The codecs
> are identified by header magic. 

Does 'header magic' mean "we know this is a vorbis stream because the
first 6 characters are 'vorbis'"? That's what I meant by guess, in
contrast to the way avi does it, for example. I think the former is a
better idea.
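
To make the header-magic idea concrete, here is a minimal sketch of how a
player might map the first bytes of a logical stream's first packet to a
codec. The table contents and the names CODEC_MAGIC and identify_codec are
assumptions made up for illustration; they are not taken from the Ogg or
Vorbis documentation.

```python
from typing import Optional

# Hypothetical magic table: these byte strings are illustrative assumptions,
# not values from any Ogg or Vorbis specification.
CODEC_MAGIC = {
    b"vorbis": "vorbis audio",
    b"MJPG": "motion JPEG video",
}


def identify_codec(first_packet: bytes) -> Optional[str]:
    """Return a codec name if the packet starts with a known magic string."""
    for magic, name in CODEC_MAGIC.items():
        if first_packet.startswith(magic):
            return name
    # Unknown magic: the player should skip this logical stream.
    return None
```

A player would call identify_codec() once per logical stream and ignore any
stream whose magic it does not recognise, which is why a well-defined list
of supported codecs matters.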

>> In my mind that makes [mng] an excellent 
>> source or editing format for digital video. This would probably also 
>> require an option for uncompressed audio; I suppose the pcm .wav header 
>> could be abused for this purpose. 

> Do you mean lossless or uncompressed? Shortly after Vorbis hits 1.0,
> I want to update and rerelease my old Squish lossless audio codec for
> Ogg as well. 

I meant lossless, I suppose, but uncompressed would be good to have
too. I was thinking in terms of source video, where you're often trying to
digitize in realtime, and the audio is a small fraction of the bandwidth.
It would be different for small-format movies, or many unmixed tracks,
etc. What kind of compression ratios do you get with Squish?

> For truly uncompressed audio, I'd prefer to avoid WAV specifically
> for a number of practical and design reasons. I'm happy to provide a
> rant if someone is interested... but you probably were just describing
> desired functionality and I have no reservations about uncompressed
> audio in Ogg. 

Yes, but I'd be interested to hear it anyway. =)

> The problem with motion estimation is not lack of leads but rather
> quite the opposite. An endless number of directions to explore; I'm a
> believer in Steve Mann and his chirplet work, and I've been looking
> forward to playing with that. 

Ah, I wasn't aware of this. Thanks. For others in the dark, the primary
reference seems to be http://wearcam.org/chirplet/

> I convinced myself some time ago that what you describe is actually a 
> simple facet of general non-linear editing. Transport streams and how the
> streams inside them are reassembled into a finished presentation are
> actually orthogonal functionality. At one point I wrote a white paper
> called 'the four pillars of Ogg' (long forgotten-- I may not even have
> a copy anymore). The pillars were orthogonal media building blocks (in
> abstract form). I'm fudging the below slightly to make it relevant to
> what's currently going on: 

> 1: The codec (in this case, Vorbis or motion JPEG) 
> 2: The transport stream (bitstream format: in this case, Ogg bitstreams) 
> 3: Linear and non-linear editing 
> 4: Application linkage (interactive behavior, push/pull delivery,
>      real-world interface to the 'self-contained' bitstream) 

You're right, this *is* more interesting. Mostly I feel like I'm drowning
in the complexity, though. See toward the end.

I was thinking of this in terms of the Open Video Disk project, as DVD
"done right". In those terms, while I thought multiple video, audio, and
overlay tracks make sense, I wanted to stay away from the branching and
interactivity features. I know I differ with other folks on the ovd list
on this, but they both feel insufficiently general to me, and I've no idea
what a better answer would be.

Perhaps now is not the time for that. Certainly you want something Turing
complete, but if you're just embedding your favorite scripting language,
what are you offering that you can't do better with a dedicated
application? Must one become a programmer to use the format? We need
to solve the problem of how to create interactive content without being a
programmer before we enshrine it in a media format, I think. The tools
aren't here yet; I feel like we'd be reimplementing Director or Hypercard a
decade later with nothing new to add. Myst could be done as a DVD, but if
games like Myst, why not games like Quake, or Day of the Tentacle, or
Zork?

Do you have any ideas how to handle that? Is this as much of a problem as
I think?

>> Tying all this together seems to require some stream-description 
>> metadata. 

> Yep, exactly, which is different than a metadata stream :-) It exists
> as a concept above the transport stream. The transport stream is solid,
> we need to build on it (and do it soon). 

Agreed. Am I correct in understanding that Ogg-the-stream-format has no
place to put this kind of metadata? If so, where would we store it? In the
specification? In an xml or binary "header" stream? In the kitchen sink
with the rest of the metadata?

> Yes, it's hard to say right now if there will be one kitchen-sink 
> metadata stream type in XML, or XML kitchen sink + something lighter
> weight. 

I guess if I were doing it right now, I'd put the lyrics/subtitles in
their own stream (call it transcript) with a simple format, say text plus
the simplest inline style markup and timecodes, mostly designed for
machineable presentation. Separate streams for each language/version/type
of annotation, like with the overlays. We should solicit input from
subtitlers and karaoke folks for confirmation, but I think that would work
well. Does QuickTime's "text track" work like this?
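
As a rough sketch of the kind of transcript stream described above, assume
each line is a timecode followed by a space and the text, with any inline
style markup left untouched. The "HH:MM:SS.fff text" layout and the name
parse_transcript are assumptions for illustration only; nothing like this
is specified anywhere yet.

```python
from typing import Iterable, List, Tuple


def parse_transcript(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Parse 'HH:MM:SS.fff text' lines into (start_seconds, text) cues.

    The line layout is a hypothetical format chosen for this sketch.
    """
    cues = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between cues
        stamp, _, text = line.partition(" ")
        hours, minutes, seconds = stamp.split(":")
        start = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        cues.append((start, text))
    return cues
```

Each language or annotation type would get its own stream in this format,
so a player only has to pick a stream and display the cues in time order.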

Put everything else in a kitchen-sink associated-material stream that's
designed more for human interpretation. The player could scan for <title>,
<artist>, <streamname> and so on, but leave the production notes &c. to
the human reader. This feels unclever to me, so I'm suspicious...
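
A minimal sketch of that scan, assuming the associated-material stream is
well-formed XML under a single root element. The <material> wrapper and
the name scan_metadata are assumptions for illustration; only the
<title>, <artist> and <streamname> tags come from the paragraph above.

```python
import xml.etree.ElementTree as ET

# Tags the player understands; everything else is left for the human reader.
KNOWN_TAGS = ("title", "artist", "streamname")


def scan_metadata(xml_text: str) -> dict:
    """Return the text of the first occurrence of each known tag."""
    root = ET.fromstring(xml_text)
    found = {}
    for tag in KNOWN_TAGS:
        element = root.find(f".//{tag}")
        if element is not None and element.text:
            found[tag] = element.text.strip()
    return found


# Hypothetical example document; <material> and <notes> are made-up names.
example = (
    "<material><title>Foo</title><artist>Bar</artist>"
    "<notes>Recorded in a barn.</notes></material>"
)
print(scan_metadata(example))
```

Unknown elements such as the production notes are simply not returned, so
the player can show them verbatim without having to interpret them.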

I guess this brings up the question of whether we want to allow
subsections within a work. Is an album one big ogg file with all the
songs, cover art, production notes, photos, making-of movie, videos, et
al. embedded and indexed within, or is it a directory full of song files,
movie files, and html for notes and indexing? Hmm. What about a movie? Is
it divided into chapters (should be scenes) like on DVD? 

I don't know the answer to this. The one-song-one-file method seems to
work well for music, but multi-track movies are more complicated. Would
you like to see ogg become the native format for a nonlinear editor?

> We're very serious about video codec R&D, we just want to make sure
> people realize that such a video codec is a ways off. 

Hooray! This is all very cool.

 -ralph


--
giles at ashlu.bc.ca
I read this list through the archives




