[vorbis] putting the video into ogg multimedia

Monty xiphmont at xiph.org
Tue Jun 13 23:50:27 PDT 2000



> With both vorbis and libmng nearing stable status, I've been thinking
> about combining the two to make a real multimedia format.

Yes.  We've been thinking that we want to use a preexisting video codec, so long
as it's free (as in speech), before taking the time to play with something more
esoteric in R&D.  Motion JPEG is one we've thought about.

> What would be
> involved in embedding mng in an ogg bitstream? 

Defining a mapping.  Ogg streams just provide a means for multiplexing and
framing raw packets (along with a time base, etc.).  There are no structural or
size limitations on the packets themselves.  Ogg does, however, assume that the
first page of a logical bitstream contains a single packet that functions as
the logical bitstream's header (and, as far as you're concerned, codec
identification).
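
To make that concrete, here's a rough sketch (using the libogg calls as they
stand today, with error handling omitted) of walking the beginning-of-stream
pages of a physical stream and pulling out each logical stream's single header
packet:

  #include <stdio.h>
  #include <ogg/ogg.h>

  /* Read the physical stream until the beginning-of-stream pages run out;
     each BOS page frames exactly one packet: that logical stream's header. */
  void read_headers(FILE *in)
  {
    ogg_sync_state oy;
    ogg_page       og;
    int            done = 0;

    ogg_sync_init(&oy);
    while (!done) {
      /* hand raw bytes from the transport to the sync/framing layer */
      char *buffer = ogg_sync_buffer(&oy, 4096);
      long  bytes  = fread(buffer, 1, 4096, in);
      if (bytes <= 0) break;
      ogg_sync_wrote(&oy, bytes);

      while (ogg_sync_pageout(&oy, &og) == 1) {
        if (!ogg_page_bos(&og)) { done = 1; break; } /* past the headers */

        ogg_stream_state os;
        ogg_packet       op;
        ogg_stream_init(&os, ogg_page_serialno(&og));
        ogg_stream_pagein(&os, &og);
        if (ogg_stream_packetout(&os, &op) == 1) {
          /* op.packet/op.bytes now hold this logical stream's header;
             the codec is identified by the magic at the front of it */
          printf("stream %d: %ld byte header packet\n",
                 ogg_page_serialno(&og), op.bytes);
        }
        ogg_stream_clear(&os);
      }
    }
    ogg_sync_clear(&oy);
  }

All the BOS pages of grouped streams come before any data pages, so once the
first non-BOS page shows up you've already seen every logical stream's header.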

> From what I've read of the
> documentation, there doesn't seem to be any codec-enumeration system.

That's due to a lack of appropriate documentation; API docs are currently taking
precedence.  The headers for each logical stream appear at the head of the
physical transport stream, and those headers provide enough magic to identify
the codec used by each logical stream.  Each logical stream is enumerated
within the physical stream.

> Do
> we just interleave the packets and let the player guess based on the
> headers which codec to try? I suppose that works so long as the spec for
> the multimedia files has a well-defined list of what is to be supported.

The short answer is yes, although 'guess' is the wrong word.  The codecs are
identified by header magic.
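
For instance (a sketch only: the Vorbis magic is from the Vorbis header spec,
but the MNG check is purely hypothetical since no MNG-in-Ogg mapping exists
yet, and identify() itself is just made up for illustration):

  #include <string.h>
  #include <ogg/ogg.h>

  enum codec { CODEC_UNKNOWN, CODEC_VORBIS, CODEC_MNG };

  /* Look at the magic at the front of a logical stream's first packet
     to decide which codec owns the stream. */
  static enum codec identify(const ogg_packet *op)
  {
    /* Vorbis identification header: packet type 0x01 + "vorbis" */
    static const unsigned char vorbis_magic[] =
        { 0x01, 'v', 'o', 'r', 'b', 'i', 's' };
    /* hypothetical: reuse the raw MNG file signature as the stream magic */
    static const unsigned char mng_magic[] =
        { 0x8A, 'M', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

    if (op->bytes >= (long)sizeof(vorbis_magic) &&
        memcmp(op->packet, vorbis_magic, sizeof(vorbis_magic)) == 0)
      return CODEC_VORBIS;
    if (op->bytes >= (long)sizeof(mng_magic) &&
        memcmp(op->packet, mng_magic, sizeof(mng_magic)) == 0)
      return CODEC_MNG;
    return CODEC_UNKNOWN;
  }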

> Practical motivation:
> 
> MNG is open, free, unencumbered and seems to have a clue in terms of
> well designedness so it's a good match. While it doesn't provide
> high compression ratios for video sources, it supports both completely
> lossless and motion-jpeg encoding. In my mind that makes it an excellent
> source or editing format for digital video. This would probably also
> require an option for uncompressed audio; I suppose the pcm .wav header
> could be abused for this purpose.

Do you mean lossless or uncompressed?  Shortly after Vorbis hits 1.0, I want to
update and rerelease my old Squish lossless audio codec for Ogg as well.

For truly uncompressed audio, I'd prefer to avoid WAV specifically, for a number
of practical and design reasons.  I'm happy to provide a rant if someone is
interested...  but you were probably just describing desired functionality, and
I have no reservations about uncompressed audio in Ogg.

> MNG was also designed as an animation format and should achieve good
> compression of both traditional (2D) animation and "talking-head" video
> where localized motion occurs over a static background. I think there's a
> lot of room here for creatively-lossy compression research, some of
> which might be applicable to the free video codec.

The problem with motion estimation is not a lack of leads but rather quite the
opposite: there's an endless number of directions to explore.  I'm a believer in
Steve Mann and his chirplet work, and I've been looking forward to playing with
that.

> Finally, I agree with DVD-video's use of graphic overlays for subtitling
> (though the text should also be available in a separate stream) most
> particularly for the annotative possibilities. So even when there is an
> Ogg video codec, I'd still like to see support for transparent mng
> overlays.

Ah, here we get into something more interesting :-) 

I convinced myself some time ago that what you describe is actually a simple
facet of general non-linear editing.  Transport streams and how the streams
inside them are reassembled into a finished presentation are actually
orthogonal functionality.  At one point I wrote a white paper called 'the four
pillars of Ogg' (long forgotten-- I may not even have a copy anymore).  The
pillars were orthogonal media building blocks (in abstract form).  I'm fudging
the list below slightly to make it relevant to what's currently going on:

1: The codec (in this case, Vorbis or motion JPEG)
2: The transport stream (bitstream format: in this case, Ogg bitstreams)
3: Linear and non-linear editing
4: Application linkage (interactive behavior, push/pull delivery, real-world 
   interface to the 'self-contained' bitstream)

Ogg was a more powerful, more ambitious project back then :-(  The codec, for
example, was a Turing-complete rendering engine that ran encoding formats as
programs (like MPEG-4's description language in a way, but the 'description'
was the actual living, breathing codec!).  Still, the divisions more or less
hold.

1 and 2 are familiar.  Overlays are a simple example of the third: we have
multiple logical video streams arriving... how do we assemble them?  Is one an
overlay?  A special effect?  Alternate footage?  Perhaps a non-linear branch in
the stream?

The bitstream format is designed to be able to handle any of the above
possibilities.  However, with the exception of Vorbis, only the low-level
foundations are actually coded; stream multiplexing, demultiplexing, framing,
capture, etc. are all there... but not necessarily the more complex mechanisms
that will make use of them.
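
To be clear about what does exist, here's a sketch of those primitives: one
ogg_stream_state per logical stream, each with its own serial number, and the
resulting pages interleaved into a single physical stream.  write_page(), the
serial numbers and the packet arguments are made up for illustration, and a
real muxer would order the outgoing pages by time (and flush each stream's
header page first) rather than simply draining one stream after the other:

  #include <stdio.h>
  #include <ogg/ogg.h>

  /* Append a finished page (header + body) to the physical stream. */
  static void write_page(FILE *out, const ogg_page *og)
  {
    fwrite(og->header, 1, og->header_len, out);
    fwrite(og->body,   1, og->body_len,   out);
  }

  /* Interleave one audio packet and one video packet into a single
     physical Ogg stream; serial numbers just need to be unique. */
  void mux_two_streams(FILE *out, ogg_packet *audio_pkt, ogg_packet *video_pkt)
  {
    ogg_stream_state audio, video;
    ogg_page         og;

    ogg_stream_init(&audio, 0x1001);
    ogg_stream_init(&video, 0x1002);

    ogg_stream_packetin(&audio, audio_pkt);
    ogg_stream_packetin(&video, video_pkt);

    /* emit whatever complete pages each logical stream has produced */
    while (ogg_stream_pageout(&audio, &og)) write_page(out, &og);
    while (ogg_stream_pageout(&video, &og)) write_page(out, &og);

    /* force out any partial pages left at the end */
    while (ogg_stream_flush(&audio, &og)) write_page(out, &og);
    while (ogg_stream_flush(&video, &og)) write_page(out, &og);

    ogg_stream_clear(&audio);
    ogg_stream_clear(&video);
  }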

That's because the practical details of all this are not actually specced out
yet.  The Ogg bitstream format can even support branching (in more than one
way), but how it would do so within the current mechanisms is not formally
decided.  In other words, the foundation is finished, but so far I've built only
a single house on it.  These details must be settled before we move on to
actually using the multiplexed stream capability (to avoid accidentally
specifying a short-sighted or overly limiting route by implicit convention).

> Tying all this together seems to require some stream-description metadata.

Yep, exactly, which is different from a metadata stream :-)  It exists as a
concept above the transport stream.  The transport stream is solid; we need to
build on it (and do it soon).

> I guess that's another way to handle codec identification. How's work on
> that progressing? I see a need for (multiple) text/xml streams for
> lyrics/subtitling/commentary in both the audio and video formats as
> well, which should probably be separate from the meta-data.

Yes, it's hard to say right now whether there will be one kitchen-sink metadata
stream type in XML, or an XML kitchen sink plus something lighter weight.

> towards a useful multimedia format in the near term, even without a
> vorbis-comparable video codec.

We're very serious about video codec R&D; we just want to make sure people
realize that such a video codec is a ways off.

Monty

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.


