[vorbis-dev] Ogg MIDI proposal

Kenneth Arnold ken at arnoldnet.net
Sun Aug 26 16:38:22 PDT 2001


On Sat, Aug 25, 2001 at 07:34:26PM -0700, Ralph Giles wrote:
> Jack said the format 1 -> format 0 conversion could be done without loss of generality, so I
> generally agreed the '0 only' spec made sense. I'm not entirely clear on whether this imposes a 16
> 'device' limitation for Ogg MIDI, or not, and how onerous that actually is.
> 
> In theory yes, you should be able to group multiple midi streams. But they might be alternate
> tracks instead of tracks to be mixed in. Same would go for multiple vorbis streams, and so on,
> which brings us again to the metadata can of worms. See below. :)

Okay -- but metadata is far too important to be given a "can of worms"
description ;)

> A lot of what you mention is important work from the tools point of view; mostly we've been
> thinking at the spec level, which has something to do with why we missed your concerns. At the ogg
> and vorbis levels, we already have everything we need.

libvorbis is okay, and vorbisenc is workable -- though I suggest a
more general way to accomplish the same thing.

> Figuring out how to properly interleave all these bitstream types is also *hard* in a way that had
> me assuming much of it belonged at the application level. And there will be so many different

But at the library level you solve those problems ONCE. Look at
vorbisfile. Imagine that expanded to handle interleaved data, from
multiple codecs. Now imagine expecting application writers to integrate
all that muck into their apps and keep up with changes. That's how it
would have to be if it's not in a central library. Interleaved playback
is a mess, especially at the level of generality we're hoping to
achieve. Don't shove the responsibility of doing it right onto the
apps. Ideally, even encoding complex streams should take little more code
than play_file in ogg123 does right now, and it can be done.
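
To make that concrete, here is the kind of convenience layer I have in
mind. Nothing like this exists today and every name below is made up;
it's really just vorbisfile generalized to a multiplexed, multi-codec
physical stream:

  /* Hypothetical sketch only -- none of these names exist in libogg or
   * libvorbis today.  The idea: a vorbisfile-style layer that owns
   * demuxing and codec dispatch, so applications stay small. */
  #include <stddef.h>

  typedef struct OggFile OggFile;          /* opaque, like OggVorbis_File */

  typedef struct OggLogicalStream {
      int  serialno;                       /* Ogg serial number           */
      char codec_id[16];                   /* e.g. "vorbis", "midi"       */
  } OggLogicalStream;

  OggFile *oggfile_open(const char *path); /* open a multi-codec stream   */
  void     oggfile_close(OggFile *of);

  int                     oggfile_stream_count(OggFile *of);
  const OggLogicalStream *oggfile_stream_info(OggFile *of, int index);

  /* Pull the next decoded chunk from whichever logical stream is
   * earliest in time; the library, not the application, does the
   * interleaving bookkeeping. */
  long oggfile_read(OggFile *of, int *stream_index,
                    void *buffer, size_t length, double *timestamp);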

> types. I do expect oggmerge will eventually have either plugins or a nice interface for compiling
> in support for various formats, but I haven't seen how we can have a unified library-level
> architecture that handles everything. Can you explain your idea?

Our obvious precursors are QuickTime and AVI, and to a lesser extent
IFF and all the structured media formats that followed from it. They
mostly have this concept down already, so I thought it would be
obvious. Maybe it isn't, so I'll elaborate.

Take for example Windows Media Player. From the user's point of view,
it can play just about anything. If it can't play whatever is thrown
at it, but the data is in a structure it can understand
(e.g. AVI/WAVE; in our case it would be Ogg), it submits the
infamous FOURCC code to some Microsoft server, gets a codec
back, and then happily starts playing the file. I envision that an Ogg
format player should easily be able to at least do this. It requires
two general things to work -- a codec identifier and a library
interface. The codec identifier is the easy part -- we already pretty
much have that. The library part (or the equivalent statically-linked
counterpart on systems without shared libraries) is my "generic codec
interface" concern. Basically, like libao, the end-user application
(WMP) has a general interface to anything that will take in either
Ogg packets or appropriate raw-format data and spit out the opposite,
plus a way to specify which codec does that and to set any options
the codec exposes. There is a slight issue with the 'raw' data coming
in different formats, but Windows has already conveniently solved this
for us -- codecs can simply request which sort of input data they
want, and the library does the work of converting it, if necessary.
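
Here is a very rough sketch of what that codec interface could look
like, in the style of a libao driver table. Every name is invented for
illustration; the only real type is ogg_packet from libogg:

  /* Hypothetical "generic codec interface" -- invented names throughout. */
  #include <stddef.h>
  #include <ogg/ogg.h>

  typedef struct codec_functions {
      const char *codec_id;              /* matches the stream header     */

      void *(*init)(void);               /* allocate per-stream state     */
      void  (*shutdown)(void *state);

      /* decode direction: Ogg packet in, raw data (PCM, frame, event) out */
      long  (*decode)(void *state, ogg_packet *op,
                      void *raw_out, size_t raw_max);

      /* encode direction: raw data in, Ogg packet out (0 == success)     */
      int   (*encode)(void *state, const void *raw_in, size_t raw_len,
                      ogg_packet *op_out);

      /* enumerate and set options, libao-style key/value pairs           */
      const char *const *(*option_names)(void *state);
      int   (*set_option)(void *state, const char *key, const char *value);
  } codec_functions;

  /* The central library would look a codec up by identifier, possibly
   * dlopen()ing a plugin that exports one of these tables. */
  const codec_functions *codec_lookup(const char *codec_id);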

So the Ogg encoder would work like this:
* select sources (audio, video, text, control data, etc.)
* select codecs (Vorbis, PCM, MNG, Tarkin, MIDI, etc. -- note that
simple raw data packagers can be basically no-op codecs)
* select options (each codec has an interface function to enumerate
available options, and the user can fill them in, sort of like how
libao and ogg123 talk today -- though libao should give an
option-enumeration interface, or am I just missing it?)
* run encode loop (sketched in code after this list)
 - while data is available from any input stream
  - read an appropriately-sized chunk from the one with the earliest
    active timestamp
  - submit the chunk to the library, which submits it to the codec
  - get the Ogg packet back
  - put it in the stream and send it out, according to whatever
    interleaving scheme is in use
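
A rough code sketch of that loop, building on the made-up codec table
above. Only the ogg_* calls are real libogg; the input helpers are
placeholders for whatever the tool actually uses to pull raw data:

  #include <stdio.h>
  #include <ogg/ogg.h>

  typedef struct input {
      ogg_stream_state os;             /* one logical stream per source  */
      const codec_functions *codec;    /* table from the sketch above    */
      void *codec_state;
  } input;

  /* placeholders for the tool's own input handling */
  int     any_input_has_data(input *in, int n);
  input  *earliest_input(input *in, int n);   /* earliest active timestamp */
  size_t  input_read(input *s, unsigned char *buf, size_t max);

  static void encode_all(input *in, int n, FILE *out)
  {
      unsigned char raw[4096];
      while (any_input_has_data(in, n)) {
          input *s   = earliest_input(in, n);
          size_t got = input_read(s, raw, sizeof raw);

          ogg_packet op;
          if (s->codec->encode(s->codec_state, raw, got, &op) == 0) {
              ogg_stream_packetin(&s->os, &op);   /* packet into stream  */

              ogg_page og;
              while (ogg_stream_pageout(&s->os, &og) > 0) {
                  fwrite(og.header, 1, og.header_len, out); /* interleave */
                  fwrite(og.body,   1, og.body_len,   out); /* by page    */
              }
          }
      }
  }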

And the player:
* look at the first few packets to find which codecs are in use
* load the available codecs and fetch the rest
* determine output formats (video codecs will probably need a
standardized header specifying this, or they could report it once the
header packets have been submitted) and initialize output devices as
necessary
* display info (metadata -- but what to do with the existing Vorbis
comments? They only apply to the Vorbis stream, and there's no use
having the same sort of header for every other codec when everything
can be acceptably and machine-parsably represented by a metadata
stream)
* run decode loop (a rough sketch follows this list)
 - heuristics for this will have to be tuned, as decode is much more
   time-sensitive than encode
 - while data is available in the stream
  - submit packets to the codecs
  - keep submitting until output buffers are sufficiently filled
  - start global metronome
   - for streams with audio this should be keyed off the audio output,
     else a timer
  - for each "tick", play/show/do the sound/frame/action associated
    with that moment in time
  - of course keep submitting packets
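
In code, the core of that loop might look something like the fragment
below. Again, only the ogg_* calls are real libogg; the per-stream
state and the "metronome" glue are entirely hypothetical names:

  #include <stdio.h>
  #include <ogg/ogg.h>

  /* hypothetical glue: per-logical-stream state and output scheduling */
  typedef struct stream {
      ogg_stream_state os;
      /* codec table, output buffers, etc. omitted */
  } stream;
  stream *stream_for_serialno(int serialno);
  void    decode_packet(stream *s, ogg_packet *op);  /* via codec table  */
  double  audio_output_time(void);                   /* the "metronome"  */
  void    flush_due_output(stream *s, double now);   /* play/show ticks  */

  static void play(FILE *in, stream *streams, int nstreams)
  {
      ogg_sync_state oy;
      ogg_page       og;
      ogg_packet     op;
      int            done = 0;

      ogg_sync_init(&oy);
      while (!done) {
          /* keep the sync layer fed */
          char *buf = ogg_sync_buffer(&oy, 4096);
          long  n   = fread(buf, 1, 4096, in);
          ogg_sync_wrote(&oy, n);
          if (n == 0) done = 1;           /* end of physical stream      */

          /* demux pages to logical streams, packets to their codecs */
          while (ogg_sync_pageout(&oy, &og) > 0) {
              stream *s = stream_for_serialno(ogg_page_serialno(&og));
              ogg_stream_pagein(&s->os, &og);
              while (ogg_stream_packetout(&s->os, &op) > 0)
                  decode_packet(s, &op);
          }

          /* the audio clock (or a timer) drives presentation: release
           * each sound/frame/action whose timestamp has come due */
          double now = audio_output_time();
          for (int i = 0; i < nstreams; i++)
              flush_due_output(&streams[i], now);
      }
  }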

A lot of the player work has already been done in e.g. xine, which
would be a good source of ideas, or could even become a fully-capable
Ogg stream player without too much work.

What about seeking? I don't know exactly how vorbisfile does it for
Vorbis streams, but it has been described to me as pretty much
guess-and-check. What happens with interleaved streams? We'll probably
need seeking hints in the metadata unless anyone has better ideas
(remember the problems frequently associated with AVIs, especially
with DivX, but not only there?).
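
For what it's worth, my understanding of the guess-and-check approach
is bisection on the granule positions stamped on Ogg pages. Something
like the fragment below could work for a multi-codec stream too,
provided every codec defines a granulepos-to-time mapping. This is
just a guess at the idea, not how vorbisfile actually does it, and the
two helpers are hypothetical:

  #include <stdio.h>
  #include <ogg/ogg.h>

  /* hypothetical helpers: rescan for the next "OggS" capture pattern,
   * and a per-codec mapping from granulepos to seconds */
  int    resync_to_next_page(ogg_sync_state *oy, FILE *in, ogg_page *og);
  double granule_to_time(ogg_int64_t granulepos, int serialno);

  /* bisect the byte range [0, file_length) toward target_time */
  static long seek_bisect(FILE *in, ogg_sync_state *oy,
                          long file_length, double target_time)
  {
      long lo = 0, hi = file_length;
      ogg_page og;

      while (hi - lo > 64 * 1024) {          /* stop within 64 kB        */
          long mid = lo + (hi - lo) / 2;
          fseek(in, mid, SEEK_SET);
          ogg_sync_reset(oy);                /* throw away stale bytes   */
          if (!resync_to_next_page(oy, in, &og))
              break;
          double t = granule_to_time(ogg_page_granulepos(&og),
                                     ogg_page_serialno(&og));
          if (t < target_time) lo = mid; else hi = mid;
      }
      return lo;
  }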

> Right, I maintain we don't need this yet. The 'principle use first' doctrine (why the vorbis
> header must come first in an OggMIDI file) takes care of simple player selection and mimetype
> issues. Anything sophisticated enough to understand that this file has both vorbis and midi data
> can search the headers just like the players do. ogginfo could also be invoked to implement this
> for scripts.

Streaming? Will a player always be able to fetch and load all the
codecs before the stream stops, or will it have to drop the stream,
fetch the codecs, and start back up again? It may turn out not to be
enough of a problem, though -- we'll see when we start implementing.

> Requirements gathering could certainly be happening faster. The two main things I wanted to see
> happen is a video player implementation that supports all the overlay and alternate track features
> we've discussed (dvd done right) and some kind of support for structured text data (my infamous
> xml proposal). Audio, Video, and Text are the three pillars of our grand unified theory of
> multimedia, and I'd want us to have experience and a plan for each of them before we write this
> stream description format.

For overlays, navigation, multiple tracks, etc. there will have to be
some information that is available before any codec data can be
played. This means either a Grand Unified Header or (the DVD route) an
IFO-type external file.

> Now, anyone could slap an ogg packet format around the experimental tarkin or w3d codecs and start
> playing with the video issues; that's ready anytime. Metadata I think needs to be connected to an
> external database, so more thinking should happen there. Or we could settle on 'musicbrainz is
> good enough' for now.

One thing is for sure -- both thinking and doing on the metadata front
must happen, and soon.

/me steps off soapbox

Perhaps I should spend tomorrow afternoon cleaning up my generic codec
work, making it into something usable, and posting it. Getting
seeking, metadata, and the rest working will require a ton of work,
but I think I can get something preliminary working soon, even though
school starts tomorrow :(


-- 
Kenneth Arnold <ken at arnoldnet.net> / kcarnold / Linux user #180115
http://arnoldnet.net/~kcarnold/


