[ogg-dev] Ogg/Kate preliminary documentation

Fri Feb 8 03:12:08 PST 2008

> Some of the things you talk about were not solved at the CMML level, but
> rather through using different Ogg logical bitstreams.

While it is possible to do it this way (and probably a good idea for
examples like a clock in a corner), it implies that all the placements and
logically different "items" are known at the start of the stream (since the
Ogg spec says a logical stream can't start midway through another one, an
interesting restriction, but one that is there nonetheless). This is fine
for a file-based stream, but not if the stream is generated in real time.
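To make the restriction concrete, here is a minimal sketch that checks whether every beginning-of-stream (BOS) page in a multiplexed sequence comes before any data page, i.e. that no logical stream starts midway through the others. The `Page` type is a hypothetical, simplified stand-in for a real Ogg page header, and the check deliberately ignores chaining (where a new group of streams may begin after all earlier ones have ended).

```python
from typing import NamedTuple


class Page(NamedTuple):
    """Hypothetical, simplified view of an Ogg page header."""
    serial: int  # logical bitstream serial number
    bos: bool    # beginning-of-stream flag


def bos_pages_grouped(pages: list[Page]) -> bool:
    """Return True if all BOS pages precede every non-BOS page.

    Chaining is not modelled: a BOS page after any data page is
    treated as a stream trying to start mid-multiplex.
    """
    seen_data = False
    for page in pages:
        if page.bos:
            if seen_data:
                return False  # a stream started after data began
        else:
            seen_data = True
    return True
```

A realtime generator that wanted to add, say, a clock stream (serial 3) only after the video had started would produce a sequence like `[Page(1, True), Page(1, False), Page(3, True)]`, which this check rejects.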

While it is not used at the moment, I do have a "category" field in the ID
header, meant to be a tag used by a player to know what is supplied by
a particular stream (e.g., the user may want to select a number of categories,
such as "transcript" and "commentary", plus a language, and the two matching
streams would be displayed by the player).
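The selection a player might do with such a tag can be sketched as a simple filter over per-stream metadata. The `StreamInfo` fields and the `select_streams` helper are hypothetical and only loosely modelled on the "category" and language information described above; they are not the actual Kate ID header layout.

```python
from dataclasses import dataclass


@dataclass
class StreamInfo:
    """Hypothetical per-stream metadata (not the real Kate header)."""
    serial: int
    category: str
    language: str


def select_streams(streams, wanted_categories, language):
    """Return serials of streams whose category the user selected
    and whose language matches the user's choice."""
    return [s.serial for s in streams
            if s.category in wanted_categories and s.language == language]


streams = [
    StreamInfo(1, "transcript", "en"),
    StreamInfo(2, "commentary", "en"),
    StreamInfo(3, "transcript", "fr"),
    StreamInfo(4, "lyrics", "en"),
]
print(select_streams(streams, {"transcript", "commentary"}, "en"))  # [1, 2]
```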

However, forcing the use of several separate streams, while having the
advantage of keeping things simple (and being the solution I selected for
multiple languages), may be overly restrictive.

> * overlapping timed text pieces would be coming in through different logical
> bitstreams or the CSS (there may be a timing extension necessary to CSS to do
> so - if you have found a better way of doing this, I'll be keen to see)

Not a better solution, I'm afraid, merely a different one. You define
regions and (very simple) styles, and there is a system of "motions" (mostly
splines) that can alter attributes like color, position, etc. It's another
custom scheme, I'm afraid, but one that I believe (hope?) is both simple and
powerful.
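As an illustration of how a spline-based motion can drive an attribute over time, the sketch below evaluates a 2D cubic Bézier curve and maps a presentation time onto it to get, for example, a region's (x, y) position. The choice of a cubic Bézier and the `position_at` helper are assumptions for illustration only; this is not Kate's actual motion encoding.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a 2D cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights for a cubic curve
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return (b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1])


def position_at(time, start, duration, control_points):
    """Hypothetical helper: map a presentation time onto the curve,
    clamping to the endpoints outside the motion's active interval."""
    t = (time - start) / duration
    t = min(max(t, 0.0), 1.0)
    return cubic_bezier(*control_points, t)


# An arc from (0, 0) to (100, 0) over ten seconds
arc = ((0, 0), (0, 100), (100, 100), (100, 0))
print(position_at(5, 0, 10, arc))  # (50.0, 75.0)
```

The same evaluation could feed any numeric attribute (a color channel, an alpha value) rather than a position; only the meaning attached to the curve's output changes.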

> The advantage of having things in different logical bitstreams is that you
> can create addressing schemes that refer to just a subset of logical
> bitstreams if you e.g. only want some part of the composition delivered to
> you from a server. For example,
> http://example.org/video.ogx?track=video,audio,transcript will avoid giving
> you the digital time, logo, and channel number tracks for the above example.
> The CMML design has always focused on trying to keep things in components
> that can easily be added or taken away.

This is a very good point, and, if I'm not mistaken, the real point of Annodex
(addressability of audio/video content)? Kate does not attempt to deal with
this; it's entirely outside its scope. I understand that CMML does this for
non-CMML streams anyway (e.g., Theora)?
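The track-based addressing in the quoted example can be sketched as parsing the `track` query parameter and keeping only the matching logical bitstreams. The query syntax comes from the example URL above; the mapping from track names to serial numbers and the helper names are hypothetical, and how a real server resolves track names is not shown here.

```python
from urllib.parse import parse_qs, urlparse


def requested_tracks(url):
    """Extract the set of track names from a ?track=a,b,c query."""
    query = parse_qs(urlparse(url).query)
    tracks = set()
    for value in query.get("track", []):
        tracks.update(name for name in value.split(",") if name)
    return tracks


def filter_streams(streams, url):
    """Keep only the logical bitstreams whose track name was requested.
    'streams' is a hypothetical map of track name -> serial number."""
    wanted = requested_tracks(url)
    return {name: serial for name, serial in streams.items() if name in wanted}


streams = {"video": 1, "audio": 2, "transcript": 3,
           "clock": 4, "logo": 5, "channel": 6}
url = "http://example.org/video.ogx?track=video,audio,transcript"
print(filter_streams(streams, url))  # {'video': 1, 'audio': 2, 'transcript': 3}
```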

> I'm very keen on seeing your specifications and seeing kate at work - it may
> well be that you have found some better solutions to some of the problems
> that we attack differently with CMML and thus we should think about picking
> the best designs. Really wanting to see it working - post your specs and the
> patched vlc version here if you can!

I'll send you a recent snapshot; feel free to take inspiration from it, but
I've only been working on it for about a month, so don't expect to see much
you haven't solved already :)
I do not have a patch for vlc, only for MPlayer and xine (MPlayer only handles
text subtitles, but xine handles everything). As for specs, since the bitstream
format is still in flux (and the API, to a lesser extent), there are no docs
yet. The wiki page is all there is for the moment.

> BTW: on the kate wiki page, Annodex is mentioned - what annodex is is simply
> an Ogg file with skeleton and a CMML track in addition to other digital
> media. It's a term that we used to specify the particular multiplexed file
> with which we wanted to work, but it hasn't really much meaning in itself
> nowadays.

Yes, I've noticed that much of the code (in xine, say) is shared between
decoding Ogg and Annodex streams.