[theora-dev] First steps towards a simple text stream format.

Philip Jägenstedt philipj at telia.com
Tue Aug 12 10:26:44 PDT 2003

Hi again!

On Tue, 12 Aug 2003 12:45:19 +1000
Silvia.Pfeiffer at csiro.au wrote:

> another aim for annodex was to make things as simple as possible for 
> users and application programmers. XML solves the problems of language 
> handling and character sets (Unicode is default). So, you won't have to 
> worry about these any more. Annodex solves the problem of synchronising 
> text with media bitstreams using ogg so you won't have to worry about 
> this any more.  Annodex players are rare yet but we're working on it :)
> So here's how simple you can write your subtitle file:
> [snip]

Yes, that does look very attractive :) My objection is not to the
format itself, but to the headers that wrap around the entire ogg
stream so that search engine spiders can "penetrate" the file. That
seems to me to mix two different things.

> Then you use the related Theora video file and run it through "anxenc" 
> and you've got the synchronised file. With DVD you'd automate the 
> process of creating the CMML file by transcoding from the DVD directly.

Except that on DVDs, the subtitles are stored as images of course, so
there's a whole lot of work involved in this process.

> Philip Jägenstedt wrote:
>  > Something more complex is exactly what I don't want :) Of course I could
>  > accomplish that by not using any additional tags. However, for a simple
>  > text stream format as I want, even if I don't use any of the fancy
>  > features of annodex, it would still depend on libexim for XML-parsing
>  > and libcmml for parsing the... well CMML. Of course since you've already
>  > made annodex that wouldn't be my problem, I'm just saying that annodex
>  > doesn't seem like just a subtitle format. Which of course it isn't
>  > either.
> As you notice, all that's needed is to use some libraries (libexpat, 
> libannodex, libcmml). I expect that when you develop something similar 
> yet a bit more restricted yourself again, you'll also create some 
> libraries that others would have to use to support subtitling.

Yes, indeed there would be libfoosub (no, I don't have a name for my
format), but nothing beyond that. Since I haven't looked too closely at
the internals of annodex, tell me -- does an application which supports
annodex use only libannodex, or do you also access libcmml?

> We've actually done some work with captioning people in industry. If you 
> want "just" a subtitle format, use the EBU (European Broadcasting Union) 
> subtitling standard (so-called STL files). You could distribute that 
> file externally to a theora ogg file, so no integration with ogg is 
> necessary. You would have several STL files for different translations. 
> Then all you need to do is implement a browser that reads the STL files 
> and places the selected subtitles on-screen. Maybe this solution is more 
> to your liking. Just trying to help.

Actually, not having external subtitles is one of the motivations for my
little endeavour. I do appreciate your help and suggestions. I realize
that I might be making a "no-no-no, I want my OWN format"-impression,
but that really isn't what I'm doing. I just want something that works
well and is clean.

> > Nevertheless, I feel that there is too much "feature overhead" in
> > annodex for what I'm thinking. There's really no reason why both annodex
> > and the simple subtitle format I envision couldn't exist, since they are
> > different niches.
> Hmm, I guess all I can say is that in my view there is no "overhead" in 
> annodex that you won't have to recreate. The largest problem we had to 
> solve was getting the synchronisation issue with different media streams 
> sorted out and working well. All the "overhead" that you are identifying 
> lies in the additional fields that CMML is providing and those fields 
> are not really an overhead if you ignore them. Feel free to implement a 
> player that only shows the subtitles and ignores all the other fields.

Indeed, the use of CMML is overhead in my view -- overhead for a
subtitle format, but not necessarily overhead for the other things
annodex does. As I've indicated, the fact that annodex wraps around all
other streams is what I have the most difficulties accepting. Do you
mean that annodex doesn't enforce this, and that I could use only the
annotation stream of annodex, or is this violating the spec?

// Philip Jägenstedt
--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'theora-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
