[theora-dev] First steps towards a simple text stream format.

Silvia.Pfeiffer at csiro.au
Mon Aug 11 19:45:19 PDT 2003

Hi Philip,

Another aim for Annodex was to make things as simple as possible for
users and application programmers. XML solves the problems of language
handling and character sets (Unicode is the default), so you won't have
to worry about these any more. Annodex solves the problem of
synchronising text with media bitstreams using Ogg, so you won't have to
worry about that either. Annodex players are still rare, but we're
working on it :)

So here's how simply you can write your subtitle file:

<?xml version="1.0"?>
<!DOCTYPE cmml SYSTEM "cmml_1_0.dtd">
<cmml>
   <title>Lord of the Rings</title>

<a start="0" end="5">
   <desc lang="en">Here you place your English subtitle.</desc>
   <desc lang="x-tengwar">And here the Elvish translation.</desc>
   <desc lang="i-klingon">Obviously the Klingon translation :)</desc>
</a>
<a start="5" end="14">
   ...
</a>
</cmml>

Then you take the related Theora video file, run it through "anxenc"
together with the CMML file, and you've got the synchronised file. With
a DVD you'd automate the creation of the CMML file by transcoding
directly from the DVD.
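
To give an idea of how little code a subtitle-only reader of such a file
needs, here is a minimal sketch in Python using the standard library's
XML parser. It is only an illustration: the element and attribute names
(`a`, `desc`, `start`, `end`, `lang`) are taken from the example above,
the `<cmml>` root element and the reading of start/end as seconds are
assumptions, and the function name `subtitles` is made up for this
example. It is not the libcmml API.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the example in this mail.
# The <cmml> root element is an assumption; start/end are read as seconds.
SAMPLE = """<?xml version="1.0"?>
<cmml>
   <title>Lord of the Rings</title>
   <a start="0" end="5">
      <desc lang="en">Here you place your English subtitle.</desc>
      <desc lang="x-tengwar">And here the Elvish translation.</desc>
   </a>
   <a start="5" end="14">
      <desc lang="en">Second subtitle.</desc>
   </a>
</cmml>"""


def subtitles(xml_text, lang):
    """Return (start, end, text) tuples for one language.

    Everything that is not an <a>/<desc> pair in the requested
    language is simply skipped.
    """
    root = ET.fromstring(xml_text)
    cues = []
    for clip in root.iter("a"):
        for desc in clip.findall("desc"):
            if desc.get("lang") == lang:
                start = float(clip.get("start"))
                end = float(clip.get("end"))
                cues.append((start, end, (desc.text or "").strip()))
    return cues


if __name__ == "__main__":
    for start, end, text in subtitles(SAMPLE, "en"):
        print(start, end, text)
```

Fields other than the subtitle text (the title, for instance) are never
looked at, so they cost a subtitle-only reader nothing.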

Philip Jägenstedt wrote:
 > Something more complex is exactly what I don't want :) Of course I could
 > accomplish that by not using any additional tags. However, for a simple
 > text stream format as I want, even if I don't use any of the fancy
 > features of annodex, it would still depend on libexim for XML-parsing
 > and libcmml for parsing the... well CMML. Of course since you've already
 > made annodex that wouldn't be my problem, I'm just saying that annodex
 > doesn't seem like just a subtitle format. Which of course it isn't
 > either.

As you notice, all that's needed is to use some libraries (libexpat,
libannodex, libcmml). I expect that if you develop something similar
but a bit more restricted yourself, you'll also end up creating
libraries that others would have to use to support subtitling.

We've actually done some work with captioning people in industry. If you
want "just" a subtitle format, use the EBU (European Broadcasting Union)
subtitling standard (so-called STL files). You could distribute that
file separately from the Theora Ogg file, so no integration with Ogg is
necessary. You would have one STL file per translation. Then all you
need to do is implement a browser that reads the STL files and places
the selected subtitles on-screen. Maybe this solution is more to your
liking. Just trying to help.
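
The on-screen part of such a browser is mostly a time lookup. Here is a
rough sketch in Python, assuming the cues of each translation have
already been loaded into a sorted, non-overlapping list of
(start, end, text) tuples with times in seconds. It does not parse the
STL format itself; the names `active_text` and `tracks` and the sample
cues are invented for the illustration.

```python
from bisect import bisect_right


def active_text(cues, t):
    """Return the subtitle text to show at time t, or None.

    cues must be sorted by start time and must not overlap.
    A cue is shown for start <= t < end.
    """
    starts = [cue[0] for cue in cues]
    i = bisect_right(starts, t) - 1  # last cue starting at or before t
    if i >= 0 and cues[i][0] <= t < cues[i][1]:
        return cues[i][2]
    return None


# One cue list per translation, as if read from separate external files.
tracks = {
    "en": [(0.0, 5.0, "Hello"), (5.0, 14.0, "World")],
    "de": [(0.0, 5.0, "Hallo"), (5.0, 14.0, "Welt")],
}

if __name__ == "__main__":
    selected = "de"  # language chosen by the viewer
    for t in (1.0, 5.0, 20.0):
        print(t, active_text(tracks[selected], t))
```

Switching translations is then just a matter of picking another list,
and the video file is never touched.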

> Nevertheless, I feel that there is too much "feature overhead" in
> annodex for what I'm thinking. There's really no reason why annodex
> and the simple subtitle format I envision couldn't both exist, since
> they are different niches.

Hmm, I guess all I can say is that in my view there is no "overhead" in
Annodex that you won't have to recreate. The largest problem we had to
solve was getting synchronisation across different media streams sorted
out and working well. All the "overhead" that you are identifying lies
in the additional fields that CMML provides, and those fields are not
really an overhead if you ignore them. Feel free to implement a player
that only shows the subtitles and ignores all the other fields.

> BTW, I followed your advice and tried to install the tools. However,
> libannodex would not build, failing on libcmml. I could provide some
> more detailed information if you mail me privately, because theora-dev
> isn't annodex-dev. Instead I tried reading the man pages, but they
> were... how to say... sparse on information.

Many thanks for providing a patch. We really appreciate it. Remember
that both Theora and Annodex are still in development, so extensive
documentation is still in the making. Have patience. :)

Best Regards,
