[theora-dev] First steps towards a simple text stream format.

Sat Aug 9 15:40:10 PDT 2003

Hello everyone!

This list may not be entirely appropriate for this discussion, but in the
absence of ogg at xiph.org or ogg-dev at xiph.org it will have to do.

I've been thinking for a few weeks that Ogg needs a simple text stream
(read: subtitle) format to go along with Theora. This is important,
because otherwise I can't transcode The Fellowship of the Ring while
keeping the Elvish-speak, unless I render the text onto the video frame,
and that's not cool. As you can see, the world will end if there is not
a subtitle format for Ogg soon.

This is what I've come up with.

Goals:
  To create a generic text stream format which is flexible enough to be used
  for subtitles or lyrics, but doesn't attempt to do more than it
  should. The idea is that this format is made to be accepted by Xiph.

  I could pretend that this is just a text stream, but if it were just a
  text stream, all it would do is deliver a string of text at a given time.
  This format needs to do at least one thing more, namely specify the
  duration of that string, or it couldn't be used for subtitles. This brings
  up the question of why it shouldn't do text decorations (bold, italics,
  underlined), colors, sizes or even deliver the fonts themselves. If you
  want to see a subtitle format that does some of this and much more,
  see USF (Universal Subtitle Format). Certainly one could embrace USF and
  try to fit it into Ogg, but IMHO it simply does more than a subtitle
  format should. USF can even carry images, and there are plans for
  implementing rotations/transforms. Now this sounds a bit similar to
  something else, namely SVG (USF is also XML-based, BTW).

  I'm thinking that if anyone wants anything more complex than a text stream,
  they would use the mng-in-ogg stream that someone's working on, or in the
  future SVG (which natively has a way to deliver fonts, and could do all
  that USF does and more) could be baked into an ogg stream. But that's
  not what this format is about, just the reason why it won't do stuff that
  some people will inevitably ask for.
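
  To make the "string plus start time plus duration" idea concrete, here is
  a rough Python sketch. The names (TextEvent, start_ms, duration_ms) and
  the millisecond units are made up for illustration; nothing here is a
  proposed wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextEvent:
    """One timed string: the minimum a subtitle needs (hypothetical layout)."""
    start_ms: int      # when the text appears
    duration_ms: int   # how long it stays on screen
    text: str          # the string itself, with no styling information

    @property
    def end_ms(self) -> int:
        return self.start_ms + self.duration_ms

    def visible_at(self, t_ms: int) -> bool:
        # Visible from the start time (inclusive) to the end time (exclusive).
        return self.start_ms <= t_ms < self.end_ms


event = TextEvent(start_ms=61_000, duration_ms=2_500, text="Hello there")
```

  Without duration_ms a player could only guess when to take the text off
  the screen, which is why a plain "text at time T" stream isn't enough.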

Design features:
 * Text streams must be encoded in UTF-8. Most current simple subtitle
   formats don't specify an encoding at all.
 * Text streams must specify a language in the format specified by RFC 3066.
   This is so that a player application may select the stream which best
   fits the user's locale, or perhaps load a different font better suited
   for rendering the given language. RFC 3066 is cool because it's not just
   limited to "real" languages -- you can specify Klingon subtitles!
 * Each text stream must specify a description, to let a user select between
   several streams. For subtitles this might be the language, e.g. "English",
   "English for people who can't hear quite well", "Svenska". The
   description is also UTF-8.
 * A vorbis-comment block in which arbitrary comments can be stored.
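
As a rough illustration of the list above, this is how a stream header and
the player's language selection might look in Python. The class and function
names (TextStreamHeader, pick_stream) and the fallback order are assumptions
of mine, not part of any spec. The regular expression follows the RFC 3066
tag syntax: a primary subtag of 1-8 letters, then "-"-separated subtags of
1-8 letters or digits.

```python
import re
from dataclasses import dataclass, field

# RFC 3066 tag syntax only; this does not check that the tag is registered.
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


@dataclass
class TextStreamHeader:
    language: str      # RFC 3066 tag, e.g. "en", "sv-SE" or "i-klingon"
    description: str   # human-readable name shown when choosing a stream
    comments: list = field(default_factory=list)  # "KEY=value" strings

    def __post_init__(self):
        if not _RFC3066_TAG.fullmatch(self.language):
            raise ValueError(f"not an RFC 3066 language tag: {self.language!r}")
        if not self.description:
            raise ValueError("description must not be empty")

    def description_bytes(self) -> bytes:
        # All text in the stream, including the description, is UTF-8.
        return self.description.encode("utf-8")


def pick_stream(headers, wanted):
    """Pick a stream for the tag `wanted`: exact match first, then a match
    on the primary subtag only, else None. Tags compare case-insensitively."""
    wanted = wanted.lower()
    for header in headers:
        if header.language.lower() == wanted:
            return header
    primary = wanted.split("-")[0]
    for header in headers:
        if header.language.lower().split("-")[0] == primary:
            return header
    return None
```

A player with the locale "sv-SE" and streams tagged "en" and "sv" would get
the "sv" stream through the primary-subtag fallback.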

Tools:
  It's quite clear that the actual implementation of the format outlined here
  wouldn't be 10 years of work. However, a good toolchain to actually use the
  format would probably take a lot of time. If you've tried extracting a
  subtitle stream from a DVD using transcode, subtitle2pgm, gocr, ispell and
  a couple more tools, you know it isn't exactly intuitive. Hence, it wouldn't
  hurt to have a (graphical) tool which could convert DVD subtitles into this
  magic format in a more intuitive manner.

  Anyway, I hope that in the years to come some people will actually create
  multimedia content with vorbis+theora from the start, so that it's not
  simply used for backing up DVDs. In other words, more generic tools to
  author subtitles from scratch need to exist. However, all of this is
  far into the future and not the focus of my immediate concern.

There is, however, one problem which I don't know how to solve:

How to pack the text strings? What the SRT subtitles in OGM do is put
each subtitle on a separate page. This is a simple solution, but it
means that the overhead of the subtitles will probably be over 50% (an
Ogg page header is at least 28 bytes once it carries a packet, about as
much as a short subtitle line), and that ain't cool. The problem is of
course that the subtitles can be very far apart in time, so if they are
all lumped together into comfortable chunks, they'd only display if you
play the file right through without seeking (because if you seek past
where a subtitle begins, but within its duration, you're never going to
see it). What possible solutions might there be to this? Having the
player application scan through the entire file at the beginning and
keep all subs in memory is a solution, but not a good one. Also, even if
each subtitle has its own page, you're still not going to see it if you
seek into the middle of its duration.
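
To put rough numbers on both problems, here is a small Python sketch. It
assumes one subtitle per page with the packet holding only the UTF-8 text
(a real packet would also need timing fields), and it counts only the page
header: 27 fixed bytes plus one lacing byte per started 255 bytes of packet.
The function names and the (start_ms, duration_ms, text) tuples are
hypothetical.

```python
OGG_PAGE_HEADER_BYTES = 27  # fixed part of an Ogg page header


def single_packet_page_overhead(payload_len: int) -> int:
    """Header bytes for a page holding exactly one complete packet."""
    lacing_values = payload_len // 255 + 1  # one segment-table byte each
    if lacing_values > 255:
        raise ValueError("packet too large for a single page")
    return OGG_PAGE_HEADER_BYTES + lacing_values


def overhead_fraction(text: str) -> float:
    """Share of the page taken up by the header instead of the text."""
    payload = len(text.encode("utf-8"))
    overhead = single_packet_page_overhead(payload)
    return overhead / (overhead + payload)


def missed_after_seek(cues, seek_ms):
    """Cues a forward-only reader never sees after seeking to seek_ms.

    cues: iterable of (start_ms, duration_ms, text). A reader starting at
    seek_ms only meets cues that start at or after that point, so a cue
    that began earlier and is still running is lost.
    """
    return [c for c in cues if c[0] < seek_ms < c[0] + c[1]]
```

Under these assumptions a 28-byte subtitle sits exactly at 50% overhead,
and anything shorter is worse. missed_after_seek() shows the second
problem: it depends only on the cue times, not on how the cues are packed.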

Any comments/insights are welcome.

// Philip Jägenstedt