[theora-dev] Section 508/Closed captioning

Fri Jul 19 17:03:22 PDT 2002

On Friday, July 19, 2002, at 09:33 PM, Kenneth Arnold wrote:

> The plan is to use an interleaved MNG stream for graphical
> closed-captioning. There's already an MNG interleaver in ogg-tools in
> CVS.

Yes. I would (continue to) argue strongly that this be included as part 
of the theora baseline. It has really rich possibilities and solves a 
number of problems much better than other available options.

To clarify, there's a utility in CVS that generates degenerate 
Ogg-encapsulated MNG files. I never ported it to the oggmerge framework, 
so some work is still needed to interleave and play back mixed-media 
streams. That's pretty much the top of my todo list.
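
As a rough illustration of what "interleave" means here, the sketch below 
merges two already time-ordered packet sequences by timestamp. It is not the 
oggmerge code and does no real Ogg paging; the Packet type, its field names 
and the sample timings are all assumptions made up for the example.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; not an actual Ogg structure."""
    time_ms: int    # presentation time in milliseconds
    stream: str     # made-up stream label, e.g. "video" or "mng"
    payload: bytes


def interleave(*streams: Iterable[Packet]) -> Iterator[Packet]:
    """Merge streams that are each sorted by time into one sorted sequence.

    On equal timestamps, packets from the stream listed first come first.
    """
    return heapq.merge(*streams, key=lambda p: p.time_ms)


# Made-up sample data: video every 40 ms, caption frames at 0 and 60 ms.
video = [Packet(0, "video", b"v0"), Packet(40, "video", b"v1"),
         Packet(80, "video", b"v2")]
captions = [Packet(0, "mng", b"m0"), Packet(60, "mng", b"m1")]

merged = list(interleave(video, captions))
for p in merged:
    print(p.time_ms, p.stream)
```

A real muxer would also have to deal with page boundaries and buffering, 
which this leaves out entirely.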

> If someone could write or find a time-oriented *streamable* XML (or
> XML-like) text display and positioning stream definition, this could
> also be used for closed captioning purposes.

I agree we need both. Graphic subtitles are only a half effort when it 
comes to accessibility, and text also helps with indexing efforts. 
Furthermore, as resolutions go up, the extra complexity of layout and 
rendering becomes more of a win over just compressing pre-rendered titles.

I felt I'd produced a reasonable proposal for an XML data format last 
time we went through this, one that reasonably supported both song lyrics 
(including karaoke) and film subtitles. The part I wasn't happy with was 
that I never worked out a way to format the accumulated text for 
pretty-printing as a script or transcript. That part seemed hard.

For what it's worth, in the meantime I've been leaning more towards the 
simple title character-vector approaches based on the Vorbis comment 
header. The arguments against XML are code size and complexity, and that 
we're somehow abusing the model by delivering and rendering the document 
incrementally with drop-out correction. I also thought we could share the 
parser with the kitchen-sink metadata stream, but the methods for encoding 
RDF in XML are so ugly that the consensus is that wouldn't be a good 
decision. The arguments against the character vectors are the lack of 
markup and of extensibility, for example to positional information, which 
I hadn't thought of previously but which is of course useful and is 
available in a limited way from closed-captioning data.
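
To make the character-vector idea concrete, here is a minimal sketch of one 
title packed as a list of "KEY=value" UTF-8 strings, each with a 32-bit 
little-endian length prefix, in the style of the user comment list in the 
Vorbis comment header. The vendor string and framing bit of the real header 
are left out, and the field names (START_MS, END_MS, LANG, TEXT) are purely 
hypothetical; nothing here is an agreed format.

```python
import struct
from typing import List, Tuple


def pack_vectors(vectors: List[str]) -> bytes:
    """Pack strings as: count, then (length, UTF-8 bytes) per vector.

    All integers are 32-bit unsigned little-endian, as in Vorbis comments.
    """
    out = [struct.pack("<I", len(vectors))]
    for text in vectors:
        data = text.encode("utf-8")
        out.append(struct.pack("<I", len(data)))
        out.append(data)
    return b"".join(out)


def unpack_vectors(blob: bytes) -> List[Tuple[str, str]]:
    """Inverse of pack_vectors; returns (KEY, value) pairs in order."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    pairs = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        text = blob[offset:offset + length].decode("utf-8")
        offset += length
        key, _sep, value = text.partition("=")
        pairs.append((key.upper(), value))  # field names are case-insensitive
    return pairs


# One hypothetical subtitle event; the keys are invented for illustration.
title = ["START_MS=12000", "END_MS=14500", "LANG=fr", "TEXT=Où est la gare ?"]
packet = pack_vectors(title)
decoded = unpack_vectors(packet)
print(decoded)
```

This shows both the appeal and the limitation: the parser is a few lines, 
but anything like positioning or inline markup would need yet another 
invented key.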

If we could solve the transcript problem, I'd swing back. I suspect the 
code size arguments will pale over time before the combination of 
increasingly well distributed parsers and the complexity of rendering 
international text.

  -r

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'theora-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.