[Ogg a11y] Review of TDHT

Henri Sivonen hsivonen at iki.fi
Mon Jan 5 03:53:02 PST 2009


This is my review of http://wiki.xiph.org/index.php/Timed_Divs_HTML

> TDHT may be similar to W3C TimedText DFXP in many respects, but in  
> comparison to DFXP it does not re-invent HTML, CSS and effects, but  
> rather uses existing HTML, CSS and javascript for these. The purpose  
> of DFXP is to create a web-independent exchange format for timed  
> text, which is why it cannot directly be specified as a subpart of  
> HTML.
>
> TDHT in contrast is HTML with a minimum number of changes. TDHT is  
> parsable by any HTML parser. It works with CSS and javascript. No  
> new functionality has to be defined for TDHT.

From a browser implementation point of view, this approach makes a  
lot more sense than DFXP.

> = File Extension =
>
> Files in this format are to be of text/x-tdht mime type.
>
> Files in this format should have a file extension of .tdht .

Establishing new MIME types is hard. I suggest using text/html  
and .html for this format and letting processing be triggered by the  
referencing context.

> Right now, TDHT is based on [http://www.w3.org/TR/html401/  
> HTML4.01], but it should also be possible to work on [http://www.whatwg.org/specs/web-apps/current-work/ 
>  HTML5], which is still in flux.

Since ongoing work on HTML happens in HTML5, I think it would make  
sense to define this in terms of HTML5 right away. To specify parsing  
properly, you need to reference HTML5 anyway.

> = Rendering in a Web Browser =

> When the browser happens upon a TDHT file, it must create a document  
> by calling createDocument() on DOMImplementation and then calling  
> open() on the created document. The browser must insert a <html>  
> element in the HTML namespace as the root element of the document  
> and insert <head> and <body> elements in the HTML namespace into the  
> root element.
>
> A TDHT file can either be received by a HTML parser in one go (as a  
> TDHT file) or a div-less TDHT file can be received together with its  
> <head> tag and create a HTML shell into which the <div> elements  
> can be added as they come (e.g. from a video file that is decoded  
> and played back in parallel) by using a HTML fragment parser.
>
> The <head> tag must decode into a DOMString (using REPLACEMENT  
> CHARACTER on errors) and set the TDHT DOM property of the <head>  
> element to the DOMString.

These paragraphs mix the processing of separate TDHT files and TDHT  
embedded in Ogg. These cases need to be written out more clearly.

I suggest the following:

For the external TDHT file case:
The TDHT file is parsed using the HTML5 parsing algorithm in its  
normal mode into a non-rendered DOM. To render a div, the children of  
the div would be cloned into the body of the rendering shell document  
(replacing possible previous children of body).

For the Ogg-internal TDHT case:
To multiplex an external TDHT file into Ogg, the innerHTML of each div  
would be placed into a data packet and the start and end time would be  
stored on the Ogg level (i.e. out of the HTML band). To render a  
packet, innerHTML of the body element of the rendering shell document  
would be set to the data of the packet.
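
As a rough illustration of the Ogg-internal case, the sketch below
models cues, packets and the rendering shell as plain JavaScript
objects. The names muxCues and renderPacket and the packet fields are
hypothetical and not part of TDHT or Ogg. In a browser, shellBody
would be the body element of the rendering shell document.

```javascript
// Hypothetical sketch: each cue stands for one parsed TDHT div.
// start/end are in seconds; innerHTML is the serialized children of the div.
function muxCues(cues) {
  return cues.map((cue) => ({
    // Timing lives on the container level, outside the HTML payload.
    start: cue.start,
    end: cue.end,
    // The payload carries only the children of the div, not the div tags.
    data: cue.innerHTML,
  }));
}

// Rendering a packet replaces whatever the shell body held before.
// shellBody is any object with a writable innerHTML property.
function renderPacket(shellBody, packet) {
  shellBody.innerHTML = packet.data;
}

const packets = muxCues([
  { start: 0, end: 2.5, innerHTML: "<p>Hello</p>" },
  { start: 2.5, end: 5, innerHTML: "<p>World</p>" },
]);
const shellBody = { innerHTML: "" };
renderPacket(shellBody, packets[1]);
```

Because the start and end times never enter the HTML payload, the
packet data can be handed to the fragment parser as-is.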

> As the browser plays the video, it must render the TDHT <div>  
> tags in sync. As the start time of a <div> tag is reached, the  
> <div> tag appears, and it is removed as the <div> tag's end  
> time is reached. If no start time is given, the start is assumed to  
> be 0, and if no end time is given, end is assumed to be infinity.

"Render" and "remove" need to be defined in terms of DOM mutations and/ 
or CSSOM changes. (I suggest mutating the DOM instead of CSSOM; see  
below.)
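
To make the DOM-mutation reading concrete, here is a minimal sketch
of how "render" and "remove" could reduce to one mutation of the
shell body. The helper names activeCueAt and syncShell are
hypothetical. The defaults (start 0, end infinity) come from the
quoted text. Treating the end time as exclusive and picking the first
matching cue when cues overlap are my assumptions.

```javascript
// Hypothetical helper: return the cue to render at time t (seconds),
// or null if no cue is active.
function activeCueAt(cues, t) {
  for (const cue of cues) {
    const start = cue.start ?? 0; // missing start defaults to 0
    const end = cue.end ?? Infinity; // missing end defaults to infinity
    // Assumption: the cue is removed as soon as its end time is reached.
    if (t >= start && t < end) return cue;
  }
  return null;
}

// "Render" and "remove" are the same DOM mutation: replace the
// children of the shell body with the active cue's content, or with
// nothing when no cue is active.
function syncShell(shellBody, cues, t) {
  const cue = activeCueAt(cues, t);
  shellBody.innerHTML = cue ? cue.innerHTML : "";
}

const demoCues = [
  { end: 2, innerHTML: "<p>a</p>" }, // no start given
  { start: 2, innerHTML: "<p>b</p>" }, // no end given
];
const demoBody = { innerHTML: "" };
syncShell(demoBody, demoCues, 2);
```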

It needs to be defined what it means to render TDHT divs in sync with  
the video. In order to reasonably reuse existing browser components,  
which is the whole point of TDHT, the DOM mutations and the CSS layout  
of the rendered captions need to run on the same thread that DOM and  
CSS live in generally--i.e. the main thread.

The spec should probably define changes to the state of rendered TDHT  
in terms of tasks posted on the event loop that manages the iframe- 
like nested browsing context. For high-performance video, one would  
probably want to paint video into a hardware-accelerated compositor  
that can composite a rectangle independently of the event loop of the  
main thread. This means that text would be less accurately synced than  
video and audio. The spec should say that it's OK for text layout to  
drift by some small amount from video, so authors shouldn't expect a  
to-frame exact synchronization of text and video.
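
A sketch of what "tasks posted on the event loop" could look like,
under the assumption that the main thread samples the media clock at
irregular intervals. buildTasks and runDueTasks are hypothetical
names. The returned drift shows why to-frame accuracy cannot be
promised. Seeking backwards is not handled here.

```javascript
// Hypothetical sketch: turn cues into time-ordered DOM mutation tasks.
function buildTasks(cues) {
  const tasks = [];
  for (const cue of cues) {
    const end = cue.end ?? Infinity;
    tasks.push({ time: cue.start ?? 0, html: cue.innerHTML });
    // A cue without a finite end never needs a clearing task.
    if (Number.isFinite(end)) tasks.push({ time: end, html: "" });
  }
  // Sort by time. At equal times, clearing tasks run first so that a
  // cue starting exactly when another ends is the one left showing.
  const rank = (task) => (task.html === "" ? 0 : 1);
  return tasks.sort((a, b) => a.time - b.time || rank(a) - rank(b));
}

// Called from a main-thread task with the previous and current
// readings of the media clock. Applies every task that became due in
// the interval (prev, now] and returns how late the last one ran, in
// seconds, or null if nothing was due. Pass -Infinity as prev on the
// first call so that tasks at time 0 are included.
function runDueTasks(shellBody, tasks, prev, now) {
  let drift = null;
  for (const task of tasks) {
    if (task.time > prev && task.time <= now) {
      shellBody.innerHTML = task.html;
      drift = now - task.time;
    }
  }
  return drift;
}

const timeline = buildTasks([
  { start: 0, end: 2.5, innerHTML: "<p>Hello</p>" },
  { start: 2.5, end: 5, innerHTML: "<p>World</p>" },
]);
const tickBody = { innerHTML: "" };
const firstDrift = runDueTasks(tickBody, timeline, -Infinity, 0.016);
const secondDrift = runDueTasks(tickBody, timeline, 0.016, 2.53);
```

In this run the second tick arrives about 30 ms after the cue
boundary at 2.5 s, so the text changes that much later than the video
frame it belongs to.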

> [There is a discussion to be had here about the effect this has on  
> the DOM. Different selectors may apply to a caption depending on  
> whether the video was played back all the way there or seeking  
> skipped over data to get there. It was suggested that inactive  
> captions should be removed from the DOM, so there's always a well- 
> defined small unambiguous DOM to match selectors against. However,  
> this may for example not be desirable on some text display formats.]
>
> An "active" <div> tag may, incidentally, be a <div> tag that  
> is being displayed ("display: block") in contrast to an "inactive"  
> <div> tag, which may not be displayed ("display: none"). For some  
> text formats however the difference between "active" and "inactive"  
> may be a background colour or the display location on screen or some  
> other mechanism. The default should be between "block" and  
> "inactive", but changeable through CSS.

I suggest using a small unambiguous rendering shell DOM as described  
earlier in this email and not timing CSSOM mutations at all.

> The <div> tags are the data packets of the TDHT text codec and  
> are thus encapsulated into the data packets as text codec data. A  
> complete <div> including all its subtags is encoded into one data  
> packet each.

If you put the <div> and </div> tags into the packet, you'd need to  
specify what happens if the input has any of the following:
* No div element at all.
* More than one div.
* Non-div content before the first div.
* Non-div content after the last div.

These issues can be sidestepped by not putting the div tags themselves  
into the packet.

> = Direct linking on a HTML5 page =
>
> Often, subtitles and other time-aligned text files are not actually  
> provided inside a video stream (e.g. inside Ogg), but are referenced  
> as a separate partner resource to a video.
>
> To allow association of such files with a <video> or <audio>  
> element, we propose the following approach:
>
> <pre>
> <video src="http://example.com/video.ogv" controls>
>  <text category="CC" lang="en" src="caption.srt" style=""></text>

Element name overlaps with SVG are undesirable, so we should avoid  
introducing more overlap. (SVG already has <text>.)

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/



