[ogg-dev] Fwd: [whatwg] Cue points in media elements

Sun Apr 29 03:19:37 PDT 2007

Hi,

this is an email from the HTML5 standardisation group, which I thought
would be very interesting to Annodex and Xiph people.

I do not know enough about the cue points Brian is talking about to
know how much should be done in HTML/JavaScript and how much directly
in the media.

But I found the need for cue points interesting. Maybe someone here
has an answer to Brian on how it could work or what should be improved
in the HTML5 spec to make it work...

Cheers,
Silvia.

---------- Forwarded message ----------
From: Brian Campbell <Brian.P.Campbell at dartmouth.edu>
Date: Apr 29, 2007 5:14 PM
Subject: [whatwg] Cue points in media elements
To: whatwg at whatwg.org

I'm a developer of a custom engine for interactive multimedia, and
I've recently noticed the work WHATWG has been doing on adding
<video> and <audio> elements to HTML. I'm very glad to see these
being proposed for addition to HTML, because if they (and several
other features) are done right, it means that there may be a chance
for us to stop using a custom engine, and use an off-the-shelf HTML
engine, putting our development focus on our authoring tools instead.
My hope is that eventually, if these features get enough penetration,
we will be able to put our content up on the web directly, rather than
having to distribute the runtime software with it.

I've taken a look at the current specification for media elements,
and on the whole, it looks like it would meet our needs. We are
currently using VP3, and a combination of MP3 and Vorbis audio, for
our codecs, so having Ogg Theora (based on VP3) and Ogg Vorbis as a
baseline would be completely fine with us, and much preferable to the
patent issues and licensing fees we'd need to deal with if we used
MPEG4.

For the sort of content that we produce, cue points are incredibly
important. Most of our content consists of a video or voiceover
playing while bullet points appear, animations play, and graphics are
revealed, all in sync with the video. We have a very simple system
for doing cue points, that is extremely easy for the content authors
to write and is robust for paused media, media that is skipped to the
end, etc. We simply have a blocking call, WAIT, that waits until a
specified point in, or the end of, a specified media element. For instance,
in our language, you might see something like this:

   (movie "Foo.mov" :name 'movie)
   (wait @movie (tc 2 3))
   (show @bullet-1)
   (wait @movie)
   (show @bullet-2)

If the user skips to the end of the media clip, that simply causes
all WAITs on that media clip to return instantly. If they skip
forward in the media clip, without ending it, all WAITs before that
point will return instantly. If the user pauses the media clip, all
WAITs on the media clip will block until it is playing again.
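The WAIT semantics described above can be modelled in a few lines. The
following is a minimal Python sketch, not the lab's actual engine:
`Clip`, `Wait`, `Runner` and `lesson` are hypothetical names, a script is
a generator that yields the condition it is waiting on, and the target of
2.1 seconds merely stands in for `(tc 2 3)`, whose exact units are not
given here. It assumes a wait on a point is satisfied once the position is
at or past that point (so skipping forward releases it), that reaching the
end releases every wait, and that pausing simply stops the position from
advancing.

```python
from dataclasses import dataclass
from typing import Generator, List, Optional


@dataclass
class Clip:
    """Hypothetical model of one media clip's playback state."""
    duration: float
    position: float = 0.0
    playing: bool = False

    @property
    def ended(self) -> bool:
        return self.position >= self.duration

    def play(self) -> None:
        self.playing = True

    def pause(self) -> None:
        self.playing = False

    def tick(self, dt: float) -> None:
        # Playback only advances while playing, so a paused clip holds every pending WAIT.
        if self.playing:
            self.position = min(self.position + dt, self.duration)

    def seek(self, t: float) -> None:
        self.position = max(0.0, min(t, self.duration))


@dataclass
class Wait:
    """A pending WAIT: until a point in the clip, or until its end if `until` is None."""
    clip: Clip
    until: Optional[float] = None

    def satisfied(self) -> bool:
        if self.clip.ended:
            return True  # skipping to the end releases every WAIT on the clip
        # ">=" rather than "==": a point that was skipped over still counts as passed.
        return self.until is not None and self.clip.position >= self.until


Script = Generator[Wait, None, None]


class Runner:
    """Drives one script, resuming it whenever its current WAIT is satisfied."""

    def __init__(self, script: Script) -> None:
        self.script = script
        self.pending: Optional[Wait] = None
        self.done = False
        self._resume()  # run the script up to its first WAIT

    def _resume(self) -> None:
        try:
            self.pending = next(self.script)
        except StopIteration:
            self.pending = None
            self.done = True

    def poll(self) -> None:
        # Keep resuming while WAITs are already satisfied, so several WAITs
        # that a seek jumped past all return "instantly" in a single poll.
        while self.pending is not None and self.pending.satisfied():
            self._resume()


def lesson(movie: Clip, shown: List[str]) -> Script:
    yield Wait(movie, 2.1)    # stands in for (wait @movie (tc 2 3)); units assumed
    shown.append("bullet-1")  # (show @bullet-1)
    yield Wait(movie)         # (wait @movie): until the end of the clip
    shown.append("bullet-2")  # (show @bullet-2)
```

In this model the host calls `tick` and `poll` once per frame; seeking
straight to the end makes one `poll` run both `show` steps back to back,
which is the "return instantly" behaviour described above.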

This is a nice system, but I can't see how even as simple a system as
this could be implemented given the current specification of cue
points. The problem is that the callbacks execute "when the current
playback position of a media element reaches" the cue point. It seems
unclear to me what "reaching" a particular time means. If video
playback freezes for a second, and so misses a cue point, is that
considered to have been "reached"? Is there any way that you can
guarantee that a cue point will be executed as long as video has
passed a particular cue point? With a lot of bookkeeping and the
"timeupdate" event along with the cue points, you may be able to keep
track of the current time in the movie well enough to deal with the
user skipping forward, pausing, and the video stalling and restarting
due to running out of buffer. This doesn't address, as far as I can
tell, issues like the thread displaying the video pausing for
whatever reason and so skipping forward after it resumes, which may
cause cue points to be lost, and which isn't specified to send a
"timeupdate" event.

Basically, what is necessary is a way to specify that a cue point
should always be fired as long as playback has passed a certain time,
not just if it "reaches" a particular time. This would prevent us
from having to do a lot of bookkeeping to make sure that cue points
haven't been missed, and make everything simpler and less fragile.
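The "fire once passed" behaviour asked for here can be sketched on top of
any source of position updates. The Python below is a hypothetical
illustration, not an API from the HTML5 draft: `CueTracker` and `update`
are invented names, and re-arming cues after a backward seek is an
assumption the email does not address. The difference from "reaches" is
that each update fires every unfired cue at or before the new position,
so a cue jumped over by a stall or a coarse update still fires, exactly
once.

```python
import bisect
from typing import Callable, List, Tuple


class CueTracker:
    """Fires each cue once playback has passed its time, not only if it lands on it."""

    def __init__(self, cues: List[Tuple[float, Callable[[], None]]]) -> None:
        self.cues = sorted(cues, key=lambda cue: cue[0])
        self.times = [time for time, _ in self.cues]
        self.next_index = 0  # index of the first cue that has not fired yet

    def update(self, position: float) -> None:
        """Call with the current position from any source: timeupdate, a timer, a seek."""
        # Number of cues whose time is at or before the current position.
        passed = bisect.bisect_right(self.times, position)
        if passed < self.next_index:
            # Backward seek (assumption): re-arm the cues after the new position.
            self.next_index = passed
            return
        # Fire every cue in (last position, position], including any that a stall,
        # a forward seek or a coarse update interval jumped over.
        while self.next_index < passed:
            _, callback = self.cues[self.next_index]
            self.next_index += 1
            callback()
```

Because the tracker only needs the latest position, the bookkeeping
becomes a matter of calling `update` from whatever events are available;
a jump that arrives without any event is caught at the next update, late
but not lost.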

We're also greatly interested in making our content accessible, to
meet Section 508 requirements. For now, we are focusing on captioning
for the deaf. We have voiceovers on some screens with no associated
video, video that appears in various places on the screen, and the
occasional sound effects. Because there is not a consistent video
location, nor is there even a frame for voiceovers to appear in, we
don't display the captions directly over the video, but instead send
events to the current screen, which is responsible for catching the
events and displaying them in a location appropriate for that screen,
usually a standard location. In the current spec, the only provision is
for controls to turn closed captions on or off. What
would be much better is a way to enable the video element to send
caption events, which include the text of the current caption, and
can be used to display those captions in a way that fits the design
of the content better.
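A caption-event design along these lines could look like the following
Python sketch. It is hypothetical throughout: `Caption`, `CaptionEmitter`,
`add_listener` and `update` are invented names rather than anything in
the spec, captions are assumed not to overlap, and a listener receives the
new caption text, or `None` when a caption is cleared, leaving placement
entirely to the screen that handles the event.

```python
import bisect
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Caption:
    start: float
    end: float
    text: str


class CaptionEmitter:
    """Sends an event whenever the active caption changes (captions assumed non-overlapping)."""

    def __init__(self, captions: List[Caption]) -> None:
        self.captions = sorted(captions, key=lambda c: c.start)
        self.starts = [c.start for c in self.captions]
        self.listeners: List[Callable[[Optional[str]], None]] = []
        self.current: Optional[Caption] = None

    def add_listener(self, listener: Callable[[Optional[str]], None]) -> None:
        self.listeners.append(listener)

    def active_at(self, position: float) -> Optional[Caption]:
        # The latest caption that has started is active only if it has not yet ended.
        i = bisect.bisect_right(self.starts, position) - 1
        if i >= 0 and position < self.captions[i].end:
            return self.captions[i]
        return None

    def update(self, position: float) -> None:
        caption = self.active_at(position)
        if caption != self.current:
            self.current = caption
            text = caption.text if caption is not None else None
            for listener in self.listeners:
                listener(text)  # the screen decides where and how to show it
```

Since events are only sent on a change, a screen can simply replace
whatever caption it is showing each time one arrives.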

I hope these comments make sense; let me know if you have any
questions or suggestions.

Thanks,
Brian Campbell
Interactive Media Lab, Dartmouth College
http://iml.dartmouth.edu