[theora-dev] Extension to Skeleton for multi-track media

Benjamin M. Schwartz bmschwar at fas.harvard.edu
Tue Mar 23 07:19:47 PDT 2010

Silvia Pfeiffer wrote:
> "Language", "Role" and "Name" are fields that we want to introduce to
> better expose "semantic" information about the tracks.

These three are great.  Comments:
1. It is common for movies to list a series of languages, and it's not
always the case that one is dominant.  To accommodate this, we should
permit specifying the Language field multiple times, as allowed in RFC
2822.  The JavaScript API should return an array of language codes.
Conventionally, the first language code should be the dominant one if
present.  A track with no language code should return an empty array.

2. Some of the roles are unclear.  It would be good to add clarifying
descriptions of their meaning and intended use.  For example, I don't know
the motivation or use for: text/activeregion, text/annotation,
text/transcript, text/linguistic, text/chapters, audio/music,
audio/speech, audio/sfx.  Also, video/alpha needs to specify how a
multichannel track (like Theora) can be rendered down to a single alpha
channel, for example by using the unmodified bytes of Y as alpha.

3. It seems that the name is meant to be only a semi-human-readable tag,
not a fully user-facing title.  Perhaps a localized Title field would be a
good addition at some point.
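A minimal Python sketch of the multi-valued Language behavior from point 1. It assumes the Skeleton message headers arrive as RFC 2822-style "Name: value" lines; the function names are hypothetical, folded (continuation) header lines are not handled, and matching field names case-insensitively is an assumption.

```python
# Hypothetical sketch: gather repeated Language fields from a Skeleton
# message-header block. Assumes RFC 2822-style "Name: value" lines; header
# folding (continuation lines) is not handled.


def parse_message_headers(raw: str) -> list[tuple[str, str]]:
    """Split 'Name: value' lines into (name, value) pairs, keeping repeats."""
    fields = []
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line: skipped in this sketch
        fields.append((name.strip(), value.strip()))
    return fields


def track_languages(raw: str) -> list[str]:
    """All Language values in header order; the first is the dominant one.

    A track with no Language field yields an empty list.
    """
    return [value for name, value in parse_message_headers(raw)
            if name.lower() == "language"]  # case-insensitive: an assumption


headers = "Content-Type: audio/vorbis\r\nLanguage: en\r\nLanguage: fr\r\n"
print(track_languages(headers))  # ['en', 'fr']
```

Returning a plain list keeps the "first entry is dominant" convention visible to callers without a separate field.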

> A further part of the wiki page is the proposal to impose an implicit
> order on the tracks through the order in which their BOS pages are
> given. This is nothing semantic, but only a convenience so we can
> ascertain that different Web browsers will address the same track by
> the same index number through JavaScript.

I reiterate my preference for associative arrays, indexed by the Ogg track
ID and name.  The BOS ordering is unstable, and provides no benefit that I
can see over unique stream identifiers.
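A minimal sketch of associative track access, assuming each track carries its Ogg bitstream serial number and a unique Name value. The Track and TrackList names are hypothetical; the point is that lookups give the same answer whatever order the BOS pages arrive in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    serialno: int  # Ogg bitstream serial number, unique within the stream
    name: str      # value of the proposed Name field, assumed unique


class TrackList:
    """Associative access to tracks; no dependence on BOS page order."""

    def __init__(self, tracks):
        self._by_serial = {}
        self._by_name = {}
        for t in tracks:
            if t.serialno in self._by_serial or t.name in self._by_name:
                raise ValueError(f"duplicate track identifier in {t}")
            self._by_serial[t.serialno] = t
            self._by_name[t.name] = t

    def by_serial(self, serialno: int) -> Track:
        return self._by_serial[serialno]

    def by_name(self, name: str) -> Track:
        return self._by_name[name]


video = Track(0x1234, "main")
commentary = Track(0x5678, "commentary")
# Two muxers may emit the BOS pages in different orders...
first = TrackList([video, commentary])
second = TrackList([commentary, video])
# ...but lookups by serial number or by name agree regardless.
assert first.by_name("commentary") == second.by_name("commentary")
```

Ogg already requires serial numbers to be unique among the logical streams multiplexed together, which is why they make a natural key; Name adds a human-meaningful key on top.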

> Finally there are two rendering related fields that we propose
> introducing: Display-hint and Altitude (their names could of course
> still be changed).

Altitude seems fine.  I have more problems with Display-hint:

1. pip:
Specifying that a track can be shown as picture-in-picture (PIP) might be a
good thing.  This mechanism seems very rigid, though.  Television sets that
provide PIP usually let the user control the positioning, because viewers
may want to see different parts of the underlying frame.  I'm not convinced
that specifying a position or size along with the PIP hint is necessary at
all.  If it is, the text should say "may be displayed" instead of "should be
displayed" to indicate that the player should give the user control.
Content producers who want exact control of overlay positioning should use
Altitude and video/alpha.

Where are the zero coordinates of the display area?
If w and h are percentages, what are they percentages of?

2. mask:
Ogg files are self-contained.  This proposal breaks that in a huge way,
and I think it's terrible.  The right way to do this is in CSS in the
webpage, a la

Please remove mask from the draft.

3. transparentcolor.
This will not work.  Lossy video codecs do not reproduce exact colors.  I
am not aware of any continuous-tone image or video coding system that
employs this approach, because it doesn't work.  Please remove it from the
draft.  People who want transparency will have to use the video/alpha system.
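A worked illustration of why exact-match keying is fragile. The sketch converts a hypothetical key color, pure green, to 8-bit YCbCr and back using the full-range BT.601 (JFIF) equations. Those coefficients are chosen only for illustration (Theora streams normally use video-range YCbCr), and the example leaves out chroma subsampling and lossy coding, both of which add further error. Even so, the rounding alone changes the color.

```python
# Illustration only: JFIF (full-range BT.601) color conversion with 8-bit
# rounding. Real Theora streams normally use video-range YCbCr, 4:2:0 chroma
# subsampling and lossy coding, each of which adds more error than this.


def clamp8(v: float) -> int:
    """Round to the nearest integer and clamp to 0..255."""
    return max(0, min(255, round(v)))


def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple[int, int, int]:
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


KEY = (0, 255, 0)  # hypothetical "transparentcolor" value: pure green
decoded = ycbcr_to_rgb(*rgb_to_ycbcr(*KEY))
print(decoded, decoded == KEY)  # (0, 255, 1) False
```

The round trip turns (0, 255, 0) into (0, 255, 1), so a player testing for an exact match would leave the "transparent" region opaque before any lossy coding has even happened.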

Further improvements:
As currently stated, the video/alpha label cannot actually be used to
blend multiple tracks together.  For example, if I want an exactly
controlled optional overlay, I would create 3 Theora tracks labeled as
video/main, video/alpha, and video/alternate (or maybe video/additional),
all the same size.  The altitude of the additional track would be higher
than the main, to indicate that it goes on top.  There are now at least
three possibilities:
1. The alpha track applies to the additional track.
2. The alpha track applies to the main track (before compositing)
3. The alpha track applies to the whole video (after compositing)
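A single-pixel worked example showing that the three readings above really do produce different pictures. It assumes non-premultiplied values in [0, 1], opaque main and additional tracks, an alpha sample of 0.25 from the video/alpha track, and a black background behind everything; all of these numbers are hypothetical.

```python
# Single-pixel illustration with hypothetical values in [0, 1].


def over(fg: float, fg_alpha: float, bg: float) -> float:
    """Porter-Duff 'over' for one non-premultiplied channel."""
    return fg * fg_alpha + bg * (1.0 - fg_alpha)


main, additional = 0.2, 0.8  # both tracks opaque, same size
a = 0.25                     # sample from the video/alpha track
BLACK = 0.0                  # assumed background behind everything

# 1. Alpha applies to the additional track, which then goes over main.
case1 = over(additional, a, main)
# 2. Alpha applies to main before compositing; the opaque additional track
#    then hides main completely, so the mask has no visible effect.
case2 = over(additional, 1.0, over(main, a, BLACK))
# 3. Alpha applies to the composited result, shown over the background.
case3 = over(over(additional, 1.0, main), a, BLACK)

print(f"{case1:.2f} {case2:.2f} {case3:.2f}")  # 0.35 0.80 0.20
```

Since all three results differ, a player cannot pick the right one unless the file says which interpretation the author intended.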

At present, there is no way to distinguish these cases, and the situation
is even more underspecified in the case of multiple additional tracks.  To
remedy this, I recommend an additional header field "Applies-to: [name]".
This field names the track to which the current track applies.  For
example, a text track may apply to the audio track of which it is a
transcription, and to the video onto which it should be overlaid.  A
video/sign track Applies-to the audio track of which it is a translation.
A video/alpha track Applies-to each track it is supposed to mask (before
compositing).

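A minimal sketch of how a player might index the proposed Applies-to field. It assumes the field may appear more than once per track (the text-track example needs two targets) and that its values are track Names; the function and track names are hypothetical.

```python
def applies_to_index(tracks: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert Applies-to: map each track name to the tracks applying to it.

    `tracks` maps every track Name to its (possibly empty) list of
    Applies-to values, in header order.
    """
    index: dict[str, list[str]] = {name: [] for name in tracks}
    for name, targets in tracks.items():
        for target in targets:
            if target not in tracks:
                raise ValueError(f"{name!r} applies to unknown track {target!r}")
            index[target].append(name)
    return index


tracks = {
    "video": [],
    "audio": [],
    "captions": ["audio", "video"],  # transcript of audio, overlaid on video
    "signing": ["audio"],            # video/sign translation of the audio
    "overlay": [],
    "overlay-mask": ["overlay"],     # video/alpha masking only the overlay
}
index = applies_to_index(tracks)
print(index["audio"])  # ['captions', 'signing']
```

Rejecting unknown targets up front keeps a dangling Applies-to value from silently doing nothing.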
For video/alpha, this is still insufficient, because masking a video and
an overlay before compositing them is not the same as masking after
compositing.  To permit masking after compositing, video/alpha tracks
should optionally have one or more Altitudes.  For each Altitude held by a
video/alpha track, it applies to the composited result of all visible
higher tracks.
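A minimal single-pixel sketch of the altitude-based masking described above. It rests on several assumptions: alpha is the unmodified 8-bit Y sample divided by 255 (the example suggested in point 2 of the first list), tracks are combined with the Porter-Duff "over" operator, every listed track is visible, and a mask at altitude A affects only tracks strictly higher than A. Visiting tracks from the highest altitude down means the accumulator always holds the composited result of all higher tracks, so applying a mask is just a scaling of that accumulator. Class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One pixel of a visible video track."""
    altitude: float
    color: float        # single channel, non-premultiplied, 0..1
    alpha: float = 1.0


@dataclass
class AlphaTrack:
    """One pixel of a video/alpha track at one of its Altitudes."""
    altitude: float
    y: int              # 8-bit luma byte, used unmodified as alpha


def alpha_from_luma(y: int) -> float:
    """Map the raw Y byte to alpha: 0 is fully clear, 255 fully opaque."""
    return y / 255.0


def composite(items):
    """Front-to-back 'over' compositing of a single pixel.

    Returns (premultiplied color, alpha). Items are visited from the highest
    altitude down, so the accumulator always holds the composite of every
    track above the current altitude; an AlphaTrack scales that composite.
    """
    # At equal altitudes, apply masks first so they affect only strictly
    # higher tracks (False sorts before True).
    order = sorted(items, key=lambda it: (-it.altitude,
                                          not isinstance(it, AlphaTrack)))
    acc_color, acc_alpha = 0.0, 0.0
    for it in order:
        if isinstance(it, AlphaTrack):
            m = alpha_from_luma(it.y)
            acc_color *= m
            acc_alpha *= m
        else:
            weight = (1.0 - acc_alpha) * it.alpha  # coverage still available
            acc_color += weight * it.color
            acc_alpha += weight
    return acc_color, acc_alpha


main = Layer(altitude=0, color=0.2)
overlay = Layer(altitude=1, color=0.8)
# A mask at altitude 0.5 sees only the overlay above it.
print(composite([main, overlay, AlphaTrack(altitude=0.5, y=0)]))    # (0.2, 1.0)
print(composite([main, overlay, AlphaTrack(altitude=0.5, y=255)]))  # (0.8, 1.0)
```

A fully clear mask removes the overlay and reveals the main track, while a fully opaque one leaves the overlay untouched; intermediate Y values blend between the two.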

