[theora-dev] Extension to Skeleton for multi-track media

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Tue Mar 23 20:20:58 PDT 2010

On Wed, Mar 24, 2010 at 12:16 PM, Benjamin M. Schwartz
<bmschwar at fas.harvard.edu> wrote:
> Silvia Pfeiffer wrote:
>>>> A further part of the wiki page is the proposal to impose an implicit
>>>> order on the tracks through the order in which their BOS pages are
>>>> given. This is nothing semantic, but only a convenience so we can
>>>> ascertain that different Web browsers will address the same track by
>>>> the same index number through JavaScript.
>>> I reiterate my preference for associative arrays, indexed by the Ogg track
>>> ID and name.  The BOS ordering is unstable, and provides no benefit that I
>>> can see over unique stream identifiers.
>> I can see where you're coming from, but building an associative array
>> is something that the application has to do. It will create an array
>> saying that serialno x matches to position i on the index array.
> I don't agree with this definition of an associative array.  In
> javascript, the associative array would have keys that are track names and
> values that are MediaTrack objects.  No positional index is ever defined.
>> However, the order is still not specified by this. We have to create
>> an order that can be maintained between applications.
> Why do you have to create an order? I cannot think of any programming task
> that requires such an ordering.

An index is the easiest way to address a track lacking any other
information. The serial number cannot be used for addressing, since it
should not be exposed, will not work across chained streams, and
nobody wants to deal with such long, meaningless numbers anyway.
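
To make that concrete, here is a rough sketch of how a player could offer
both kinds of access. It is only an illustration: MediaTrack's fields and
buildTrackList are names I made up for this mail, not anything from the
Skeleton draft or a browser API.

```typescript
// Hypothetical types: none of these names come from the Skeleton draft.
interface MediaTrack {
  serialno: number; // Ogg stream serial number, kept internal to the player
  name: string;     // track name; empty when the file does not provide one
  role: string;
}

// bosOrder: tracks listed in the order their BOS pages appear in the file.
function buildTrackList(bosOrder: MediaTrack[]) {
  const byIndex = bosOrder.slice(); // position i is the i-th BOS page
  const byName = new Map<string, MediaTrack>();
  for (const t of byIndex) {
    // Only named tracks can be looked up by name; first one wins on clashes.
    if (t.name !== "" && !byName.has(t.name)) byName.set(t.name, t);
  }
  return { byIndex, byName };
}

const tracks = buildTrackList([
  { serialno: 1804289383, name: "pip", role: "video/secondary" },
  { serialno: 846930886, name: "", role: "video/alpha" },
]);
```

The second track has no name, so tracks.byIndex[1] is the only way to
reach it without exposing the serial number.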

>>>> Finally there are two rendering related fields that we propose
>>>> introducing: Display-hint and Altitude (their names could of course
>>>> still be changed).
>>> Altitude seems fine.  I have more problems with Display-hint:
>>> pip:
>>> Specifying that a track can be shown as PIP might be a good thing.  This
>>> mechanism seems very rigid, though.  Television sets that provide PIP
>>> usually let the user control the positioning, because they may want to see
>>> different parts of the underlying frame.  I'm not convinced that
>>> specifying a position or size along with the PIP hint is necessary at all.
>>>  If it is, the text should say "may be displayed" instead of "should be
>>> displayed" to indicate that the player should give the user control.
>>> Content producers who want exact control of overlay positioning should use
>>> Altitude and video/alpha.
>> It's a display HINT, therefore it's always just a suggestion to the
>> player.
> Sure.  I guess I'm just nitpicking as to whether the location is a very
> useful hint.  Are there other systems that provide such hints?

That is a good point. I did a bit of searching. I found that MPEG hints at
the possibility of using their parallel tracks for PiP applications, but
they haven't specified it in the container. Thus, for example in HD DVDs,
there were formats specified to allow this, e.g.
http://blogs.msdn.com/ptorr/archive/2006/09/11/750124.aspx . They
don't put that information into the media file, but into associated
xml files. Seems to also exist in Bluray, but is known as "secondary

As much as possible, I would try and avoid creating dependencies on
external xml files for providing display hints on a media resource.
The more information that is outside the file, the less can be done
with the file by itself.

OTOH, we could just call the video track for display as PiP a
"video/secondary" or "video/alternate" and leave it to the player to
decide to display it as PiP. Then this hint is not necessary.

>>> 2. mask:
>>> Ogg files are self-contained.  This proposal breaks that in a huge way,
>>> and I think it's terrible.  The right way to do this is in CSS in the
>>> webpage, a la
>>> http://labs.silverorange.com/files/video-demo/ambient.xhtml
>>> http://webkit.org/blog/181/css-masks/
>>> Please remove mask from the draft.
>> Yes, that is another train of thought. We indeed do not need the
>> functionality for the Web. But what about media players? Other media
>> formats allow for inclusion of such a mask inside the media resource to
>> allow masking the video display.
> They do?  Can you point me to a media player other than a web browser that
> can play statically-masked video?

QuickTime: http://docs.info.apple.com/article.html?artnum=42623&coll=cp

Flash: http://www.adobe.com/designcenter/tutorials/fla8at_maskvideo/ -
though it works a bit differently with Flash

>> This is an attempt at introducing
>> this functionality into Ogg. I won't fight for it if the general
>> consensus is: we don't need it. But I have had this discussion that
>> e.g. Flash and MPEG are capable of this and Ogg isn't. This would be a
>> relatively simple way to introduce it.
> A static binary mask image cannot reproduce the full-motion alpha behavior
> of VP6a (Flash) and H.264 (MPEG).

That's not what it's for - it's to remove certain pixels from playback
and turn them fully transparent. It's meant to be static.
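
Roughly, the player would do something like the following on every frame.
This is a sketch under my own assumptions: applyStaticMask is a made-up
helper, and I assume the decoded frame is available as RGBA bytes with one
boolean mask entry per pixel.

```typescript
// Hypothetical helper: rgba holds one decoded frame as RGBA bytes, mask has
// one entry per pixel (true = keep the pixel, false = remove it).
function applyStaticMask(rgba: Uint8ClampedArray, mask: boolean[]): Uint8ClampedArray {
  const out = new Uint8ClampedArray(rgba); // copy, leave the decoded frame alone
  for (let p = 0; p < mask.length; p++) {
    if (!mask[p]) out[4 * p + 3] = 0; // alpha byte to 0: fully transparent
  }
  return out;
}

// Two opaque pixels; the same mask is reused unchanged for every frame.
const frame = new Uint8ClampedArray([255, 0, 0, 255, 0, 255, 0, 255]);
const masked = applyStaticMask(frame, [true, false]);
```

The mask never changes over time, which is why it cannot stand in for a
per-frame alpha channel.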

>>> 3. transparentcolor.
>>> This will not work.
>> Now, how would you cut out a person from the video? Would you need to
>> create a new track (the "video/alpha" video track) that provides the
>> continuing mask over the person and makes everything around that mask
>> transparent? Since we don't have alpha channels in Ogg, this would be
>> a means to introduce alpha channels.
> Yes, that's what I imagined.  I assumed this was the purpose for which you
> created the video/alpha Role.

Yeah, I think that's what it was for - it's an earlier discussion that
I hadn't re-considered when writing the Wiki page.

>>> For video/alpha, this is still insufficient, because masking a video and
>>> an overlay before compositing them is not the same as masking after
>>> compositing.  To permit masking after compositing, video/alpha tracks
>>> should optionally have one or more Altitudes.  For each Altitude held by a
>>> video/alpha track, it applies to the composited result of all visible
>>> higher tracks.
>> Yes, I agree - the "video/alpha" approach is a hack and not a feature.
>> Is this even the best way to go about it? Would it make more sense to
>> change Theora to include possibility for an alpha channel?
> This is a tough question.  The "video/alpha" approach has the advantage
> that current non-transparency-aware players have a chance of falling back
> to playing the main video without transparency.  Adding an alpha channel
> to Theora would be possible, but the resulting tracks wouldn't play at all
> in any current player; it would essentially be a different codec,
> requiring new encoders and decoders.

... unless Theora were to be extended with bits at the end that
current players would ignore (if that is possible).

>  As a result, I favor the
> "video/alpha" approach, even though it's messier and probably less
> efficient.  I'd like to hear more opinions on that topic.

Yeah, me too.
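
On the point about masking before versus after compositing, a one-pixel
example shows the difference. This is just my own illustration: it assumes
premultiplied alpha and the usual Porter-Duff "over" operator, and the
names Px, over and mask are invented here.

```typescript
// One pixel, one colour channel, premultiplied alpha. All names are illustrative.
type Px = { c: number; a: number };

// Porter-Duff "over": top composited onto bottom.
const over = (top: Px, bottom: Px): Px => ({
  c: top.c + bottom.c * (1 - top.a),
  a: top.a + bottom.a * (1 - top.a),
});

// Scale a pixel by a mask value in [0, 1].
const mask = (p: Px, m: number): Px => ({ c: p.c * m, a: p.a * m });

const video: Px = { c: 0.8, a: 1 };   // opaque main video
const overlay: Px = { c: 0.2, a: 1 }; // opaque overlay at a higher Altitude
const m = 0.5;                        // mask value for this pixel

const maskThenComposite = over(overlay, mask(video, m)); // mask touches the video only
const compositeThenMask = mask(over(overlay, video), m); // mask touches the result
```

In the first case the overlay stays fully opaque; in the second the whole
composited pixel ends up half transparent.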

