[foms] WebM Manifest

Mark Watson watsonm at netflix.com
Thu Apr 7 09:19:08 PDT 2011


On Apr 7, 2011, at 8:34 AM, Philip Jägenstedt wrote:

> On Thu, 17 Mar 2011 21:10:46 +0100, Jeroen Wijering  
> <jeroen at longtailvideo.com> wrote:
> 
>> 
>> On Mar 17, 2011, at 1:47 PM, Philip Jägenstedt wrote:
>> 
>>>> Next, the Stream API needs to be very strictly defined in terms of how  
>>>> provided A/V frames should be formatted, and how and when codec  
>>>> initialization data must be (re)sent.
>>>> 
>>>> Basically, javascript handles the demuxing. This would be a great API,  
>>>> allowing for much flexibility. At the same time, the amount of  
>>>> knowledge required for such an API would be so staggering (e.g. full  
>>>> understanding of video containers) that few people would be able to  
>>>> work with it.
>>> 
>>> I may very well be in need of education, but I don't see why that needs  
>>> to be the case.
>>> 
>>> Assume a manifest at its simplest is a list of URLs and switchover  
>>> times. If one has a "manifest API" that allows one to add URLs and  
>>> switchover times, then surely anything that can be done with a manifest  
>>> can be done with the API? If a manifest solution doesn't require  
>>> inspecting the data outside of the normal decoding, why would it be  
>>> necessary when one uses an API?
>> 
>> 
>> When portraying a manifest as a "list of URLs", you are following an  
>> approach similar to Apple HLS, imposing two restrictions on your data  
>> model:
>> 
>> 1. Interleaving. Only when audio + video are interleaved in fragments  
>> can you have a list of URLs. Your presentation is basically chopped up  
>> vertically (time) instead of horizontally (stream). In any case where  
>> you have more than one quality level of a certain track, there is data  
>> duplication. Sometimes (5 video qualities with the same audio, like  
>> Apple HLS) that might be acceptable. Sometimes (5 video qualities with 5  
>> audio languages) the amount of data simply explodes.
>> 
>> 2. Initialization. Only when every fragment is self-initializing (every  
>> fragment contains all codec configuration) can you have a list of URLs.  
>> After all, the javascript layer should be able to start with a random  
>> fragment and do random subsequent switching. Every container format  
>> has its peculiarities that make this amount of data non-trivial - e.g.  
>> Vorbis initialization requires a couple of kB.
>> 
>> These restrictions can probably be worked around (A+V buffer,  
>> initialization segments), but they do complicate things - both for the  
>> browser media developer and for the javascript developer. You are right  
>> - full video container knowledge is probably not required by  
>> javascripters, but things do go deeper than urls+= if aiming for a more  
>> flexible approach than Apple HLS.
> 
> OK, so perhaps an API with the functionality we need would be a better  
> approach. This is similar to how <track> (captions) is handled: while  
> there exists a simple baseline format (WebVTT), scripts can do fancy stuff  
> using the API. For adaptive streaming, specific advantages of bringing  
> scripts into the mix are allowing experimentation with rate switching  
> algorithms, and allowing site-specific schemes for URL patterns in live  
> streaming that do away with the need to ever re-fetch a manifest or  
> have a complex manifest for declaratively giving the URL pattern.

The goal of implementing rate adaptation algorithms in Javascript is a great one, but it's a real challenge to imagine an API simple enough to be widely accepted yet powerful enough to enable meaningful experimentation.

It really isn't as simple as periodically monitoring the incoming data rate and making "switch up" and "switch down" decisions. Incoming data rate is not just a single kbps number: even considering only the simplest measures, it can be computed over a variety of timescales using a variety of averaging functions (sliding window, exponentially weighted moving average). "Switch up" and "switch down" are not the only possible actions, and in any case the decisions need to be made with knowledge of the switch points and of the streams' actual variable bitrate profiles into the future. If you make the API too simple, you restrict the possible algorithms and so restrict experimentation.
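
To make the "not just a kbps number" point concrete, here is a minimal sketch (names and parameters are illustrative, not any shipped API) of a throughput estimator that combines two of the averaging functions mentioned above - a sliding window and an exponentially weighted moving average - and takes the more conservative of the two:

```javascript
// Sketch of a throughput estimator. A real player would feed this from
// actual segment download timings; alpha and windowSize are assumptions.
class ThroughputEstimator {
  constructor(alpha = 0.2, windowSize = 5) {
    this.alpha = alpha;           // EWMA smoothing factor
    this.windowSize = windowSize; // samples kept in the sliding window
    this.samples = [];            // recent throughput samples, kbps
    this.ewma = null;             // smoothed estimate, kbps
  }

  // Record one completed download: bytes received over durationMs.
  addSample(bytes, durationMs) {
    const kbps = (bytes * 8) / durationMs; // bits per millisecond == kbps
    this.samples.push(kbps);
    if (this.samples.length > this.windowSize) this.samples.shift();
    this.ewma = this.ewma === null
      ? kbps
      : this.alpha * kbps + (1 - this.alpha) * this.ewma;
  }

  // Plain mean over the sliding window.
  windowAverage() {
    return this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
  }

  // Conservative estimate: the lower of the two measures.
  estimate() {
    return Math.min(this.windowAverage(), this.ewma);
  }
}
```

Even this toy already has three tuning knobs (alpha, window size, how to combine the measures), which is exactly why a too-simple API would foreclose experimentation.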

We are working on a C++ API which exposes all the needed information in a low-level but abstracted way, so that we can build pluggable rate adaptation algorithms for experimentation. I hope that in time we'll develop enough understanding of the moving parts to simplify that API in a way that doesn't restrict the algorithms, and perhaps makes it possible to expose things to Javascript.
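
For illustration only, a pluggable algorithm of the kind described might be handed a context like the one below (the actual C++ API is not public; every name here is a guess at the *shape* of the inputs - throughput estimate, switch points, and per-stream VBR profiles into the future):

```javascript
// Hypothetical pluggable rate-adaptation hook, sketched in JavaScript.
// Picks the highest-bitrate stream whose declared VBR peak over the
// lookahead window fits within a safety fraction of estimated throughput.
function chooseStream(ctx) {
  const budget = ctx.estimatedKbps * 0.8; // 80% safety margin (assumption)
  const viable = ctx.streams.filter(s =>
    peakKbps(s, ctx.nextSwitchPoint, ctx.lookaheadSec) <= budget);
  if (viable.length === 0) return ctx.streams[0]; // fall back to lowest
  return viable.reduce((a, b) => (a.nominalKbps > b.nominalKbps ? a : b));
}

// Peak of the stream's declared VBR profile over the lookahead window.
function peakKbps(stream, fromSec, lookaheadSec) {
  return Math.max(...stream.vbrProfile
    .filter(p => p.timeSec >= fromSec && p.timeSec < fromSec + lookaheadSec)
    .map(p => p.kbps));
}
```

The point of the sketch is that the decision consumes far more than a single measured rate - which is why the API has to expose switch points and bitrate profiles, not just a kbps reading.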

But that would be after a manifest-based approach with rate adaptation in the underlying C++ code has been deployed, since we need that deployment to do the initial experimentation.

Btw, for on-demand you don't need URL lists or patterns at all. It makes much more sense to put each stream (audio, video, etc.) in its own file and navigate using HTTP byte range requests.
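
As a sketch of that byte-range approach: given a per-segment byte-offset index (in WebM this information would come from the file's own Cues element; the function and index names here are illustrative), each fragment maps to a standard HTTP Range header:

```javascript
// Build the Range header for segment n of a single-file stream, given an
// array of segment start offsets in bytes (assumed sorted ascending).
function rangeHeaderFor(index, segment) {
  const start = index[segment];
  // The last segment runs to end of file: open-ended range.
  const end = segment + 1 < index.length ? index[segment + 1] - 1 : '';
  return `bytes=${start}-${end}`;
}

// Hypothetical usage (not executed here):
// const res = await fetch('video.webm',
//   { headers: { Range: rangeHeaderFor(cueIndex, n) } });
// const segmentData = await res.arrayBuffer();
```

This removes the manifest's URL list entirely: one URL per stream plus the index is enough to reach any fragment.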

...Mark


> 
> -- 
> Philip Jägenstedt
> Core Developer
> Opera Software
> _______________________________________________
> foms mailing list
> foms at lists.annodex.net
> http://lists.annodex.net/cgi-bin/mailman/listinfo/foms
