[foms] WebM Manifest
Philip Jägenstedt
philipj at opera.com
Thu Apr 7 08:34:31 PDT 2011
On Thu, 17 Mar 2011 21:10:46 +0100, Jeroen Wijering
<jeroen at longtailvideo.com> wrote:
>
> On Mar 17, 2011, at 1:47 PM, Philip Jägenstedt wrote:
>
>>> Next, the Stream API needs to be very strictly defined in terms of how
>>> provided A/V frames should be formatted, and how and when codec
>>> initialization data must be (re)sent.
>>>
>>> Basically, javascript handles the demuxing. This would be a great API,
>>> allowing for much flexibility. At the same time, the amount of
>>> knowledge required for such an API would be so staggering (e.g. full
>>> understanding of video containers) that few people would be able to
>>> work with it.
>>
>> I may very well be in need of education, but I don't see why that needs
>> to be the case.
>>
>> Assume a manifest at its simplest is a list of URLs and switchover
>> times. If one has a "manifest API" that allows one to add URLs and
>> switchover times, then surely anything that can be done with a manifest
>> can be done with the API? If a manifest solution doesn't require
>> inspecting the data outside of the normal decoding, why would such
>> inspection be necessary when one uses an API?
>
>
> When portraying a manifest as a "list of URLs", you are following an
> approach similar to Apple HLS, imposing two restrictions on your
> data model:
>
> 1. Interleaving. Only when audio + video are interleaved in fragments
> can you have a list of URLs. Your presentation is basically chopped up
> vertically (time) instead of horizontally (stream). In any case where
> you have more than one quality level of a certain track, there is data
> duplication. Sometimes (5 video qualities with the same audio, like
> Apple HLS) that might be acceptable. Sometimes (5 video qualities with
> 5 audio languages) the amount of data simply explodes.
>
> 2. Initialization. Only when every fragment is self-initializing
> (every fragment contains all codec configuration) can you have a list
> of URLs. After all, the javascript layer should be able to start with
> any fragment and do random subsequent switching. Every container format
> has peculiarities that make this amount of data non-trivial - e.g.
> Vorbis initialization requires a couple of kB.
>
> These restrictions can probably be worked around (A+V buffer,
> initialization segments), but they do complicate things - both for the
> browser media developer and for the javascript developer. You are right
> - full video container knowledge is probably not required of
> javascripters, but things go deeper than urls += if you are aiming for
> a more flexible approach than Apple HLS.
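
To make those two restrictions concrete, here is a rough sketch of the
data model that falls out once tracks are kept separate and the codec
initialization is factored out of the fragments. All names and URLs are
made up for illustration; this is not an existing or proposed manifest
format:

// Hypothetical manifest shape - only an illustration of the data model
// discussed above, not an existing format.
var manifest = {
  video: [
    // One entry per quality level; tracks are not interleaved, so the
    // same audio data is never duplicated across video qualities.
    {
      bandwidth: 500000,             // bits per second
      init: "video-500k-init.webm",  // codec configuration, fetched once
      fragments: [
        { url: "video-500k-0001.webm", start: 0,  duration: 10 },
        { url: "video-500k-0002.webm", start: 10, duration: 10 }
      ]
    },
    {
      bandwidth: 2000000,
      init: "video-2m-init.webm",
      fragments: [
        { url: "video-2m-0001.webm", start: 0,  duration: 10 },
        { url: "video-2m-0002.webm", start: 10, duration: 10 }
      ]
    }
  ],
  audio: [
    {
      language: "en",
      init: "audio-en-init.webm",    // e.g. the couple of kB of Vorbis setup
      fragments: [
        { url: "audio-en-0001.webm", start: 0,  duration: 10 },
        { url: "audio-en-0002.webm", start: 10, duration: 10 }
      ]
    }
  ]
};
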
OK, so perhaps an API with the functionality we need would be a better
approach. This is similar to how <track> (captions) is handled: while
there exists a simple baseline format (WebVTT), scripts can do fancy
stuff using the API. For adaptive streaming, the specific advantages of
bringing scripts into the mix are allowing experimentation with rate
switching algorithms, and allowing site-specific URL schemes for live
streaming that do away with the need to ever re-fetch a manifest or to
have a complex manifest for declaratively giving the URL pattern.
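
As a minimal sketch of what driving this from script could look like -
assuming a hypothetical append call on the media element (here called
streamAppend(); no such method exists today) - both the rate switching
and the live URL pattern are plain javascript and can be as
site-specific as needed:

// Hypothetical script-driven adaptive streaming loop. The names
// video.streamAppend, pickQuality and nextFragmentUrl are made up;
// they only show where script-controlled rate switching and URL
// generation would fit.
var qualities = [500000, 1000000, 2000000];  // available bitrates
var fragmentIndex = 0;
var measuredBandwidth = 1000000;

function pickQuality() {
  // Site-specific rate switching: the highest bitrate that the
  // measured bandwidth can sustain, with some headroom.
  for (var i = qualities.length - 1; i >= 0; i--) {
    if (qualities[i] * 1.5 < measuredBandwidth) return qualities[i];
  }
  return qualities[0];
}

function nextFragmentUrl(bitrate, index) {
  // For live streaming the URL pattern is generated by script, so no
  // manifest ever needs to be re-fetched.
  return "/live/video-" + bitrate + "-" + index + ".webm";
}

function fetchAndAppend(video) {
  var url = nextFragmentUrl(pickQuality(), fragmentIndex++);
  var xhr = new XMLHttpRequest();
  var started = Date.now();
  xhr.open("GET", url, true);
  xhr.responseType = "arraybuffer";
  xhr.onload = function () {
    var seconds = (Date.now() - started) / 1000;
    measuredBandwidth = (xhr.response.byteLength * 8) / seconds;
    video.streamAppend(xhr.response);  // hypothetical append call
    fetchAndAppend(video);             // a real page would pace this
  };                                   // against the buffered amount
  xhr.send();
}

Usage would be as simple as calling
fetchAndAppend(document.querySelector("video")) once playback starts;
the point is that both the switching heuristic and the URL generation
stay entirely in the page's hands.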
--
Philip Jägenstedt
Core Developer
Opera Software