[foms] WebM Manifest

Jeroen Wijering jeroen at longtailvideo.com
Thu Mar 17 13:10:46 PDT 2011


On Mar 17, 2011, at 1:47 PM, Philip Jägenstedt wrote:

>> Next, the Stream API needs to be very strictly defined in terms of how provided A/V frames should be formatted, and how and when codec initialization data must be (re)sent.
>> 
>> Basically, JavaScript handles the demuxing. This would be a great API, allowing for much flexibility. At the same time, the amount of knowledge required for such an API would be so staggering (e.g. full understanding of video containers) that few people would be able to work with it.
> 
> I may very well be in need of education, but I don't see why that needs to be the case.
> 
> Assume a manifest at its simplest is a list of URLs and switchover times. If one has a "manifest API" that allows one to add URLs and switchover times, then surely anything that can be done with a manifest can be done with the API? If a manifest solution doesn't require inspecting the data outside of the normal decoding, why would it be necessary when one uses an API?


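Before responding, here is what I take that "manifest API" to mean, sketched in TypeScript for clarity (hypothetical names, not an existing browser interface):

// Hypothetical "manifest API": the page appends fragment URLs and
// switchover times itself, instead of handing the browser a manifest file.
interface ManifestAPI {
  // Tell the browser to switch to `url` at presentation time `switchTime` (seconds).
  addFragment(switchTime: number, url: string): void;
}

// Assumed extension point on the media element; purely illustrative.
declare const video: HTMLVideoElement & { manifest: ManifestAPI };

video.manifest.addFragment(0,  "http://example.com/frag-000-low.webm");
video.manifest.addFragment(10, "http://example.com/frag-001-high.webm");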
When portraying a manifest as a "list of URLs", you are following an approach similar to Apple HLS, which imposes two restrictions on your data model:

1. Interleaving. Only when audio and video are interleaved in each fragment can you have a flat list of URLs. Your presentation is basically chopped up vertically (by time) instead of horizontally (by stream). Whenever you have more than one quality level of a given track, there is data duplication. Sometimes (5 video qualities sharing the same audio, as with Apple HLS) that might be acceptable. Sometimes (5 video qualities with 5 audio languages) the amount of data simply explodes; see the rough arithmetic after this list.

2. Initialization. Only when every fragment is self-initializing (i.e. every fragment contains all codec configuration) can you have a flat list of URLs. After all, the JavaScript layer should be able to start playback at any fragment and then switch at will. Every container format has peculiarities that make this overhead non-trivial - e.g. Vorbis initialization alone requires a couple of kB.
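
To put rough numbers on the first point (simple arithmetic, not tied to any particular format):

// With interleaved fragments, every audio/video combination has to be
// stored as its own rendition on the server.
const videoQualities = 5;
const audioLanguages = 5;

const interleavedRenditions = videoQualities * audioLanguages; // 25 full copies of the presentation
const separateStreams = videoQualities + audioLanguages;       // 10 streams when tracks are kept apart

console.log(interleavedRenditions, separateStreams); // 25 vs. 10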

These restrictions can probably be worked around (separate A+V buffers, dedicated initialization segments), but they do complicate things, both for the browser media developer and for the JavaScript developer. You are right that full video container knowledge is probably not required of JavaScript developers, but things go deeper than urls+= if one aims for a more flexible approach than Apple HLS.
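
As an illustration only (this is not a standardized format), a manifest that lifts both restrictions might describe each track separately, with an explicit initialization segment per track - again sketched in TypeScript:

// Illustrative data model, not an existing manifest format.
interface Fragment {
  start: number;           // presentation time in seconds
  url: string;
}

interface Track {
  kind: "audio" | "video";
  bitrate: number;         // bits per second
  language?: string;       // audio tracks only
  init: string;            // URL of the codec initialization data for this track
  fragments: Fragment[];
}

interface Manifest {
  duration: number;        // seconds
  tracks: Track[];
}

The scripting layer would then pick one audio and one video track, fetch the init segment once whenever it switches tracks, and append fragments after it.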
 
Kind regards,

Jeroen


