[foms] Proposal: adaptive streaming using open codecs
watsonm at netflix.com
Mon Nov 1 18:58:38 PDT 2010
On Nov 1, 2010, at 5:51 PM, Andy Berkheimer wrote:
> On Mon, Nov 1, 2010 at 7:46 PM, David R <videophool at hotmail.com> wrote:
>> On Tue, 2 Nov 2010, Sylvia wrote:
>>> I was merely referring to the need to support both models in a HTML5-only future with adaptive streaming.
>> Supporting both models is fine, but they are very different use cases. Maybe I misunderstood the original post, but it seems that the use cases are being mixed. IMHO, the adaptive streaming spec should not address non-adaptive streaming use cases.
> I _think_ we're talking about a small set of components which can be
> added to browsers and composed to address most streaming use cases.
> To me the fundamental principle is to think of media as a series of
> resources rather than a single resource. Everything else starts to
> fall out from there.
I think the fundamental principle of adaptive streaming is that you have multiple streams available (multiple audio, multiple video, etc.) which are all guaranteed to be precisely locked to the same timeline. That is what permits seamless switching between streams.
It's very helpful if the Random Access Points in one version line up with good "stopping points" in the other versions: you can then switch without downloading overlapping data for the different versions, and can easily feed non-overlapping chunks into the playback pipeline. Apple's HTTP Live Streaming does not work this way.
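To make the alignment property concrete, here is a rough sketch (in Python, with made-up timestamps — not taken from any real stream) of the check I have in mind: every rendition exposes Random Access Points at the same presentation times, so a client can cut from one to another at a chunk boundary.

```python
# Sketch: verify that the random access points (RAPs) of several
# renditions fall on a shared timeline, so a client can switch
# between them without downloading overlapping data.
# All timestamps below are illustrative.

def raps_aligned(renditions, tolerance=0.0):
    """Return True if every rendition exposes the same RAP timestamps."""
    reference = renditions[0]
    for other in renditions[1:]:
        if len(other) != len(reference):
            return False
        if any(abs(a - b) > tolerance for a, b in zip(reference, other)):
            return False
    return True

# Three bitrate variants of the same presentation, RAPs every 2 seconds.
aligned = [[0.0, 2.0, 4.0, 6.0]] * 3
# A variant whose keyframes drifted: switching would need overlap.
drifted = [[0.0, 2.0, 4.0, 6.0], [0.0, 2.1, 4.0, 6.0]]

print(raps_aligned(aligned))   # True
print(raps_aligned(drifted))   # False
```

In practice you would allow a small tolerance rather than exact equality, but the principle is the same.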
But there are good reasons to decouple this level of chunking (the "unit of request") from the unit of storage (resources identified by URLs), as discussed earlier on this list. Except for Live there is no good reason to split the actual resources into physical chunks (separate files).
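The decoupling amounts to this: the server stores one resource per stream, and the client makes chunk-sized requests against it with HTTP byte ranges. A minimal sketch (the byte offsets are invented; in practice they would come from an index published alongside the stream):

```python
# Sketch: one stored file per stream, chunk-sized requests via HTTP
# Range headers. The (offset, length) pairs are illustrative; they
# would normally come from an index in the manifest.

def range_header(chunk_index, chunks):
    """Build an HTTP Range header for the Nth chunk of a single resource."""
    start, length = chunks[chunk_index]
    end = start + length - 1          # Range end offsets are inclusive
    return {"Range": "bytes=%d-%d" % (start, end)}

# (start offset, length) for three consecutive chunks of one file.
chunks = [(0, 4096), (4096, 4096), (8192, 2048)]
print(range_header(1, chunks))  # {'Range': 'bytes=4096-8191'}
```

Any HTTP/1.1 origin or CDN that honours Range requests can serve this without knowing anything about chunking.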
> * An interface to feed a stream into a media element as a series of
> concatenated resources ('chunks') rather than a single resource.
> Client runtime exposing buffering behavior controls. Support changing
> some parameters (clearly define what can and cannot change:
> resolution, number of channels, etc) from one chunk to the next.
> Re: Chris Blizzard's WebSocket proposal - quite interesting, though
> I'm concerned about how that would work with third-party, CDN-oriented delivery.
> [ open question around: feeding video and audio into a media element
> separately, with the element performing synchronization. Desirable
> and powerful, but very hard to splice into most existing
> implementations without a significant revamp... ]
This is where the fundamental principle of time alignment kicks in. If you start by integrating that principle, then separate audio and video feeds, with synchronization at the renderer, should not be a problem.
> * An interface exposing performance metrics (e.g. dropped frame
> counters, network read bandwidth) out of the appropriate elements.
> * A manifest format describing the list of resources to be
> concatenated along with any other metadata or bootstrap information
> needed by the client application logic. May include multiple lists of
> resources at different bitrates representing the same presentation.
> * A chunk format defining how the chunk resources are structured.
> Ideally you have the property that 'cat chunk1 ... chunkX > allchunks'
> reproduces the original media resource.
> Also acceptable, each chunk can take a contiguous slice of an
> unchunked media resource and prepend a small header. This header
> contains extra information to optimize processing in the client
> application and/or make the chunk a valid media resource in its own right.
Alternatively, you can just take the original media file and provide the client with the time and byte ranges corresponding to good chunks.
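Concretely, that means publishing a table of (time range, byte range) pairs for the original file and letting the client look up which bytes cover a given presentation time. A sketch with invented values:

```python
# Sketch: instead of physically splitting the file, publish a table of
# (start_time, end_time, first_byte, last_byte) entries describing the
# "good chunks" of the original resource. All values are illustrative.

index = [
    (0.0, 2.0,     0,  9999),
    (2.0, 4.0, 10000, 19999),
    (4.0, 6.0, 20000, 31999),
]

def bytes_for_time(t, index):
    """Find the byte range whose time span contains presentation time t."""
    for start, end, first, last in index:
        if start <= t < end:
            return (first, last)
    raise ValueError("time %r is outside the indexed presentation" % t)

print(bytes_for_time(2.5, index))  # (10000, 19999)
```

Seeking then becomes a table lookup followed by an ordinary Range request.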
> With these simple blocks you can implement fully automatic adaptive
> streaming, or make user-initiated stream changes (resolution,
> language, captions, camera angle, etc) smoother, or non-adaptive
> streaming with minimal buffering, or non-adaptive streaming with
> aggressive buffering, etc.
Right. Exactly the components described above, including the alternatives I've mentioned, have been worked on for quite a while now. In MPEG there has been a lot of discussion of the additional features needed to support multiple languages, subtitles, additional audio/video streams for accessibility, camera angles, 3D and more. It boils down to annotating the available streams and letting the client choose and download the combination it wants.
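"Annotating the streams and letting the client choose" can be as simple as filtering a list of annotated entries against the user's preferences. A sketch — the annotation keys and values here are invented for illustration, not taken from any real manifest format:

```python
# Sketch: streams annotated with type, language and bandwidth; the
# client filters down to the combination it wants. Keys and values
# are illustrative only.

streams = [
    {"type": "video", "bandwidth": 500000,  "url": "video-low"},
    {"type": "video", "bandwidth": 2000000, "url": "video-high"},
    {"type": "audio", "lang": "en", "url": "audio-en"},
    {"type": "audio", "lang": "fr", "url": "audio-fr"},
    {"type": "subtitles", "lang": "fr", "url": "subs-fr"},
]

def choose(streams, lang, max_bandwidth):
    """Pick the best video under a bandwidth cap, plus audio/subs by language."""
    video = max(
        (s for s in streams
         if s["type"] == "video" and s["bandwidth"] <= max_bandwidth),
        key=lambda s: s["bandwidth"],
    )
    others = [s for s in streams
              if s["type"] != "video" and s.get("lang") == lang]
    return [video] + others

selection = choose(streams, lang="fr", max_bandwidth=1000000)
print([s["url"] for s in selection])  # ['video-low', 'audio-fr', 'subs-fr']
```

The adaptive part is then just re-running the video choice as the measured bandwidth changes, at the aligned switch points discussed earlier.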