[foms] Proposal: adaptive streaming using open codecs

Jeroen Wijering jeroen at longtailvideo.com
Mon Nov 8 06:30:56 PST 2010

On Nov 8, 2010, at 4:37 AM, Chris Pearce wrote:

>>> I think some kind of model where the decoding pipeline gets passed 
>>> keyframe-aligned byte-ranges from possibly different resources seems 
>>> reasonable. We'd probably not want the media data from the chunks to be 
>>> exposed to JS, we'd be better off passing "handles" to chunks around in 
>>> JS instead.
>> What do you think of the scheme that Jeroen proposed, where the "handles" are (URL, byte range) pairs?
> I think Jeroen's appendVideo() proposal is a good starting point for discussion. I'm happy with a bufferLevel attribute, but I think browsers should only honour it if we're in readyState HAVE_ENOUGH_DATA. Living in New Zealand, I very frequently have to pause YouTube videos and let them mostly download in order to play them through without stopping, and I don't want to stop people from pre-buffering when necessary.
> I'd prefer to make manual stream switching opt-in, and not have an explicit appendVideo() call. I'd rather have JS periodically poll the state of the download (we can make available whatever data/statistics you need), and have JS call setLevel() to force stream changes. The browser can then switch the stream as soon as it's able to. That seems like less work by fewer people in the long run.

I completely agree with that. The appendVideo() call would be mainly to bridge the gap between what we have right now and a "full" bitrate switching functionality. At a later stage, browsers can implement their own adaptive streaming, e.g. by extending the scope of the "sources" tags or by loading a (by then agreed upon) manifest file format. I think we can all agree that's a bit further away though.
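To make the polling idea concrete, a page script might look something like the sketch below. Everything here is an assumption rather than an agreed-upon API: downloadRate, levels, currentLevel and setLevel() are placeholder names for the statistics and switching call Chris describes, and a plain object stands in for the <video> element.

```javascript
// Mock of a <video> element exposing hypothetical switching hooks.
const video = {
  downloadRate: 250,          // measured throughput in kbit/s (mock value)
  currentLevel: 1,            // index into levels[]
  levels: [200, 500, 1000],   // available stream bitrates in kbit/s
  setLevel(i) { this.currentLevel = i; } // browser switches when it can
};

// Pick the highest level whose bitrate fits the measured throughput,
// and ask the browser to switch if it differs from the current one.
function pollAndSwitch(v) {
  let best = 0;
  for (let i = 0; i < v.levels.length; i++) {
    if (v.levels[i] <= v.downloadRate) best = i;
  }
  if (best !== v.currentLevel) v.setLevel(best);
  return v.currentLevel;
}
```

In a real page this would run on a timer, e.g. setInterval(() => pollAndSwitch(video), 2000), with the browser free to defer the actual switch to the next keyframe.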

> I don't like the idea of an appendVideo() function which specifies a byte range because:
> 	• I think that browsers wouldn't want to initiate a new byte range request for every appendVideo() call anyway; they'd want to request the entire media (probably with a startOffset-to-EOF byte range request), in order to reduce the delays and overhead caused by setting up many smallish byte range requests.
> 	• If appendVideo() could specify a byte range, it could specify an invalid range, or a range which doesn't chronologically follow the previously appended range. Everyone who wants to do manual stream switching has to get this right, but if it's implemented in-browser, it only needs to be gotten right once.
> 	• It's easier for a browser to get the transition between two streams consistently seamless than it is for user JS to, since we have more information and more control over the network and decode.
> I'd much rather that the browser fetched the indexes of all streams on startup (probably asynchronously) and played a default stream as per normal. If manual switching was enabled, then upon a call to setLevel(), the video would switch to the requested stream at an appropriate time. We'd probably want to start rendering the new stream as soon as possible, since we could be switching down because the decode is dropping too many frames and not keeping up with rendering.

I agree there's a certain amount of ambiguity between low-level range requests and the unavailability of seekpoints and decoder settings. What about a slightly modified version of appendVideo()?
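The concrete call did not survive in the archived message, so the shape below is reconstructed from the notes that follow and should be treated as an assumption. A minimal mock object illustrates the intended semantics:

```javascript
// Hypothetical reconstruction of the modified appendVideo() call; the
// exact signature is inferred from the bullet points below.
const videoElement = {
  queue: [],
  // startPosition and endPosition are in seconds and both optional;
  // omitting them queues the entire clip for seamless playlist playback.
  appendVideo(videoURL, startPosition, endPosition) {
    this.queue.push({ videoURL, startPosition, endPosition });
  }
};

// Append seconds 10-20 of one clip, then a whole next clip (URLs illustrative).
videoElement.appendVideo("http://example.com/video_800k.webm", 10, 20);
videoElement.appendVideo("http://example.com/next.webm");
```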


*) Both startPosition and endPosition are in seconds and optional. If both are omitted, this function is basically a way to build seamless playlist playback (as Silvia requested).
*) As with today's playback, the browser decides whether to probe the header data if it doesn't yet have enough info on the file. In other words, when building a bitrate switching API with this, the browser fetches the index the first time (part of) this videoURL is requested.
*) The JavaScript layer needs to take care of knowing all seekpoints if bitrate switching is built with this logic. The browser will merely translate the startPosition and endPosition to the nearest seekpoints (no partial-GOP decode voodoo for now).
*) As the appended video's metadata is fetched and the chunk duration is known, the "duration" and "seekable" properties of the stream change. A "durationchange" event is fired. The "readyState" is re-evaluated (it may e.g. jump from HAVE_FUTURE_DATA to HAVE_CURRENT_DATA).
*) As playback rolls into the new videoURL, the "videoWidth" and "videoHeight" of the stream are updated (though they may remain the same).
*) If the metadata of the appended video cannot be probed, the browser throws an error (MEDIA_ERR_SRC_NOT_SUPPORTED) and does not append the video. This means that, during append-metadata-fetching, the "readyState" of a video does not change.
*) Appended videos should be in the same container format and use the same A/V codecs as the currently playing video. If not, the browser will throw an error and not append the video (MEDIA_ERR_APPEND_NOT_ALIGNED). This restriction is added to ensure appended content can be decoded within a single pipeline.
*) Video stream bitrate, dimensions and framerate may vary. Audio stream bitrate, channel count and sample frequency may also vary. A single decoding pipeline can handle these variations.
*) Buffering and metadata availability will still happen inside the browser. When appendVideo() is called, the browser will typically rush to load both the metadata and the media time range (as it does today).
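Put together, the JS-side bookkeeping the notes above imply might be sketched as follows. The level URLs, the seekpoint lists and the mock element are all illustrative; only appendVideo() itself is the proposed call, and the snap-to-nearest-seekpoint step mimics what the browser would do.

```javascript
// Per-quality-level metadata the script is assumed to have fetched itself
// (e.g. by parsing each file's index); values are made up for illustration.
const levels = [
  { url: "video_300k.webm", seekpoints: [0, 10, 20, 30, 40] },
  { url: "video_800k.webm", seekpoints: [0, 10, 20, 30, 40] }
];

// Snap a time to the nearest seekpoint, as the browser would do internally.
function nearestSeekpoint(points, t) {
  return points.reduce((a, b) => Math.abs(b - t) < Math.abs(a - t) ? b : a);
}

// Append the chunk of `level` covering roughly [from, to) seconds.
function appendChunk(videoEl, level, from, to) {
  const start = nearestSeekpoint(level.seekpoints, from);
  const end = nearestSeekpoint(level.seekpoints, to);
  videoEl.appendVideo(level.url, start, end);
  return [start, end];
}

// Mock element standing in for a <video> with the proposed appendVideo().
const videoEl = {
  appended: [],
  appendVideo(url, start, end) { this.appended.push([url, start, end]); }
};
```

For example, appendChunk(videoEl, levels[1], 12, 21) would snap to the 10s and 20s keyframes and queue that range of the higher-bitrate file.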

Would this be a good tradeoff, and a starting point for experimenting with adaptive streaming?

Kind regards,

Jeroen