[foms] Proposal: adaptive streaming using open codecs

Philip Jägenstedt philipj at opera.com
Tue Nov 9 01:22:23 PST 2010


On Mon, 08 Nov 2010 15:39:59 +0100, Jeroen Wijering  
<jeroen at longtailvideo.com> wrote:

> Hello all,
>
>>> I don't like the idea of an appendVideo() function which specifies a  
>>> byte range because:
>>> • I think that browsers wouldn't want to initiate a new byte range  
>>> request for every appendVideo() call anyway; they'd want to request  
>>> the entire media (probably with a startOffset-to-EOF BRR), in order to  
>>> reduce the delays and overhead caused by setting up many smallish BRRs.
>>> 	• If appendVideo() could specify a byte range, it could specify an  
>>> invalid range, or a range which doesn't chronologically follow the  
>>> previously appended range. Everyone who wants to do manual stream  
>>> switching has to get this right, but if it's implemented in-browser,  
>>> it only needs to be gotten right once.
>>> • It's easier for a browser to get the transition between two streams  
>>> consistently seamless than it is for user JS, since we have more  
>>> information and more control over the network and the decoder.
>>>
>>> I'd much rather that the browser fetched the indexes of all streams on  
>>> startup (probably asynchronously) and played a default stream as per  
>>> normal. If manual switching was enabled, then upon a call to  
>>> setLevel(), the video would switch to the requested stream at an  
>>> appropriate time. We'd probably want to start rendering the new stream  
>>> as soon as possible, as we could be switching down because too many  
>>> frames are being dropped when decoding can't keep up with rendering.
>>
>> I agree there's a certain amount of ambiguity between lower-level  
>> range requests and the unavailability of seekpoints and decoder  
>> settings. What about a slightly modified version of appendVideo()?
>>
>> videoElement.appendVideo(videoURL, [startPosition, endPosition]);
>>
>> *) Both the startPosition and endPosition are in seconds and optional.  
>> If they are both omitted, this function is basically a way to build  
>> seamless playlist playback (as Silvia requested).
>> *) As with today's playback, the browser decides to probe the header  
>> data if it doesn't yet have enough info on the file. In other words,  
>> when building a bitrate-switching API with this, the browser fetches  
>> the index the first time (part of) this videoURL is requested.
>> *) The JavaScript layer needs to take care of knowing all seekpoints if  
>> bitrate switching is built with this logic. The browser will merely  
>> translate the startPosition and endPosition to the nearest seekpoints  
>> (no partial-GOP decode voodoo for now).
>> *) As the appended video's metadata is fetched and the chunk duration  
>> is known, the "duration" and "seekable" properties of the stream  
>> change. A "durationchange" event is fired. The "readyState" is  
>> re-evaluated (it may e.g. jump from HAVE_FUTURE_DATA to  
>> HAVE_CURRENT_DATA).  
>> *) As playback rolls into the new videoURL, the "videoWidth" and  
>> "videoHeight" of the stream are updated (though they may remain the  
>> same).
>> *) If the metadata of the appended video cannot be probed, the browser  
>> throws an error (MEDIA_ERR_SRC_NOT_SUPPORTED) and does not append the  
>> video. This means that, during append-metadata-fetching, the  
>> "readyState" of a video does not change.
>> *) Appended videos should be in the same container format and A/V codec  
>> as the currently playing video. If not, the browser will throw an  
>> error and not append the video (MEDIA_ERR_APPEND_NOT_ALIGNED). This  
>> restriction is added to ensure appended content can be decoded within a  
>> single pipeline.
>> *) Video stream bitrate, dimensions and frame rate may vary. Audio  
>> stream bitrate, channel count and sample frequency may also vary. A  
>> single decoding pipeline can handle these variations.
>> *) Buffering and metadata availability will still happen inside the  
>> browser. When appendVideo() is called, the browser will typically rush  
>> to load both the metadata and the media time range (as it does today).
>>
>> Does this seem like a good tradeoff and a starting point for  
>> experimenting with adaptive streaming?

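To make the proposal concrete, here's roughly how a script might drive  
appendVideo() for bitrate switching. This is only a sketch; the chunk  
URLs, the bitrate thresholds and the measuredKbps input are invented for  
illustration:

var levels = ['clip_300k.webm', 'clip_800k.webm', 'clip_1500k.webm'];
var chunk = 10; // seconds per append; the browser snaps to seekpoints
var next = 0;

function appendNextChunk(video, measuredKbps) {
  // Crude pick: the highest bitrate that fits the measured bandwidth.
  var level = measuredKbps > 1500 ? 2 : measuredKbps > 800 ? 1 : 0;
  video.appendVideo(levels[level], next, next + chunk);
  next += chunk;
}
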
If we are trying to make something simple to start experimenting with, I  
don't think we need to make all of these changes to the HTMLMediaElement  
API. Having the media backend be aware of several resources simultaneously  
complicates things and will be more difficult (or impossible) to support  
on platforms where we don't have full control over the media framework.

I still strongly prefer to put the chunking+concatenating in a layer  
outside of <video>, by extending the suitably named Stream API, see  
<http://www.whatwg.org/specs/web-apps/current-work/multipage/commands.html#stream-api>.

Here's how it might work:

var s = new Stream();
s.appendURL('chunk1.webm'); // queue the first chunk
s.appendURL('chunk2.webm'); // queue the next; more can follow at any time
video.src = s.url;          // the decoder sees one continuous resource

That's it. From the point of view of the video decoder, it's just an  
infinite stream. Properties:

* Running out of chunks is seen as a network stall and notified via the  
stalled event.

* Failure to build a correct stream is a decoding error, notified via an  
error event and MEDIA_ERR_DECODE; there's no need for a separate  
MEDIA_ERR_APPEND_NOT_ALIGNED.

* No startPosition or endPosition needed.

* buffered, seekable, videoWidth, videoHeight, etc. are updated just as  
they would be with an infinite stream over HTTP.

* It's up to the application whether to use chained WebM/Ogg (with  
full headers in each chunk) or just raw chunks.

Clearly this is not where we want to eventually end up, but I think it's a  
good starting point. When we continue building with manifests and whatnot,  
I think we should continue to make sure that the decoding side doesn't  
need to be too involved.
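
For completeness, here's a hypothetical adaptive switcher built entirely  
in script on top of this. appendURL() is the only new API assumed; the  
chunk naming scheme and the estimateKbps() helper are invented:

var levels = [300, 800, 1500]; // available bitrates, in kbit/s
var s = new Stream();
var next = 0;
video.src = s.url;

function queueNextChunk() {
  var kbps = estimateKbps(); // bandwidth estimate, left to the app
  // Pick the highest bitrate that fits the estimate.
  var level = 0;
  for (var i = levels.length - 1; i >= 0; i--) {
    if (levels[i] <= kbps) { level = i; break; }
  }
  s.appendURL('chunk' + (next++) + '_' + levels[level] + 'k.webm');
}

// Top up whenever less than ten seconds is buffered ahead of playback.
video.addEventListener('progress', function() {
  var b = video.buffered;
  if (b.length && b.end(b.length - 1) - video.currentTime < 10)
    queueNextChunk();
}, false);

The decoder never learns that a switch happened; from its side there is  
only ever one resource.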

> I'd also like to propose reinstating the videoElement.PROGRESS event  
> along with its "total" and "loaded" values. What do people from  
> Opera/Firefox think of this? I know it's been removed from the spec  
> because it clashes with the buffered/played TimeRanges, but it is  
> exactly what's needed to perform all kinds of bandwidth calculations.  
> Perhaps a tradeoff is to report the offset/progress/total of the time  
> range currently being (or last) fetched? That's what we do with JW  
> Player in Flash as well for HTTP pseudo-streaming.

I don't want this to come back; the interface makes no sense. If the data  
must be made available, we should bring back the bufferedBytes property,  
which was removed from the spec at my own request since it didn't seem to  
serve any purpose, see  
<http://html5.org/tools/web-apps-tracker?from=2404&to=2405>.
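
If bufferedBytes did come back, the bandwidth math Jeroen wants would  
just be a matter of sampling it. A rough sketch, assuming bufferedBytes  
returned a ByteRanges-style object mirroring buffered:

var lastBytes = 0;
var kbps = 0; // rolling one-second throughput estimate

setInterval(function() {
  var ranges = video.bufferedBytes; // assumed attribute, not in the spec
  if (!ranges || !ranges.length) return;
  var bytes = ranges.end(ranges.length - 1);
  kbps = (bytes - lastBytes) * 8 / 1000; // kilobits in the last second
  lastBytes = bytes;
}, 1000);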

> I'll also wrap this up - together with the appendVideo() call - and  
> send an email to the WHATWG. Hopefully one or more browser developers  
> are interested in taking a stab at this.

So, I'm not a fan of appendVideo(); I want the first generation of this  
API to live in the network layer, not in the <video> layer, and  
especially not to reach all the way down into the decoding pipeline.

-- 
Philip Jägenstedt
Core Developer
Opera Software

