[foms] Proposal: adaptive streaming using open codecs

Wed Oct 27 03:35:04 PDT 2010

Hey Mark, all,

On Oct 26, 2010, at 8:41 PM, Mark Watson wrote:

>>>> 1. A concatenation API (maybe Stream) to form a single stream from multiple URLs. This would basically be a byte concatenation API, and assumes that we either have the chunks be plain slices or that we support chained Ogg/WebM gaplessly. It has some similarity to a Manifest API in that it lists several URLs. The difference may be that the video element isn't aware of the multiple resources, that's all hidden in the URL, effectively made part of the network layer of the browser.
>>>> 
>>> 
>>> Basically an API that says "Play this chunk of video next"?  I think that's what I've pushed for, but it's a decent amount of work.  I'm not sure what the rules are for that esp. wrt sound sync.  Also I don't think it has to be byte-concatenation if we have decent support for moving from one video to the next on a frame-by-frame basis.
>> 
>> I have added a small section on this to the proposal I drafted. I also posted it up on the WhatWG wiki:
>> 
>> http://wiki.whatwg.org/wiki/Adaptive_Streaming#API_adaptive_streaming
>> 
>> Please feel free to add/edit/remove as you see fit. There are still a lot of wrong statements in there, as well as omissions of feedback or alternatives. I added in a bunch based off the emails over last week, but some sections (particularly around chaining/chunking and to-rangerequest-or-not) are still very weak.
>> 
>> On the audio concatenation: can the suggestion that Monty put forward in the workshop (making up additional sound data in Vorbis e.g. for a crossfade) also be used for other codecs? Or is this something that can only be done in Vorbis? 
>> 
>> Chris' idea on the video concatenation sounds good - this can be on a frame-by-frame basis. I presume one can then still use only one decoding pipe? Or is that an issue then?
> 
> I think this kind of API is a great idea - the adaptation decisions are a key area for experimentation and innovation and if you can do that in Javascript it would be great.
> 
> But I think you need to drive it based on what is happening on the network. Otherwise how do I know how many chunks to "append". If I append too many and network conditions change, then I could stall. If I append too few then again I could stall.
> 
> Instead, the Javascript code could get called back each time a chunk has been downloaded. This is the point at which you want to decide whether and what chunk to request next. In the simplest case the information you need for this decision is
> (a) current buffer level (i.e. amount of received but not played media, in playout time)

Yes indeed, thanks! That's a total miss. There needs to be a getter for the amount of video in the buffer, otherwise you don't know what's going on. I'll add that (when the WhatWG database is back up ;).

> (b) recently observed incoming bandwidth

That is part of the QOS metrics section. The "append" API does indeed depend on that to be useful.
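
To make the callback idea a bit more concrete, here is a rough sketch of the decision a script might make each time a chunk has finished downloading, using only (a) and (b). Everything in it is an assumption on my side: the function name, the input shape, the 80% safety factor and the 5 second low-buffer threshold are made up for illustration and are not in the wiki proposal.

```typescript
// Hypothetical sketch: pick the bitrate for the next chunk request.
// The inputs mirror (a) and (b) above; none of these names are part of the proposal.
interface ChunkDecisionInput {
  bufferSeconds: number;        // (a) received but not yet played media, in playout time
  bandwidthBytesPerSec: number; // (b) recently observed incoming bandwidth
  availableBitrates: number[];  // bits per second, in any order
}

function chooseNextBitrate(
  input: ChunkDecisionInput,
  safetyFactor: number = 0.8,   // assumed headroom: use only 80% of measured bandwidth
  lowBufferSeconds: number = 5  // assumed threshold below which we play it safe
): number {
  const sorted = [...input.availableBitrates].sort((a, b) => a - b);
  if (sorted.length === 0) {
    throw new Error("no bitrates available");
  }
  // With a nearly empty buffer, fall back to the lowest level to avoid a stall.
  if (input.bufferSeconds < lowBufferSeconds) {
    return sorted[0];
  }
  // Convert bytes/s to bits/s and leave some headroom.
  const budget = input.bandwidthBytesPerSec * 8 * safetyFactor;
  // Highest bitrate that fits the budget, or the lowest one if none fits.
  let choice = sorted[0];
  for (const bitrate of sorted) {
    if (bitrate <= budget) {
      choice = bitrate;
    }
  }
  return choice;
}
```

The script would call something like this from the "chunk downloaded" callback and then append the chunk for the chosen level.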

> However, you might find that with this limited information there are not many adaptation algorithms you can actually build and so not much scope for experimentation. If you want to do more what you need is more information about the observed network conditions. For example a trace of number of bytes received in each 1s (or 100ms) interval since the last callback. The Javascript can then choose its own bandwidth measures/filters/heuristics etc.

The QOS section defines a getter for bandwidth (bytes received per second), frames dropped, height and width. A typical JavaScript-based heuristic would poll these metrics and store them in an array. With that data, you could do all kinds of things, e.g. averaging values out, building threshold filters or periodically blacklisting levels.
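
As a rough illustration of that polling approach, here is a minimal sketch of a history buffer with an averaging helper and a simple threshold filter. The `QosSample` shape and the `QosHistory` class are hypothetical; the field names are only my guess at what the QOS getters would return.

```typescript
// Hypothetical sample shape; field names are assumptions based on the QOS section.
interface QosSample {
  bandwidth: number;     // bytes received per second
  droppedFrames: number; // frames dropped since the previous sample
}

class QosHistory {
  private samples: QosSample[] = [];
  private maxSamples: number;

  constructor(maxSamples: number = 30) {
    this.maxSamples = maxSamples;
  }

  // Called from a polling timer (e.g. setInterval) with the latest metrics.
  add(sample: QosSample): void {
    this.samples.push(sample);
    if (this.samples.length > this.maxSamples) {
      this.samples.shift(); // drop the oldest sample
    }
  }

  // The last `count` samples (all stored samples when count is omitted).
  private recent(count?: number): QosSample[] {
    if (count === undefined) return this.samples.slice();
    return count <= 0 ? [] : this.samples.slice(-count);
  }

  // Mean bandwidth over the last `count` samples, 0 if there are none.
  averageBandwidth(count?: number): number {
    const recent = this.recent(count);
    if (recent.length === 0) return 0;
    return recent.reduce((sum, s) => sum + s.bandwidth, 0) / recent.length;
  }

  // Threshold filter: true if the frames dropped over the last `count`
  // samples add up to more than `limit`.
  dropsExceed(limit: number, count?: number): boolean {
    const dropped = this.recent(count).reduce((sum, s) => sum + s.droppedFrames, 0);
    return dropped > limit;
  }
}
```

A blacklist could be built on top of `dropsExceed`: when it returns true for the current level, the script skips that level for a while.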

Do you have ideas for additional metrics? Is Netflix considering other data for its heuristics besides these?

> There could also be an "intermediate" version of this API in which the player *does* know about manifests etc. and is just asking the Javascript to choose one of the available bitrates for the next request. This way the player manages everything related to determining supported codecs, file formats, scheduling of requests onto TCP connections etc. In this case the Javascript needs to be told the choices including some notion of the available bitrates (which needs to be some kind of peak measure - average is not very useful).

That would indeed be nice as well, but I'm afraid it's too much work for browser vendors as a first try (right?). Especially given the uncertainty around which manifest format to use.

> It would be really great if the whole thing could run independently for audio and video. They can be completely decoupled for streaming and synchronized at the renderer.

I'd imagine both audioElement and videoElement would have this "append" call.

I did some quick tests with trying to keep a video and an audio element in sync (for closed audio descriptions). You have to pay attention around buffering, but once you have sync it works great.
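
For what it's worth, the basic approach can be sketched as below. This is a simplified illustration and not the code from my tests: the `Clock` interface stands in for anything with a settable `currentTime` (as media elements have), and the `resyncAudio` name and the 0.1 second tolerance are assumptions.

```typescript
// Minimal stand-in for a media element; only currentTime (in seconds) is used.
interface Clock {
  currentTime: number;
}

// Hypothetical helper: treat the video as the master clock and snap the
// audio to it whenever the drift exceeds a tolerance.
// Returns true if a correction was applied.
function resyncAudio(video: Clock, audio: Clock, toleranceSeconds: number = 0.1): boolean {
  const drift = audio.currentTime - video.currentTime;
  if (Math.abs(drift) <= toleranceSeconds) {
    return false; // close enough, leave playback alone
  }
  // Seeking the audio may itself trigger buffering, so a real player
  // would not want to do this too often.
  audio.currentTime = video.currentTime;
  return true;
}
```

Something like this would be called periodically, for example from the video's timeupdate event.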

In time, it'd be awesome if one could feed a videoElement e.g. a full DASH manifest with separate video, audio and text tracks. But that'll probably be a couple of iterations away...

Kind regards,

Jeroen