[foms] Proposal: adaptive streaming using open codecs
jeroen at longtailvideo.com
Wed Oct 27 03:35:04 PDT 2010
Hey Mark, all,
On Oct 26, 2010, at 8:41 PM, Mark Watson wrote:
>>>> 1. A concatenation API (maybe Stream) to form a single stream from multiple URLs. This would basically be a byte concatenation API, and assumes that we either have the chunks be plain slices or that we support chained Ogg/WebM gaplessly. It has some similarity to a Manifest API in that it lists several URLs. The difference may be that the video element isn't aware of the multiple resources, that's all hidden in the URL, effectively made part of the network layer of the browser.
>>> Basically an API that says "Play this chunk of video next"? I think that's what I've pushed for, but it's a decent amount of work. I'm not sure what the rules are for that esp. wrt sound sync. Also I don't think it has to be byte-concatenation if we have decent support for moving from one video to the next on a frame-by-frame basis.
>> I have added a small section on this to the proposal I drafted. I also posted it up on the WhatWG wiki:
>> Please feel free to add/edit/remove as you see fit. There are still a lot of wrong statements in there, as well as omissions of feedback or alternatives. I added in a bunch based on the emails from the last week, but some sections (particularly around chaining/chunking and to-rangerequest-or-not) are still very weak.
>> On the audio concatenation: can the suggestion that Monty put forward in the workshop (making up additional sound data in Vorbis e.g. for a crossfade) also be used for other codecs? Or is this something that can only be done in Vorbis?
>> Chris' idea on the video concatenation sounds good - this can be on a frame-by-frame basis. I presume one can then still use only one decoding pipe? Or is that an issue then?
> But I think you need to drive it based on what is happening on the network. Otherwise how do I know how many chunks to "append"? If I append too many and network conditions change, then I could stall. If I append too few then again I could stall.
> (a) current buffer level (i.e. amount of received but not played media, in playout time)
Yes indeed, thanks! That's a total miss. There needs to be a getter for the amount of video in the buffer, otherwise you don't know what's going on. I'll add that (when the WhatWG database is back up ;).
> (b) recently observed incoming bandwidth
That is part of the QOS metrics section. The "append" API does indeed depend on that to be useful.
Do you have ideas for additional metrics? Is Netflix considering other data for its heuristics besides these?
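To make the dependency concrete, here is a minimal sketch of how a script could combine (a) and (b) to decide how many chunks to "append". Everything in it is an assumption for illustration: the names StreamState, ChunkInfo and chunksToAppend are hypothetical and not part of the proposal or of any browser API, and the rule itself (fill up to a target buffer, but only one chunk at a time when bandwidth is below the chunk bitrate) is just one plausible heuristic.

```typescript
// Hypothetical shapes; neither corresponds to an existing browser API.
interface StreamState {
  bufferedSeconds: number;     // (a) received but not yet played media, in playout time
  bandwidthBitsPerSec: number; // (b) recently observed incoming bandwidth
}

interface ChunkInfo {
  durationSeconds: number;   // playout duration of one chunk
  bitrateBitsPerSec: number; // encoded bitrate of the chunks being appended
}

// Returns how many more chunks to append so the buffer reaches the target level.
function chunksToAppend(
  state: StreamState,
  chunk: ChunkInfo,
  targetBufferSeconds: number
): number {
  if (chunk.durationSeconds <= 0) {
    throw new Error("chunk duration must be positive");
  }
  const deficitSeconds = targetBufferSeconds - state.bufferedSeconds;
  if (deficitSeconds <= 0) {
    return 0; // buffer is already full enough
  }
  if (state.bandwidthBitsPerSec < chunk.bitrateBitsPerSec) {
    // The network cannot sustain this quality level: commit to a single
    // chunk only, so the caller can switch to a lower bitrate afterwards.
    return 1;
  }
  // Enough bandwidth: append as many chunks as needed to cover the deficit.
  return Math.ceil(deficitSeconds / chunk.durationSeconds);
}
```

With 4 seconds buffered, a 30 second target and 10 second chunks, this asks for 3 chunks when bandwidth exceeds the chunk bitrate, and for only 1 when it does not. Without the buffer getter there would be no way to compute the deficit at all.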
That would be nice as well indeed, but I'm afraid it's too much work for browser vendors as a first try (right?). Especially given the uncertainty around which manifest format to use.
> It would be really great if the whole thing could run independently for audio and video. They can be completely decoupled for streaming and synchronized at the renderer.
I'd imagine both audioElement and videoElement have this "append" call.
I did some quick tests trying to keep a video and an audio element in sync (for closed audio descriptions). You have to pay attention around buffering, but once you have sync it works great.
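Roughly, the sync logic comes down to something like the sketch below. This is not the actual test code: the SyncAction type, the syncAudioToVideo function and the 0.1 second tolerance are all assumptions made up for illustration. A caller would run it periodically (e.g. on timeupdate) with the two elements' currentTime values and apply the returned action.

```typescript
// Hypothetical result type describing what to do with the audio element.
type SyncAction =
  | { kind: "none" }                    // drift is within tolerance
  | { kind: "wait" }                    // video is buffering: hold the audio
  | { kind: "seek"; audioTime: number } // move the audio to this position

// Treats the video as the master clock and the audio as the follower.
function syncAudioToVideo(
  videoTime: number,
  audioTime: number,
  videoBuffering: boolean,
  toleranceSeconds: number = 0.1 // assumed value, not a measured one
): SyncAction {
  if (videoBuffering) {
    // Buffering is where sync is easily lost, so hold the audio until
    // the video is playing again.
    return { kind: "wait" };
  }
  const driftSeconds = audioTime - videoTime;
  if (Math.abs(driftSeconds) <= toleranceSeconds) {
    return { kind: "none" };
  }
  // Drift too large: jump the audio to the video position.
  return { kind: "seek", audioTime: videoTime };
}
```

The tolerance avoids constant seeking on small jitter; only the buffering case and larger drifts cause any action.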
In time, it'd be awesome if one could feed a videoElement e.g. a full DASH manifest with separate video, audio and text tracks. But that'll probably be a couple of iterations away...