[foms] Proposal: adaptive streaming using open codecs

Sat Nov 20 06:48:14 PST 2010

On Nov. 19, 2010 at 18:37, Mark Watson wrote:

>> The error margin may be quite high, especially when using VBR encoding and not equally-time-spaced keyframes, don't you think?
> 
> Yes, the error could be as big as the maximum keyframe spacing.

I think it is more closely linked to the variance of the keyframe spacing.
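
To illustrate, here is a rough Python sketch, assuming that the "error" is the distance between a requested time and the
keyframe we actually land on, and that we always snap back to the previous keyframe. The function name and the sample
keyframe times are made up for the example. Under those assumptions the worst case is indeed the largest gap, but the
average error for a uniformly random target is sum(gap^2) / (2 * sum(gap)), which grows with the variance of the gaps.

```python
def seek_error_stats(keyframe_times):
    """Return (max_error, mean_error) in seconds.

    Assumption: a seek to time t starts decoding at the last keyframe
    at or before t, so the error is t minus that keyframe's time.
    """
    gaps = [b - a for a, b in zip(keyframe_times, keyframe_times[1:])]
    if not gaps or any(g <= 0 for g in gaps):
        raise ValueError("need at least two strictly increasing keyframe times")
    total = sum(gaps)
    # Worst case: the target sits just before the next keyframe of the widest gap.
    max_error = max(gaps)
    # A uniformly random target falls in a gap g with probability g / total,
    # and its expected error inside that gap is g / 2.
    mean_error = sum(g * g for g in gaps) / (2 * total)
    return max_error, mean_error


# Two hypothetical 8-second streams with the same number of keyframes.
regular = [0, 2, 4, 6, 8]          # evenly spaced
irregular = [0, 0.5, 1, 1.5, 8]    # same average spacing, high variance

print(seek_error_stats(regular))
print(seek_error_stats(irregular))
```

Both streams have the same average keyframe spacing, yet the irregular one has a much larger mean error, so the maximum
spacing and the variance are two views of the same problem.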

>> Very interesting indeed, but aren't we about to reinvent SMIL all the way down here? ;-)
> 
> Not exactly, but it's worth discussing why we don't just use SMIL for all of this and call it a day. This was discussed in some detail in 3GPP
> last year. The feature set of SMIL and the feature set needed for adaptive streaming do intersect, but there are a few things we need for
> adaptive streaming which are not in SMIL and a LOT of things in SMIL which are not needed for adaptive streaming. 
> 
> The intersect is also rather awkward. You can use <par> to define alternatives. And <seq> to define chunks. But you end up duplicating the
> set of <par> elements in each element of the <seq>. And there is no semantic linkage between the different versions in each time period
> (nothing in SMIL implies there is any relationship between the first alternative in one time period and the first alternative in the next time
> period.) It's also not clear what to do in SMIL about the fact that the audio and video in an interleaved chunk end at slightly different times,
> and begin at slightly different times in the next chunk to compensate. An out-of-the-box SMIL player might not play that seamlessly.

Agreed.

> Yes, you can shoehorn it in, but it's awkward and verbose and not at all clear that existing SMIL players would "do the right thing". Modifying
> an existing player is not likely to be any easier than a from-scratch adaptive streaming implementation, given the complexity of SMIL (and the
> comparative simplicity of adaptive streaming).
> 
> I think we should be thinking of adaptive streaming as just another stream type which appears to any presentation layer (HTML5, SMIL, whatever)
> as a simple audio/video stream with some switchable properties (like audio language, subtitles etc) and hides the complexities of adaptivity and
> switching from the presentation layer. Switching, splicing, chunking, bitrates etc. are all things that should worry us video geeks and we shouldn't
> concern the presentation design people with these things.

Probably not, but for us video geeks, having access to the A/V pipeline may also prove useful for experimenting with bitrate-switching algorithms
(the same way it's done in Flash, Silverlight, and some other platforms).

Another question, not directly related to the discussion above: do you have public statistics on what you gained by chunking the different streams,
i.e. do you measure the impact of chunk caching within the network when distributing your content at Netflix? In other words, what percentage of your
origin egress bandwidth (incl. your own servers and any first-level CDN you may use) is saved by having these chunks cached at different levels of the
Internet topology?
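
To be precise about the figure I am asking for, here is a minimal Python sketch of the ratio I have in mind. The function
name and the byte counts are purely illustrative, not measurements from any real service.

```python
def origin_offload(bytes_delivered_to_clients, bytes_served_by_origin):
    """Fraction of client traffic that did NOT have to leave the origin.

    Assumption: both counters cover the same time window, and every byte
    a client receives comes either from the origin or from a cache.
    """
    if bytes_delivered_to_clients <= 0:
        raise ValueError("bytes_delivered_to_clients must be positive")
    if not 0 <= bytes_served_by_origin <= bytes_delivered_to_clients:
        raise ValueError("origin bytes must be between 0 and the client total")
    return 1.0 - bytes_served_by_origin / bytes_delivered_to_clients


# Made-up numbers: clients received 1000 units, the origin sent 150 of them.
print(f"{origin_offload(1000, 150):.0%} of egress saved by caching")
```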

Pierre-Yves