[foms] Adaptive streaming

Tue Oct 26 01:25:45 PDT 2010

On Oct 25, 2010, at 1:15 AM, Mark Watson wrote:

> Firstly, thanks to Sylvia for directing me towards this list. I am at Netflix, looking at how our service could eventually be delivered to standards-based adaptive streaming players. We're keen to make sure the emerging standards/open solutions have the capabilities that would be needed for that. We have been doing HTTP adaptive streaming on quite a large scale for a few years now with entirely proprietary technology, but in the future it is more important to us to make it easier to get our service onto more devices more easily than to keep that technology to ourselves. We'd love to see an open, high-quality adaptive streaming solution and are willing to help make that happen.

Great! I think that's why we're all here ;)

> Reading the "Proposal: adaptive streaming" thread in the archives, I have a couple of comments.
> 
> Firstly, it isn't really necessary to split content into physical chunks for on demand services. And there are some real disadvantages to doing that. We have found the "granularity of request" needs to be of the order of 2s to adapt fast enough when conditions change. 10s is too long. Storing content in 2s chunks results in lots of files (it would be literally billions, for us, considering the size of our library).

The smaller the chunks, the faster one can react to changing conditions. It's a trade-off between chunk size and buffer length. Longer chunks mean less overhead (manifest size, HTTP requests) and smaller ones mean faster switching (less chance of running into buffer issues). In our experience, 2 to 5 seconds is a good range.
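
To put rough numbers on the overhead side of that trade-off, here is a back-of-the-envelope sketch. The 2-hour duration, the five quality levels and the function name are assumptions for illustration only, not figures from any real library:

```python
import math

def chunk_count(duration_s: float, chunk_s: float, quality_levels: int = 1) -> int:
    """Number of chunks (files or manifest entries) needed for one title."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    # The last chunk may be shorter than chunk_s, hence the ceiling.
    return math.ceil(duration_s / chunk_s) * quality_levels

# Assumed example: a 2-hour title encoded at 5 quality levels.
for chunk_s in (2, 5, 10):
    print(f"{chunk_s:>2}s chunks -> {chunk_count(2 * 3600, chunk_s, 5)} entries")
# 2s -> 18000, 5s -> 7200, 10s -> 3600
```

Every one of those entries is either a file on disk or a line in the manifest, which is where the overhead comes from.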

> The alternative is use of HTTP Range requests, with which we've had no problems in terms of support in devices, servers and CDNs. Store the movie as a single file accompanied with a compact index which enables clients to form range requests for arbitrary sized pieces, down to a single GoP. This also has the advantage that client requests do not need to always be the same size (in time).

Yes, either a range-request based or slicing-webserver-module based solution is preferred over actual chunks for larger libraries. 
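
As a minimal sketch of the range-request idea: assuming the compact index boils down to a list of byte offsets at which each fragment (e.g. each GoP) starts, plus the total file size as a final entry, a client can turn any run of fragments into a single Range header. The index layout and the function name are hypothetical, not taken from any existing format:

```python
from typing import Sequence

def range_header(offsets: Sequence[int], first: int, last: int) -> str:
    """Build an HTTP Range header value covering fragments first..last (inclusive).

    offsets[i] is the byte offset where fragment i starts; the final entry
    is the total file size (an assumed, hypothetical index layout).
    """
    if not 0 <= first <= last < len(offsets) - 1:
        raise ValueError("fragment numbers out of range")
    start = offsets[first]
    end = offsets[last + 1] - 1  # HTTP byte ranges are inclusive on both ends
    return f"bytes={start}-{end}"

# Three fragments in a 4000-byte file; request fragments 1 and 2 together.
offsets = [0, 1000, 2500, 4000]
print(range_header(offsets, 1, 2))  # bytes=1000-3999
```

Because the client picks `first` and `last` freely, the request size in time does not have to be the same from one request to the next.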

The nice thing about also supporting separate chunks is that the barrier for getting started with adaptive HTTP is very low. One only needs an encoder or a segmenter tool; everything else is basic components. We see this now with Apple's HTTP Live Streaming: it's the format that's getting traction where Flash/Silverlight are not, especially on the live side.

> As the proposal says, server side logic could translate client "chunk" requests into byte ranges, but to be efficient this process needs to be understood by caches as well as origin servers: CDN caches can (and do) prefetch the "next" part of a file following a range request, which they won't do if they just see individual chunks. It's good if the solution can work with existing HTTP infrastructure.

I don't understand this part. Is it good or bad that the "next" part is prefetched?

> This approach also keeps the manifest compact: if the manifest has to list a separate URL for every GoP it can get quite large with a 2h piece of content. Even after gzipping, the size is sufficient to affect startup time (any system being designed now should be targeting ~1s startup, IMO).

The issue is there nonetheless, right? Either the index is in the manifest or it is in the movie. In both situations, it has to be fully loaded in order to provide random access. Or is there another way?

In a situation where one uses range requests, a range request needs to be made on every quality level to grab its full index in order to find the seek points, correct?
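
For reference, once the index for a quality level is in memory, the seek lookup itself is cheap. A minimal sketch, assuming the index reduces to a sorted list of fragment start times in seconds (the layout and the function name are assumptions for illustration, not something from the proposal or from DASH):

```python
import bisect
from typing import Sequence

def fragment_for_time(start_times: Sequence[float], t: float) -> int:
    """Return the number of the fragment whose start time is the latest one <= t.

    start_times must be sorted ascending. The index carries no end time in
    this sketch, so any t past the last start maps to the last fragment.
    """
    if not start_times or t < start_times[0]:
        raise ValueError("time is before the first fragment")
    # bisect_right gives the insertion point after any entry equal to t,
    # so subtracting one yields the fragment that contains t.
    return bisect.bisect_right(start_times, t) - 1

start_times = [0.0, 2.0, 4.0, 6.0]
print(fragment_for_time(start_times, 4.5))  # 2
print(fragment_for_time(start_times, 2.0))  # 1
```

The lookup is a binary search, so the cost that matters is fetching the index, not searching it.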

> Another important factor is separation of audio, video and subtitle streams. The number of combinations gets pretty large with only a few audio/subtitle languages and video bitrates.

Yes, that is something we haven't talked about a lot. At the workshop, the idea was to allow for different audio bitrates / number of channels per quality level, but still have A/V in one stream. This is not ideal for multi-language solutions.
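
A quick way to see why: with everything muxed into one stream, the number of files grows as the product of the options, while with separate streams it grows as the sum. The counts below (5 video bitrates, 3 audio languages, 4 subtitle languages) are made-up example values:

```python
def muxed_stream_count(video: int, audio: int, subtitles: int) -> int:
    """Streams needed when every combination is stored as one muxed file."""
    return video * audio * subtitles

def separate_stream_count(video: int, audio: int, subtitles: int) -> int:
    """Streams needed when each track is stored on its own and combined by the client."""
    return video + audio + subtitles

# Assumed example values, not taken from any real catalogue.
video, audio, subtitles = 5, 3, 4
print(muxed_stream_count(video, audio, subtitles))     # 60
print(separate_stream_count(video, audio, subtitles))  # 12
```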

> We've been working in MPEG on the DASH standard which just reached the "Committee Draft" milestone. Unlike traditional MPEG work items there are a core group of participants who understand that this needs to be done quickly and without excessive complexity (otherwise we probably wouldn't be interested). It is more complex than m3u8, but it supports a lot more features, not all of which are unnecessary ;-) We expect to see a simple profile defined that cuts out the more esoteric stuff.
> 
> I wondered what the opinion of the group here was on that work ?

I must say the full spec is rather extensive and overwhelming. I wonder which parts of it will get implemented by a wide range of clients. It reminded me a bit of SMIL, which can also do a lot but sees limited use in practice.

A "simple" profile, e.g. based upon the example you attached, makes a lot of sense. That one is readable and fairly easy to use/implement.

Does DASH also allow full listings of media fragments (e.g. for HTTP Live / Smooth Streaming style chunk requests), or are range requests the only way to get chunks?

Also (this might be another discussion), are there thoughts on standardizing the subtitling format? 

Kind regards,

Jeroen