[foms] Adaptive streaming
watsonm at netflix.com
Tue Oct 26 11:24:43 PDT 2010
On Oct 26, 2010, at 1:25 AM, Jeroen Wijering wrote:
On Oct 25, 2010, at 1:15 AM, Mark Watson wrote:
Firstly, it isn't really necessary to split content into physical chunks for on demand services. And there are some real disadvantages to doing that. We have found the "granularity of request" needs to be of the order of 2s to adapt fast enough when conditions change. 10s is too long. Storing content in 2s chunks results in lots of files (it would be literally billions, for us, considering the size of our library).
The smaller the chunks, the faster one can react to changing conditions. It's a trade-off between chunk size and buffer length. Longer chunks mean less overhead (manifest size, HTTP requests) and smaller ones mean faster switching (less chance of running into buffer issues). 2 to 5 seconds seem to be good averages in our experience.
MW> Yes, what I was pointing to was decoupling the "unit of request" (which needs to be small for fast reaction) from the "unit of storage" (which should ideally be larger for cache efficiency and general management).
The alternative is to use HTTP Range requests, with which we've had no problems in terms of support in devices, servers and CDNs. Store the movie as a single file accompanied by a compact index which enables clients to form range requests for arbitrarily sized pieces, down to a single GoP. This also has the advantage that client requests do not always need to be the same size (in time).
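A minimal sketch of how a client might turn such an index into a range request. The index layout (one size/duration pair per GoP), the sample values and the names `INDEX`, `range_header` and `media_offset` are assumptions for illustration, not the format of any particular service.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical index: one (size_bytes, duration_ms) pair per GoP, in file order.
INDEX = [(250_000, 2_000), (240_000, 2_000), (260_000, 2_000), (255_000, 2_000)]

def range_header(index, start_ms, end_ms, media_offset=0):
    """Build a Range header value covering every GoP that overlaps [start_ms, end_ms)."""
    end_times = list(accumulate(d for _, d in index))  # end time of each GoP
    end_bytes = list(accumulate(s for s, _ in index))  # end offset (exclusive) of each GoP
    if not 0 <= start_ms < end_ms or start_ms >= end_times[-1]:
        raise ValueError("requested interval lies outside the indexed media")
    first = bisect_right(end_times, start_ms)  # GoP containing start_ms
    # GoP containing the last requested millisecond, clamped to the final GoP.
    last = min(bisect_right(end_times, end_ms - 1), len(index) - 1)
    lo = media_offset + (end_bytes[first - 1] if first else 0)
    hi = media_offset + end_bytes[last] - 1  # the end of an HTTP byte range is inclusive
    return f"bytes={lo}-{hi}"

print(range_header(INDEX, 2_000, 6_000))  # bytes=250000-749999
```

Because the request is computed from the index rather than fixed by file boundaries, the same function serves a 2s request or a 6s request without any change on the server.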
Yes, either a range-request based or slicing-webserver-module based solution is preferred over actual chunks for larger libraries.
The nice thing about also supporting separate chunks is that the barrier for getting started with adaptive HTTP is very low. One only needs an encoder or a segmenter tool; everything else is basic components. We see this now with Apple HTTP - it's the format that's getting traction where Flash/Silverlight are not. Especially on the live side.
MW> For live, you do need small files. For on-demand it's a pain, and a compact time/byte index is a really simple thing to create and put into the file (mp4box already does it for DASH-style indexes). Actually I think the majority of adaptive streaming on the Internet today (by traffic volume) is done this way ;-)
As the proposal says, server side logic could translate client "chunk" requests into byte ranges, but to be efficient this process needs to be understood by caches as well as origin servers: CDN caches can (and do) prefetch the "next" part of a file following a range request, which they won't do if they just see individual chunks. It's good if the solution can work with existing HTTP infrastructure.
I don't understand this part. Is it good or bad the "next" part is fetched?
MW> It's good - it means that the next part is ready on the local cache when the client requests it (which it frequently does). Delays for fetching from the origin server can be significant and can mean the difference between stalling or not (we've actually seen this on a large scale when using the "separate chunk files" approach).
This approach also keeps the manifest compact: if the manifest has to list a separate URL for every GoP it can get quite large with a 2h piece of content. Even after gzipping, the size is sufficient to affect startup time (any system being designed now should be targeting ~1s startup, IMO).
The issue is there nonetheless, right? Either the index is in the manifest or it is in the movie. In both situations, they have to be fully loaded in order to provide random access. Or is there another way?
One difference is that the manifests are generally some text format which even after gzip is much larger than a compactly encoded index. For example for DASH there are 12 bytes per chunk, 4 for the chunk size, 4 for the chunk duration and 4 giving the offset into the chunk (in time) of the first Random Access Point (which IMHO should always be zero ;-). This also zips quite well if the chunks are all the same duration.
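To put numbers on the 12-bytes-per-chunk figure, here is a rough sketch that packs three 4-byte fields per chunk and measures the result for 2h of content in 2s chunks. The byte order, the millisecond unit, the sample chunk size and the helper names are assumptions for illustration; this is not the actual DASH index syntax.

```python
import struct
import zlib

# Assumed layout: three unsigned 32-bit big-endian fields per chunk
# (chunk size in bytes, chunk duration in ms, offset in ms of the first RAP).
ENTRY = struct.Struct(">III")

def pack_index(entries):
    """Serialize (size, duration, rap_offset) tuples into a flat binary index."""
    return b"".join(ENTRY.pack(*e) for e in entries)

def unpack_index(blob):
    """Inverse of pack_index: read back one tuple per 12-byte entry."""
    return [ENTRY.unpack_from(blob, off) for off in range(0, len(blob), ENTRY.size)]

# 2 hours of content in 2 s chunks is 3600 entries.
entries = [(250_000, 2_000, 0)] * (2 * 60 * 60 // 2)
blob = pack_index(entries)

print(ENTRY.size)                # 12
print(len(blob))                 # 43200
print(len(zlib.compress(blob)))  # far smaller when every entry is identical
```

Real chunk sizes vary from entry to entry, so an actual index would compress less well than this uniform sample; the durations and RAP offsets are the fields that repeat.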
The DASH indexes do support a "hierarchical" structure where you first download an "index of indexes" - for example the first index might have one entry for each 3 minute period of movie and point to byte ranges for the detailed indexes at the 2s level. Both the top level index and a 3 minute index probably fit into one packet. So two RTTs get you started anywhere in the movie.
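A sketch of the two-step lookup, under the simplifying assumptions that every period is exactly 3 minutes, every chunk is exactly 2s, and the top-level index is just a list of byte ranges. The names `locate`, `PERIOD_MS`, `CHUNK_MS` and the sample byte offsets are hypothetical. At 12 bytes per entry, a 3 minute detailed index holds 90 entries, or 1080 bytes.

```python
PERIOD_MS = 180_000  # one top-level entry per 3 minute period
CHUNK_MS = 2_000     # detailed index granularity (assumed constant here)

def locate(seek_ms, top_index):
    """Map a seek time to the detailed index to fetch and the chunk inside it.

    top_index is a list of (first_byte, last_byte) pairs, one per period,
    giving the byte range of that period's detailed index.
    Returns (Range header value for the detailed index, chunk number in the period).
    """
    period = seek_ms // PERIOD_MS
    first, last = top_index[period]
    chunk_in_period = (seek_ms % PERIOD_MS) // CHUNK_MS
    return f"bytes={first}-{last}", chunk_in_period

# Hypothetical 2 h movie: 40 periods, each detailed index is 90 * 12 = 1080 bytes,
# stored back to back starting at byte 1000.
top = [(1000 + i * 1080, 1000 + (i + 1) * 1080 - 1) for i in range(40)]

print(locate(65 * 60 * 1000, top))  # ('bytes=23680-24759', 60)
```

The first round trip fetches `top`, the second fetches the one detailed index named by the returned range, and the chunk number then selects the entry whose byte range is requested for playback.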
In a situation where one uses range requests, a range-req needs to be done on all quality levels to grab the full index, in order to find the seekpoints, correct?
It's a good idea to get the index for all quality levels, but you don't have to do it all up-front before you start to play. You could have a background process fetching the indices for the next few minutes of the movie so you're always prepared.
But in any case, whether you use manifests or indexes you need to get the information for at least the quality levels you think you might switch to.
Another important factor is separation of audio, video and subtitle streams. The number of combinations gets pretty large with only a few audio/subtitle languages and video bitrates.
Yes, that is something we haven't talked about a lot. At the workshop, the idea was to allow for different audio bitrates / nr. of channels per quality level, but still have A/V in one stream. This is not ideal for multi-language solutions.
Right. I would say separation of audio and video streams is more important than different audio quality levels. I think it was already pointed out that switching audio streams smoothly requires mixing between the two, which involves downloading some overlap and is not very well supported in many devices (think CE devices...)
We've been working in MPEG on the DASH standard which just reached the "Committee Draft" milestone. Unlike traditional MPEG work items there is a core group of participants who understand that this needs to be done quickly and without excessive complexity (otherwise we probably wouldn't be interested). It is more complex than m3u8, but it supports a lot more features, not all of which are unnecessary ;-) We expect to see a simple profile defined that cuts out the more esoteric stuff.
I wondered what the opinion of the group here was on that work?
I must say the full spec is rather extensive and overwhelming. I wonder which part of it will get implemented by a wide range of clients. It reminded me a bit of SMIL, which can also do a lot, but sees only limited use in practice.
A "simple" profile, e.g. based upon the example you attached, makes a lot of sense. That one is readable and fairly easy to use/implement.
Right, that is exactly what I hope to achieve. Bear in mind also that the full spec still requires some editing - even with all the features it could be better written. Also bear in mind that things can still change in this spec during the comment period.
Does DASH also allow full listings of media fragments (for e.g. HTTP Live / Smooth Streaming style chunk requests), or are range requests the only way to get chunks?
You can also list the segments explicitly in the manifest. Or you can provide a "URL template" if the chunks are consistently named (e.g. segment<n>.seg) and all about the same length.
You can, if you wish, use a URL template or segment lists to point at 'medium sized' segments which themselves contain indexes. For example the segments could each be 3 minutes long.
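As a rough illustration of the URL template idea, the sketch below expands the `segment<n>.seg` pattern into a list of segment URLs. The `<n>` placeholder follows the example above; the function name, the base URL and the choice to start numbering at 1 are assumptions, and the actual DASH template syntax may differ.

```python
def expand_template(template, count, start=1):
    """Return the segment URLs produced by substituting <n> with consecutive numbers."""
    return [template.replace("<n>", str(i)) for i in range(start, start + count)]

# Hypothetical base URL; 2 h of content in 3 minute segments is 40 URLs.
urls = expand_template("http://example.com/movie/segment<n>.seg", 40)

print(urls[0])    # http://example.com/movie/segment1.seg
print(urls[-1])   # http://example.com/movie/segment40.seg
```

The manifest then only needs to carry the template and the segment count, rather than one line per segment.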
It's a little crazy, but the client procedures to support all these things at once are actually not so bad. There is reference software in MPEG for all this. I am trying to get them to open that up.
Also (this might be another discussion), are there thoughts on standardizing the subtitling format?
Not in the MPEG DASH work, but it would be a good idea. I put TTML/DFXP in the example because that is what we use at Netflix. I'm not sure what the venue would be for standardizing that.