[foms] Adaptive streaming
jeroen at longtailvideo.com
Wed Oct 27 03:51:48 PDT 2010
Hey Mark, all,
On Oct 26, 2010, at 8:24 PM, Mark Watson wrote:
>>> The alternative is use of HTTP Range requests, with which we've had no problems in terms of support in devices, servers and CDNs. Store the movie as a single file accompanied with a compact index which enables clients to form range requests for arbitrary sized pieces, down to a single GoP. This also has the advantage that client requests do not need to always be the same size (in time).
>> Yes, either a range-request based or slicing-webserver-module based solution is preferred over actual chunks for larger libraries.
>> The nice thing about also supporting separate chunks is that the barrier for getting started with adaptive HTTP is very low. One only needs an encoder or a segmenter tool; everything else is basic components. We see this now with Apple HTTP - it's the format that's getting traction where Flash/Silverlight are not. Especially on the live side.
> MW> For live, you do need small files. For on-demand it's a pain, and a compact time/byte index is a really simple thing to create and put into the file (mp4box already does it for DASH-style indexes). Actually I think the majority of adaptive streaming on the Internet today (by traffic volume) is done this way ;-)
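For reference, a minimal sketch of how a client could turn a time/byte index into a range request. The index layout, the sample offsets and the function name are all made up for illustration; this is not the format mp4box writes.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index: one (start_time_s, byte_offset) pair per chunk.
INDEX = [(0.0, 0), (2.0, 51_200), (4.0, 98_304), (6.0, 150_000)]
FILE_SIZE = 200_000  # total size of the media file in bytes (made up)


def range_header(index, file_size, start_s, end_s):
    """Return a Range header value for the chunks overlapping [start_s, end_s)."""
    times = [t for t, _ in index]
    # Chunk that contains start_s (the last chunk starting at or before it).
    first = max(bisect_right(times, start_s) - 1, 0)
    # First chunk starting at or after end_s; always cover at least one chunk.
    stop = max(bisect_left(times, end_s), first + 1)
    first_byte = index[first][1]
    # HTTP byte ranges are inclusive, so end one byte before the next chunk.
    last_byte = index[stop][1] - 1 if stop < len(index) else file_size - 1
    return f"bytes={first_byte}-{last_byte}"


print(range_header(INDEX, FILE_SIZE, 2.0, 6.0))  # bytes=51200-149999
```

Because the client picks start_s and end_s itself, the requests do not have to be the same length in time, which is the flexibility mentioned above.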
Do files with a compacted index still play on devices that do not expect it? Think current mobile phones.
Is compacting an index "cheap" in terms of processing? E.g. is it trivial to run a script over a content library of existing MP4 files to get the compacting done, just like there are scripts to place the MOOV atom at the head?
Last, I wonder how the index of WebM / Ogg compares to MP4. I just checked one of our test MP4 files: it's 10 minutes long (keyframes every 2-5 sec) and has 350 kB of headers. At 4 quality levels you're looking at 1.4 MB of data to prefetch. If WebM / Ogg by its nature already has much smaller headers, one of the big drawbacks of range-request streaming would not apply to these containers.
>>> As the proposal says, server side logic could translate client "chunk" requests into byte ranges, but to be efficient this process needs to be understood by caches as well as origin servers: CDN caches can (and do) prefetch the "next" part of a file following a range request, which they won't do if they just see individual chunks. It's good if the solution can work with existing HTTP infrastructure.
>> I don't understand this part. Is it good or bad the "next" part is fetched?
> MW> It's good - it means that next part is ready on the local cache when the client requests it (which it frequently does). Delays for fetching from the origin server can be significant and can mean the difference between stalling or not (we've actually seen this on a large scale when using the "separate chunk files" approach).
Aha, that is something to be aware of indeed. Very interesting point.
>>> This approach also keeps the manifest compact: if the manifest has to list a separate URL for every GoP it can get quite large with a 2h piece of content. Even after gzipping, the size is sufficient to affect startup time (any system being designed now should be targeting ~1s startup, IMO).
>> The issue is there nonetheless, right? Either the index is in the manifest or it is in the movie. In both situations, they have to be fully loaded in order to provide random access. Or is there another way?
> One difference is that the manifests are generally some text format which even after gzip is much larger than a compactly encoded index. For example for DASH there are 12 bytes per chunk, 4 for the chunk size, 4 for the chunk duration and 4 giving the offset into the chunk (in time) of the first Random Access Point (which IMHO should always be zero ;-). This also zips quite well if the chunks are all the same duration.
Cool; this ties into my question above around index sizes.
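A rough back-of-the-envelope for that comparison, assuming 2 s chunks and the 12-byte entries described above. The field order and the big-endian unsigned packing are assumptions for illustration, not taken from the DASH draft, and the MP4 headers I measured hold more than a seek index, so the numbers are only indicative.

```python
import struct

# Assumed entry layout: chunk size, chunk duration, offset of first RAP,
# each an unsigned 32-bit big-endian integer (12 bytes in total).
ENTRY = struct.Struct(">III")


def index_bytes(duration_s, chunk_s, levels=1):
    """Size in bytes of a flat index for `levels` quality levels."""
    chunks = -(-duration_s // chunk_s)  # ceiling division on integers
    return chunks * ENTRY.size * levels


def parse_index(blob):
    """Split a packed index into (size, duration, rap_offset) tuples."""
    return [ENTRY.unpack_from(blob, off) for off in range(0, len(blob), ENTRY.size)]


# 10-minute clip, 2 s chunks, 4 quality levels.
print(index_bytes(600, 2, levels=4))  # 14400
```

So roughly 14 kB for all four levels under these assumptions, against the 1.4 MB of MP4 headers mentioned earlier.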
> The DASH indexes do support a "hierarchical" structure where you first download an "index of indexes" - for example the first index might have one entry for each 3 minute period of movie and point to byte ranges for the detailed indexes at the 2s level. Both the top level index and a 3 minute index probably fit into one packet. So two RTTs gets you started anywhere in the movie.
It gets a little complex, but this does combine the best of both.
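To check my understanding of the two-step lookup, here is a small in-memory sketch. The 3 minute / 2 s split comes from the example above; the data structures, the fixed chunk size and the function names are hypothetical, and the two fetch callbacks stand in for what would be two HTTP range requests.

```python
TOP_PERIOD_S = 180  # one top-level entry per 3 minutes
CHUNK_S = 2         # detailed index entries at the 2 s level


def make_demo(duration_s, chunk_bytes=50_000):
    """Build a fake two-level index and fetchers that record each round trip."""
    per_period = TOP_PERIOD_S // CHUNK_S
    n_chunks = duration_s // CHUNK_S
    # Media byte range (first, last) of every chunk, all the same size here.
    media = [(i * chunk_bytes, (i + 1) * chunk_bytes - 1) for i in range(n_chunks)]
    details = [media[i:i + per_period] for i in range(0, n_chunks, per_period)]
    top = list(range(len(details)))  # stand-in for byte ranges of detail indexes
    calls = []

    def fetch_top():
        calls.append("top")
        return top

    def fetch_detail(ref):
        calls.append("detail")
        return details[ref]

    return fetch_top, fetch_detail, calls


def locate(seek_s, fetch_top, fetch_detail):
    """Return the media byte range for seek_s using two index fetches."""
    top = fetch_top()                                    # round trip 1
    period = min(int(seek_s // TOP_PERIOD_S), len(top) - 1)
    detail = fetch_detail(top[period])                   # round trip 2
    within = seek_s - period * TOP_PERIOD_S
    chunk = min(int(within // CHUNK_S), len(detail) - 1)
    return detail[chunk]


fetch_top, fetch_detail, calls = make_demo(2 * 60 * 60)  # 2h movie
print(locate(3725, fetch_top, fetch_detail), calls)
```

Seeking to 1:02:05 in the 2h demo touches one top-level entry and one 3 minute index, so the media request can go out after two index fetches.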
>> In a situation where one uses range requests, a range-req needs to be done on all quality levels to grab the full index, in order to find the seekpoints, correct?
> It's a good idea to get the index for all quality levels, but you don't have to do it all up-front before you start to play. You could have a background process fetching the indices for the next few minutes of the movie so you're always prepared.
How does Netflix actually start up (I'm in NL)? Do you always start with the lowest quality (like Silverlight) and then scale up? Or do you predict/cookie/guess an appropriate startup level?
Starting low leaves overhead for fetching indices, but the experience of looking at a bunch of blocks the first few seconds is not nice.
>>> We've been working in MPEG on the DASH standard which just reached the "Committee Draft" milestone. Unlike traditional MPEG work items there are a core group of participants who understand that this needs to be done quickly and without excessive complexity (otherwise we probably wouldn't be interested). It is more complex than m3u8, but it supports a lot more features, not all of which are unnecessary ;-) We expect to see a simple profile defined that cuts out the more esoteric stuff.
>>> I wondered what the opinion of the group here was on that work ?
>> I must say the full spec is rather extensive and overwhelming. I wonder which part of it will get implemented by a wide range of clients. It reminded me a bit of SMIL, which can also do a lot, but sees only limited use in practice.
>> A "simple" profile, e.g. based upon the example you attached, makes a lot of sense. That one is readable and fairly easy to use/implement.
> Right, that is exactly what I hope to achieve. Bear in mind also that the full spec still requires some editing - even with all the features it could be better written. Also bear in mind that things can still change in this spec during the comment period.
The easier the better! Its simplicity is what makes M3U8 / HTTP Live so attractive. Every developer can learn it and get started in a few hours.