[foms] Adaptive streaming

Wed Oct 27 08:51:31 PDT 2010

On Oct 27, 2010, at 3:51 AM, Jeroen Wijering wrote:

> Hey Mark, all,
> 
> On Oct 26, 2010, at 8:24 PM, Mark Watson wrote:
> 
>>>> The alternative is use of HTTP Range requests, with which we've had no problems in terms of support in devices, servers and CDNs. Store the movie as a single file accompanied with a compact index which enables clients to form range requests for arbitrary sized pieces, down to a single GoP. This also has the advantage that client requests do not need to always be the same size (in time).
>>> 
>>> Yes, either a range-request based or slicing-webserver-module based solution is preferred over actual chunks for larger libraries. 
>>> 
>>> The nice thing about also supporting separate chunks is that the barrier for getting started with adaptive HTTP is very low. One only needs an encoder or a segmenter tool; everything else is basic components. We see this now with Apple HTTP - it's the format that's getting traction where Flash/Silverlight are not. Especially on the live side.
>> 
>> MW> For live, you do need small files. For on-demand it's a pain, and a compact time/byte index is a really simple thing to create and put into the file (mp4box already does it for DASH-style indexes). Actually I think the majority of adaptive streaming on the Internet today (by traffic volume) is done this way ;-)
> 
> Do files with a compacted index still play on devices that do not expect it? Think current mobile phones.

In DASH it's just a fragmented MP4 file with an extra box ("Segment Index") near the beginning. Existing players would ignore the new box.
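
To illustrate why an extra box is harmless, here is a minimal sketch of a top-level box walker. It relies only on the generic box header of the ISO base media file format (32-bit size, four-character type, optional 64-bit size). The four-character code "sidx" for the Segment Index box, the KNOWN set and the helper names are assumptions made for this illustration.

```python
import io
import struct


def iter_top_level_boxes(stream):
    """Yield (box_type, payload_offset, payload_size) for each top-level box."""
    while True:
        start = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            return  # end of stream
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the stream.
            size = stream.seek(0, io.SEEK_END) - start
        if size < header_len:
            return  # malformed box; stop walking
        yield box_type.decode("latin-1"), start + header_len, size - header_len
        stream.seek(start + size)  # jump over the payload, whatever it is


# Box types a hypothetical older player understands (assumption).
KNOWN = {"ftyp", "moov", "moof", "mdat"}


def boxes_seen_by_legacy_player(stream):
    """Return the box types left after skipping everything unknown."""
    return [t for t, _, _ in iter_top_level_boxes(stream) if t in KNOWN]


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size header (for the demo only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample = (
    make_box(b"ftyp", b"isom")
    + make_box(b"sidx", b"\x00" * 12)  # assumed code for "Segment Index"
    + make_box(b"moov")
    + make_box(b"moof")
    + make_box(b"mdat", b"xx")
)
print(boxes_seen_by_legacy_player(io.BytesIO(sample)))
```

Because every box carries its own length, a parser that does not recognise a type can step over it without understanding the payload.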

> 
> Is compacting an index "cheap" in terms of processing? E.g. is it trivial to run a script over a content library of existing MP4 files to get the compacting done, just like there are scripts to place the MOOV atom at the head? 

The file is still "fragmented" into Movie Fragment/Movie Data pairs. You need a tool to do that. Getting that tool to generate the index as well is simple (they did it in mp4box in less than an afternoon, apparently).

> 
> Last, I wonder how the index of WebM / Ogg compares to MP4? I just checked one of our test-MP4 files. It's 10 minutes (keyframes every 2-5sec)  and has 350k headers.

I guess that would be the MOOV box. That's why you do the fragmentation, so the information from the MOOV gets distributed into MOOFs through the file.

> At 4 quality levels you're looking at 1.4 MB data to prefetch.

The new compact index is 12 bytes per fragment. So 3600 bytes for 10 minutes with 2 second fragments.

The MOOV indexes individual frames, with all the timing information. The new Segment Index indexes whole Movie Fragments, with only the most basic time/byte range information, so it is much smaller.
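
The arithmetic can be checked with a short sketch. It assumes the 12-byte entry layout described elsewhere in this thread (4 bytes each for fragment size, fragment duration and time offset of the first Random Access Point); the layout has not been checked against the draft text, and the function names are hypothetical.

```python
import struct

# Three unsigned 32-bit big-endian fields: size, duration, first-RAP offset.
# This layout follows the description in this thread (assumption).
ENTRY = struct.Struct(">III")


def index_size_bytes(duration_s, fragment_s):
    """Size of a flat index covering duration_s with fixed-length fragments."""
    fragments = -(-duration_s // fragment_s)  # ceiling division
    return fragments * ENTRY.size


def pack_index(entries):
    """Serialise (size, duration, rap_offset) tuples into index bytes."""
    return b"".join(ENTRY.pack(*entry) for entry in entries)


per_level = index_size_bytes(600, 2)  # 10 minutes, 2 second fragments
print(per_level)       # bytes for one quality level
print(4 * per_level)   # bytes for four quality levels
```

Under these assumptions, four quality levels need 14400 bytes of index, against the roughly 1.4 MB of MOOV data mentioned above.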

> If WebM / Ogg by its nature already has much smaller headers, one of the big drawbacks of range-request streaming would not be valid for these containers. 

I need to read up on the WebM format (bit of a confession, being on this list, sorry!). But I understood you have the framing information with the samples and therefore smaller headers and probably no need for any kind of formal fragmentation. So your "fragment" would just be a notional concept of a group of samples spanning some time period. Your index would provide time and byte offsets for these fragments. If you already have something which provides time and byte offsets for Random Access Points (for seeking) you could probably re-use that. (Re-using Movie Fragment Random Access box in mp4 was discussed instead of creating the new Segment Index. Segment Index was chosen for some slightly obscure technical reasons).
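
As a container-neutral illustration, the sketch below turns a list of (size, duration) index entries into an HTTP Range header for a seek time. The entry values and function names are hypothetical, the index is assumed to be non-empty, and first_byte stands for wherever the first fragment starts in the file.

```python
from bisect import bisect_right


def build_tables(entries, first_byte=0):
    """Return parallel lists of fragment start times and start bytes."""
    start_times, start_bytes = [], []
    t, b = 0, first_byte
    for size, duration in entries:
        start_times.append(t)
        start_bytes.append(b)
        t += duration
        b += size
    return start_times, start_bytes


def range_header_for(seek_time, entries, first_byte=0):
    """Build the Range header for the fragment containing seek_time."""
    start_times, start_bytes = build_tables(entries, first_byte)
    # Last fragment whose start time is <= seek_time, clamped to the index.
    i = bisect_right(start_times, seek_time) - 1
    i = max(0, min(i, len(entries) - 1))
    first = start_bytes[i]
    last = first + entries[i][0] - 1  # HTTP byte ranges are inclusive
    return {"Range": f"bytes={first}-{last}"}


# Hypothetical index: three 2 second fragments of varying size.
index = [(1000, 2), (1500, 2), (800, 2)]
print(range_header_for(3, index))
```

The same lookup works whether the entries come from a dedicated index box or from an existing seek table, as long as both time and byte offsets are available.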

> 
> 
>>>> As the proposal says, server side logic could translate client "chunk" requests into byte ranges, but to be efficient this process needs to be understood by caches as well as origin servers: CDN caches can (and do) prefetch the "next" part of a file following a range request, which they won't do if they just see individual chunks. It's good if the solution can work with existing HTTP infrastructure.
>>> 
>>> I don't understand this part. Is it good or bad the "next" part is fetched? 
>> 
>> MW> It's good - it means that next part is ready on the local cache when the client requests it (which it frequently does). Delays for fetching from the origin server can be significant and can mean the difference between stalling or not (we've actually seen this on a large scale when using the "separate chunk files" approach).
> 
> Aha, that is something to be aware of indeed. Very interesting point.
> 
> 
>>>> This approach also keeps the manifest compact: if the manifest has to list a separate URL for every GoP it can get quite large with a 2h piece of content. Even after gzipping, the size is sufficient to affect startup time (any system being designed now should be targeting ~1s startup, IMO).
>>> 
>>> The issue is there nonetheless, right? Either the index is in the manifest or it is in the movie. In both situations, they have to be fully loaded in order to provide random access. Or is there another way?
>> 
>> One difference is that the manifests are generally some text format which even after gzip is much larger than a compactly encoded index. For example for DASH there are 12 bytes per chunk, 4 for the chunk size, 4 for the chunk duration and 4 giving the offset into the chunk (in time) of the first Random Access Point (which IMHO should always be zero ;-). This also zips quite well if the chunks are all the same duration.
> 
> Cool; this ties into my question above around index sizes. 
> 
>> The DASH indexes do support a "hierarchical" structure where you first download an "index of indexes" - for example the first index might have one entry for each 3 minute period of movie and point to byte ranges for the detailed indexes at the 2s level. Both the top level index and a 3 minute index probably fit into one packet. So two RTTs gets you started anywhere in the movie.
> 
> It gets a little complex, but this does combine the best of both.

Right. A single index is definitely sufficient for a first iteration, even if it gets a bit big for a 2h movie. The hierarchy is a pure optimization.
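
A rough sizing sketch for the two options follows. It assumes 12 bytes per entry at both levels (the thread only states this for the detailed index) and a 1460-byte TCP payload as a stand-in for "one packet"; all names are hypothetical.

```python
ENTRY_BYTES = 12   # per-entry size quoted in this thread
MSS_BYTES = 1460   # assumed typical TCP payload on Ethernet


def entry_count(total_s, span_s):
    """Number of entries needed to cover total_s in spans of span_s."""
    return -(-total_s // span_s)  # ceiling division


def two_level_sizes(movie_s, period_s, fragment_s):
    """Return (top-level, per-period, flat) index sizes in bytes."""
    top = entry_count(movie_s, period_s) * ENTRY_BYTES
    leaf = entry_count(period_s, fragment_s) * ENTRY_BYTES
    flat = entry_count(movie_s, fragment_s) * ENTRY_BYTES
    return top, leaf, flat


def locate(seek_s, period_s, fragment_s):
    """Return (period number, fragment number within that period)."""
    period, remainder = divmod(seek_s, period_s)
    return period, remainder // fragment_s


# 2h movie, 3 minute periods, 2 second fragments.
top, leaf, flat = two_level_sizes(7200, 180, 2)
print(top, leaf, flat)
print(top <= MSS_BYTES and leaf <= MSS_BYTES)
print(locate(3725, 180, 2))
```

With these numbers the flat index is about 43 KB, while each of the two fetches in the hierarchical case stays under the assumed packet size.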

> 
> 
>>> In a situation where one uses range requests, a range-req needs to be done on all quality levels to grab the full index, in order to find the seekpoints, correct?
>> 
>> It's a good idea to get the index for all quality levels, but you don't have to do it all up-front before you start to play. You could have a background process fetching the indices for the next few minutes of the movie so you're always prepared.
> 
> How does Netflix actually start up (I'm in NL)? Do you always start with the lowest quality (like Silverlight) and then scale up? Or do you predict/cookie/guess an appropriate startup level? 
> 
> Starting low leaves overhead for fetching indices, but the experience of looking at a bunch of blocks the first few seconds is not nice. 

Right now we have different versions of our SDK out in different devices, so it depends which device/firmware version you have. You're exactly right about the trade-off, though: low quality/fast startup vs high quality/slower startup. Generally you want to use some historical information to choose a good startup rate, rather than always starting at the lowest. If you get it wrong (data arriving too slowly) you can always change rate right away. People are more tolerant of long startup for movie content than for "browsing" type activities.
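
One way such a history-based choice could look is sketched below. This is purely illustrative and not a description of any shipping client: the bitrate ladder, the use of the median and the 0.8 safety factor are all assumptions.

```python
from statistics import median


def choose_startup_bitrate(available_kbps, history_kbps, safety=0.8):
    """Pick the highest rate that fits within a discounted past throughput.

    Falls back to the lowest rate when there is no history or when even
    the lowest rate exceeds the budget.
    """
    if not history_kbps:
        return min(available_kbps)
    budget = median(history_kbps) * safety
    candidates = [rate for rate in available_kbps if rate <= budget]
    return max(candidates) if candidates else min(available_kbps)


ladder = [235, 560, 1050, 1750]      # hypothetical quality levels, kbit/s
past = [1500, 1400, 1600]            # hypothetical measured throughputs
print(choose_startup_bitrate(ladder, past))
```

The safety factor leaves some headroom for fetching indices during startup; if the guess turns out too high, the client can still switch down on the next request.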

> 
> 
>>>> We've been working in MPEG on the DASH standard which just reached the "Committee Draft" milestone. Unlike traditional MPEG work items there is a core group of participants who understand that this needs to be done quickly and without excessive complexity (otherwise we probably wouldn't be interested). It is more complex than m3u8, but it supports a lot more features, not all of which are unnecessary ;-) We expect to see a simple profile defined that cuts out the more esoteric stuff.
>>>> 
>>>> I wondered what the opinion of the group here was on that work?
>>> 
>>> I must say the full spec is rather extensive and overwhelming. I wonder which part of it will get implemented by a wide range of clients. It reminded me a bit of SMIL, which can also do a lot, but sees limited use in practice. 
>>> 
>>> A "simple" profile, e.g. based upon the example you attached, makes a lot of sense. That one is readable and fairly easy to use/implement. 
>> 
>> Right, that is exactly what I hope to achieve. Bear in mind also that the full spec still requires some editing - even with all the features it could be better written. Also bear in mind that things can still change in this spec during the comment period.
> 
> The easier the better! Its simplicity is what makes M3U8 / HTTP Live so attractive. Every developer can learn it and get started in a few hours.
> 

I will use your comments as evidence in my proposals for simplification ;-)

...Mark

> 
> Kind regards,
> 
> Jeroen
> _______________________________________________
> foms mailing list
> foms at lists.annodex.net
> http://lists.annodex.net/cgi-bin/mailman/listinfo/foms
>