[foms] WebM Manifest

Mark Watson watsonm at netflix.com
Thu Mar 17 08:59:24 PDT 2011

On Mar 17, 2011, at 2:14 AM, Philip Jägenstedt wrote:

On Wed, 16 Mar 2011 17:03:13 +0100, Mark Watson <watsonm at netflix.com> wrote:

Hi Philip,

A couple of comments below...

On Mar 15, 2011, at 2:01 AM, Philip Jägenstedt wrote:

On Mon, 14 Mar 2011 22:56:23 +0100, Frank Galligan
<fgalligan at google.com> wrote:

If the Web and TV IG choose the DASH manifest as the baseline manifest
format for adaptive streaming, would all of you be OK with implementing
in your products?

In short, no.

I've previously glanced over the DASH spec and have done so again today,
and I am not very enthusiastic about it at all.

Since the spec is written for a non-browser context, it fails to make
use of existing browser infrastructure. Everything is done
whereas in a browser context one can leave many things to be dealt with
using scripts. I think we should aim for a solution that doesn't
require fetching a manifest file over HTTP repeatedly, we could just as
well build a solution using WebSockets, just to name one possibility.

This was deliberate as it was recognized that HTML/Javascript
environments were not the only ones where adaptive streaming would be
needed. We wanted to have a solution which was independent of the
presentation framework and I think it would be valuable for the industry
to have such a single solution, rather than multiple solutions for
different environments.

Would it be an acceptable outcome if one can deliver to browsers using a
DASH manifest with some JavaScript glue to parse the manifest and map that
to the lower level APIs that browsers provide?

The big question is the design of this API - see below.

DASH does not require repeatedly fetching a manifest. The scenarios
where repeated fetching is necessary are some specific live scenarios,
but both live and on-demand can be implemented with a single manifest

OK, could you explain a bit which live scenarios require refetching and
which don't and how that works?

Actually, I'm not sure if I believe those frequent refetching scenarios are valid at all.

But anyway, others have suggested that there is value in hiding the URLs for future segments until those segments are actually available. The stated concern was that if you reveal the URLs in advance, people could request them (either accidentally or on purpose) and cause unwanted server load. But people can request non-existent resources whether you tell them the URLs or not. URL hiding means you need repeated manifest updates to get the latest URLs.

The obvious alternative is to construct the URLs according to some consistent scheme and advertise a template in the original manifest (DASH supports this, as does MS Smooth Streaming). In DASH the template looks something like http://example.com/content/1234/segment$Index$.mp4.
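As a rough illustration of the template idea, here is a minimal Python sketch that expands the example template above into concrete segment URLs. The function name and the choice to start numbering at 1 are assumptions for illustration only; the exact substitution rules are defined by the manifest format, not by this sketch.

```python
def expand_template(template: str, index: int) -> str:
    """Substitute a segment index into a $Index$-style URL template."""
    return template.replace("$Index$", str(index))


# Template taken from the example above; starting at index 1 is an assumption.
template = "http://example.com/content/1234/segment$Index$.mp4"
urls = [expand_template(template, i) for i in range(1, 4)]
```

With a template like this the client can compute the URL of any future segment locally, so no manifest refresh is needed just to learn new URLs.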

Another reason given is that you might want to change the advertised URLs for failover reasons. These failover scenarios have not been described in detail in any discussions I've been involved in, though. One way to address this is to provide multiple alternative URLs for each resource. DASH does this by allowing you to specify multiple base URLs and then the resource URLs are relative to those. This is what we do today pretty successfully. Another way would be to require clients to update the manifest when they see advertised URLs returning 404s.
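To make the multiple-base-URL approach concrete, here is a small Python sketch of how a client might try the same relative resource against each base URL in turn. The host names, the `fetch` callback and the use of `OSError` to signal a failed request are all hypothetical; this is one possible client policy, not something any manifest format mandates.

```python
from urllib.parse import urljoin


def candidate_urls(base_urls, relative_url):
    """Resolve one relative resource URL against every advertised base URL."""
    return [urljoin(base, relative_url) for base in base_urls]


def fetch_with_failover(base_urls, relative_url, fetch):
    """Try each candidate in order and return the first successful response.

    `fetch` is a caller-supplied function (hypothetical) that returns the
    response body for a URL or raises OSError on failure.
    """
    last_error = None
    for url in candidate_urls(base_urls, relative_url):
        try:
            return fetch(url)
        except OSError as err:
            last_error = err
    raise OSError(f"all base URLs failed for {relative_url}") from last_error
```

Because the alternatives are already in the manifest, a failing server costs the client one extra request rather than a manifest refetch.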

Finally, there are scenarios where a live presentation needs to change at short notice. Ad insertion into live sporting events (where the timing of the ads is known only at short notice) has been given as an example. Here, I think that it will be more common for the ads to be spliced into the live stream, rather than updating the manifest with new URLs etc. But the manifest update would allow ad personalization.

This doesn't happen very often, though. Requiring all clients to re-request the manifest every 10s seems like overkill for something that will happen rarely and only in certain kinds of live session. A better solution would be some kind of in-band indication that a manifest update is required.

My position is that we should start from the bottom, implementing APIs in
browsers that make it possible to implement adaptive streaming using
scripts. If we succeed at that, then DASH could be implemented as a
JavaScript library. When we have that low-level API in place I think we
should look at simplifying it for authors by looking at a manifest format,
but I truly doubt taking DASH wholesale would be the best long-term
solution for either browser implementors or web authors.

When you say "low level API" do you mean the ability to provide
information to the player about the various available streams ? Or do
you mean even lower, where the API allows you to provide raw media data,
or URLs for media chunks, with all the rate adaptation etc. implemented
in the Javascript layer ?

The lowest level that I think we should provide for is fetching multiple
resources (chunks) using XMLHttpRequest and concatenating these into a
Stream object to which one can continuously append new chunks. Certainly
one could allow completely script-generated data to be spliced in between
chunks, if there's any utility in this. Bitrate switching logic would have
to be implemented in JavaScript based on current playback position, download
speed, etc.

The problem is that adaptive streaming is more complex than simply concatenating a bunch of resources found at URLs advertised in a manifest.

Firstly, you need reasonably small granularity in terms of switch points. 2s is good. 10s is too long.

Next if you had a separate file for each 2s chunk, then you have an unmanageably large number of files (it would be ~25 billion for our content library).

The solution in DASH (required in the "Basic On Demand" profile) is to store the content as a single file for each bitrate. At the start of a file is an index giving the time-to-byte-range mapping for the 2s fragments (2s is an example - the spec doesn't constrain you). This is for on-demand, not live, btw. The index is in the file, binary coded, to keep it compact and thereby keep startup time low. If it was in the XML Manifest it would be huge.

To construct byte range requests you need to read and parse this index. I'm not sure Javascript has good tools for efficiently handling & parsing binary data yet.
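To show what "read the index, then build byte range requests" involves, here is a Python sketch using a made-up binary layout. The layout is purely an assumption for illustration (a big-endian 32-bit fragment count, then a 32-bit duration in milliseconds and a 32-bit size in bytes per fragment, with the media data following the index directly). It is not the actual index format defined by DASH or by any container.

```python
import struct
from bisect import bisect_right


def parse_index(data: bytes):
    """Parse the hypothetical index into (start_ms, first_byte, last_byte) tuples."""
    (count,) = struct.unpack_from(">I", data, 0)
    entries = []
    start_ms = 0
    offset = 4 + 8 * count  # assumption: media bytes start right after the index
    for i in range(count):
        duration_ms, size = struct.unpack_from(">II", data, 4 + 8 * i)
        entries.append((start_ms, offset, offset + size - 1))
        start_ms += duration_ms
        offset += size
    return entries


def range_header_for_time(entries, t_ms):
    """Return the HTTP Range header value for the fragment containing t_ms."""
    starts = [entry[0] for entry in entries]
    i = max(bisect_right(starts, t_ms) - 1, 0)
    _, first, last = entries[i]
    return f"bytes={first}-{last}"
```

Each entry costs 8 bytes in this layout, which is the point of keeping the index binary and in the file rather than spelling it out in XML.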

Secondly, the information needed for rate adaptation logic is quite complex. Especially if you want to allow space for experimentation with different algorithms.

One important area for experimentation is in how to measure throughput and how to translate these measurements into predictions of future throughput. The input to that piece is a set of ( timestamp, byte count ) pairs, each entry corresponding to a single socket read. As soon as you "summarize" this information in some specific way you're constraining the space for experimentation, compared to what we have in native code.
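As an illustration of why the raw samples matter, here is a Python sketch with two different estimators computed from the same list of (timestamp, byte count) pairs. Both estimators, their names and the default smoothing factor are assumptions chosen for the example; neither is claimed to be what any particular player uses.

```python
def window_throughput(samples, window_s):
    """Mean bytes/s over the trailing window_s seconds of (timestamp_s, bytes) samples."""
    if not samples:
        return 0.0
    end = samples[-1][0]
    recent = [n for t, n in samples if t > end - window_s]
    return sum(recent) / window_s


def ewma_throughput(samples, alpha=0.5):
    """Exponentially weighted average of per-read rates (bytes/s).

    Each read's rate is its byte count divided by the time since the
    previous read. Returns None if fewer than two usable samples exist.
    """
    estimate = None
    for (t_prev, _), (t, n) in zip(samples, samples[1:]):
        dt = t - t_prev
        if dt <= 0:
            continue  # skip reads with no measurable elapsed time
        rate = n / dt
        estimate = rate if estimate is None else alpha * rate + (1 - alpha) * estimate
    return estimate
```

An API that exposed only one of these numbers would rule out the other, along with any predictor that needs the individual reads.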

Then you need information about the future choices you could make at each switch point. This isn't just a set of bitrate figures, because streams are usually VBR. You need to know the VBR profile of each stream in order to make a good prediction of whether it could be played given your predicted future throughput. You also need to know the current buffer state to make that prediction.
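To illustrate why a single bitrate figure is not enough, here is a simplified Python sketch that checks whether a stream can be played without stalling, given its per-fragment sizes, a predicted throughput and the current buffer level. The model is an assumption made for the example: fragments are downloaded one at a time at a constant predicted rate, playback drains the buffer continuously, and a fragment only becomes playable once fully downloaded.

```python
def can_play_without_stall(fragment_sizes, fragment_duration_s,
                           throughput_bytes_per_s, buffer_s):
    """Simulate sequential downloads and report whether the buffer ever runs dry."""
    for size in fragment_sizes:
        download_s = size / throughput_bytes_per_s
        buffer_s -= download_s  # playback drains the buffer while we download
        if buffer_s < 0:
            return False
        buffer_s += fragment_duration_s  # the finished fragment becomes playable
    return True


# Two hypothetical streams with the same total size (same average bitrate)
# but different VBR profiles, each made of 2s fragments.
flat = [250_000, 250_000, 250_000, 250_000]
front_loaded = [850_000, 50_000, 50_000, 50_000]
```

With 2s of media buffered and a predicted 150,000 bytes/s, the flat profile plays through while the front-loaded one stalls on its first fragment, even though both have the same average bitrate.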

Finally, though less significant, buffering space is often limited and so the prediction of whether a given stream can be played without stalling needs to consider that, for which you need to know how quickly buffer space is going to be freed, which again is not a constant-bit-rate thing.

And then you need to do the last two independently for each track (audio, video, etc.).

I think a low-level API which enabled research and experimentation with rate adaptation algorithms would be a great thing: there is still some way to go on the rate adaptation problem. But the design of such an API is itself a big research topic: I think the above is too low-level, but if you make it too simple you don't have any space to experiment and can't even get parity with existing common practice.

When we have this done, I would suggest adding another layer to take away
the complexity of the switching logic. Something like an event that is
fired when it looks like we will run out of data might be enough.

See above. I think it definitely would not be enough.

One can of course add more and more layers on top of this, up to and
including a manifest format that takes care of everything.

I think the need for adaptive streaming in HTML5 is sufficient that the manifest approach should come first and we should work on exposing the rate adaptation logic to Javascript in parallel. The manifest design is not simple either, but has been extensively studied in several proprietary systems and those learnings fed into DASH over the last couple of years.

You could imagine an API on which the manifest parsing is done in Javascript and then the information in the manifest is passed to the video tag programmatically. This would decouple that API from the specific DASH manifest format. I'm not really sure what the value of that would be: you are just converting information from one encoding to another, equivalent, one.

The latter was discussed before on this list. It is very attractive from
the point of view of enabling experimentation with rate adaptation
algorithms, but my conclusion after the discussion was that practically
it would be difficult to come up with an API that was rich enough to
enable meaningful experimentation. A simple "switch up/down" method call
is not sufficient to implement a working adaptation algorithm: you need
to know about switch points, byte ranges within streams, stream VBR
profiles and a more detailed view of past throughput than just an
instantaneous or fixed window throughput measure.

We'll have to provide the buffering and playback statistics that are
needed to make it work. Browsers would have to have this information
internally to be able to implement adaptive streaming "natively", so it's
just a question of exposing it to JavaScript.

An alternative approach suggested at the Web&TV working group was that
players might support a number of different algorithms of varying
maturity, much as OS kernels support a variety of TCP congestion control
algorithms. There would need to be an API to discover and configure the
various algorithms.
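As a sketch of what a "discover and configure" API could look like, here is a small Python registry of named rate adaptation algorithms. Everything in it is hypothetical: the class, the method names, the two toy algorithms and the 0.5 safety factor are made up to illustrate the shape of such an API, not taken from any proposal.

```python
class AdaptationRegistry:
    """Hypothetical registry of named rate adaptation algorithms."""

    def __init__(self):
        self._algorithms = {}
        self._active = None

    def register(self, name, choose):
        """Add an algorithm; `choose(bitrates, predicted_bps)` returns a bitrate."""
        self._algorithms[name] = choose
        if self._active is None:
            self._active = name  # the first registered algorithm is the default

    def available(self):
        """Names a page could discover, in sorted order."""
        return sorted(self._algorithms)

    def select(self, name):
        if name not in self._algorithms:
            raise KeyError(f"unknown algorithm: {name}")
        self._active = name

    def choose_bitrate(self, bitrates, predicted_bps):
        return self._algorithms[self._active](bitrates, predicted_bps)


def conservative(bitrates, predicted_bps):
    """Pick the highest bitrate at or below half the predicted throughput."""
    fitting = [b for b in bitrates if b <= 0.5 * predicted_bps]
    return max(fitting) if fitting else min(bitrates)


def aggressive(bitrates, predicted_bps):
    """Pick the highest bitrate at or below the predicted throughput."""
    fitting = [b for b in bitrates if b <= predicted_bps]
    return max(fitting) if fitting else min(bitrates)


registry = AdaptationRegistry()
registry.register("conservative", conservative)
registry.register("aggressive", aggressive)
```

A page would call `available()` to see what the player offers and `select()` to switch, much as one picks a TCP congestion control algorithm by name.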

Yeah, that's not a bad idea, and is not in conflict with providing the
tools necessary to implement other algorithms completely in JavaScript.


I think that those browser vendors that are interested in a streaming
solution using open formats should get together and start to discuss a
solution. This list or the WHATWG would be sensible venues for that,

When you say "browser vendors", who do you mean ? If you mean
Opera/Mozilla/Google/Apple/Microsoft then I think the stakeholders for
this topic include a much wider range of companies. Web technologies are
finding application in many new environments where adaptive streaming is
important (TVs and TV-connected devices being the most interesting for
my company). The W3C recently set up the Web & TV Interest group and
that might also be a good venue to get involvement from more

Yes, I mean those web browser vendors. I've already joined the Web & TV IG
and am following that discussion too. The work needs to get done, I'm not
fussed about the venue.

Philip Jägenstedt
Core Developer
Opera Software