[foms] WebM Manifest

Thu Mar 24 15:01:34 PDT 2011

This was a fun thread to read after stepping away for a week. Let me try to
summarize some of the bigger points. If I get anything wrong, please correct me.

Summary:
1. Philip would like to start with an API first and then possibly work our way
up to a manifest file. One possible way would be to use XMLHttpRequest with
Blob data and byte ranges.

2. Jeroen and Mark think starting from only an API will be fairly hard and
time consuming. I'm going to agree with Mark and Jeroen here. You would
basically need to add close to a full file parser for each format, and maybe
even tailor some of the API to deal with format differences.

3. Jean-Baptiste was wondering about devices that do not have JavaScript or
may be underpowered. I think everyone is okay with ending up with a manifest
file that will work with both browsers and devices. The question is just how
we get there.

4. It seems some people are worried that DASH is too complex. Mark gave
some info on the simple VOD profile of DASH.
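As a rough illustration of the byte-range idea in point 1, here is a minimal Python sketch (not a browser API) of requesting one fragment by byte range. The URL, offset and size are hypothetical; the caller must already know where each fragment lives in the file, which is exactly the per-format parsing that point 2 is worried about.

```python
import urllib.request


def fragment_request(url: str, offset: int, size: int) -> urllib.request.Request:
    """Build a GET request for the `size` bytes starting at `offset`.

    HTTP byte ranges are inclusive, so the last byte is offset + size - 1.
    """
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    last_byte = offset + size - 1
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


if __name__ == "__main__":
    # Hypothetical fragment: 1000 bytes starting at byte 4096.
    req = fragment_request("http://example.com/video_500k.webm", 4096, 1000)
    print(req.get_header("Range"))  # bytes=4096-5095
    # urllib.request.urlopen(req) would then fetch just those bytes.
```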

Questions:
I think the original question still remains, but let me modify it slightly.

1. If W3C adopts DASH as a manifest format, would the W3C license
be acceptable to you?

2. Technically, is the simple VOD profile of DASH too complex?

Next steps:
- Create a wiki page to start defining the API. We can probably have
two pages: a statistics API and a control API.

- Create a wiki page for a manifest format.

I think both can be done in parallel. After we get farther along we should
probably move the discussion to another list.

Also, a quick update: we are implementing a prototype of WebM adaptive
streaming in Chromium. The prototype will use a manifest as a placeholder
for now, because we think this will be the fastest way to develop and test
the architecture needed to play back adaptive streaming presentations.

Frank

On Mon, Mar 21, 2011 at 12:36 PM, Mark Watson <watsonm at netflix.com> wrote:

>
> On Mar 19, 2011, at 12:13 PM, Steve Lhomme wrote:
>
> > On Sat, Mar 19, 2011 at 6:19 PM, Mark Watson <watsonm at netflix.com>
> wrote:
> >> I'm not sure why you concluded that #2 was not an issue because the
> frames arrive in decode order. I did not mention anything about the order of
> frame arrival. The issue is duplicate decoding of a frame, which is an issue
> both from a decoder capability and computational load point of view.
> >
> > Yes, decoding frame n and not displaying it is a waste of
> > resources. But it would have happened whether the switch is
> > to an aligned or a non-aligned fragment. By the time n-1 is displayed,
> > the original frame n has been loaded and decoded anyway.
>
> No, if you have aligned fragments and keyframe positions this cannot
> happen: frame n is a keyframe in both the old and new fragments and frame
> n-1 in the old stream does not depend on n or anything later. Frame n in the
> old stream is in a different fragment from frame n-1. I can feed a fragment
> from the old stream followed by the next fragment from the new stream into
> the media pipeline (before parsing the fragments) and the pipeline does not
> have any overlap to deal with.
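A minimal sketch of the aligned case, assuming each variant is described by a list of fragment start times in milliseconds and that every fragment begins with a keyframe. The function names and the ("old"/"new", index) plan format are hypothetical; the plan is simply the order in which whole, unparsed fragments would be handed to the media pipeline.

```python
from typing import List, Sequence, Tuple


def fragments_aligned(old_starts: Sequence[int], new_starts: Sequence[int]) -> bool:
    """True if both variants start their fragments at the same times (ms)."""
    return list(old_starts) == list(new_starts)


def splice_plan(old_starts: Sequence[int], new_starts: Sequence[int],
                switch_index: int) -> List[Tuple[str, int]]:
    """Old fragments 0..switch_index-1, then new fragments from switch_index on.

    With aligned boundaries no frame appears twice, so the pipeline never
    has to discard overlapping data.
    """
    if not fragments_aligned(old_starts, new_starts):
        raise ValueError("fragment boundaries differ; frames would overlap")
    if not 0 <= switch_index <= len(old_starts):
        raise ValueError("switch_index out of range")
    return ([("old", i) for i in range(switch_index)] +
            [("new", i) for i in range(switch_index, len(new_starts))])
```

For example, with both variants starting fragments at [0, 2000, 4000, 6000], switching at index 2 yields old fragments 0 and 1 followed by new fragments 2 and 3.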
>
> Without alignment, the problem is that the media pipeline must deal with
> discarding the overlapping data - and most don't do this today. In the
> simplest case, it just needs to discard overlapping frames before decoding.
> But if there are interframe dependencies it needs to discard some frames
> after decoding.
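A sketch of the simplest non-aligned case, assuming frames carry integer presentation timestamps and that decode order equals presentation order (no frame reordering). The Frame type and merge_without_overlap are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    pts: int        # presentation timestamp in ms (illustrative unit)
    keyframe: bool
    source: str     # "old" or "new", for illustration only


def merge_without_overlap(old: List[Frame], new: List[Frame]) -> List[Frame]:
    """Drop old-stream frames that overlap the new stream, before decoding.

    Assumes no frame reordering, so every old frame kept (pts earlier than
    the new stream's first keyframe) never depends on a frame we drop.
    """
    if not new or not new[0].keyframe:
        raise ValueError("the new stream must start with a keyframe")
    switch_pts = new[0].pts
    kept = [f for f in old if f.pts < switch_pts]
    return kept + list(new)
```

If frames are reordered across the switch point, a timestamp filter before decoding is not enough: some frames would have to be decoded and then dropped, which is the harder case described above.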
>
> On some devices with hardware decoders the timing requirements for decoding
> are very precisely engineered and if a given frame needs to be decoded twice
> it can throw off this timing. Essentially you need a larger buffer between
> decoder and renderer to absorb this additional decoding jitter.
>
> >
> >>> Now that's a good point in favor of using the TCP window more. It
> >>> could be reduced while the decision is being made or start loading
> >>> from another stream while the main one is still loading. With a window
> >>> of (almost) 0 the TCP connection would then be established/ready to be
> >>> used as soon as the bandwidth is available. When you know you are
> >>> going to switch to a new stream, you can reduce the window gradually,
> >>> with 0 happening at the exact byte end position of the fragment (or
> >>> n-1 frame). That would minimize the bandwidth waste and latency time
> >>> between reading 2 fragments.
> >>
> >> Closing the receive window just pauses the transmission. The data you
> originally requested will still come later unless you close the connection.
> >
> > Yes. The idea is to reduce the window gradually to 0 so the server
> > stops sending more data before we tell it to close the connection (in
> > fact both could be sent in the same packet, but I'm not sure
> > sockets usually work that way).
>
> Ok, but there is a trade off between receiving the additional overlapping
> data and having to re-open a new connection.
>
> Plus this cross-layer coordination of the window would be pretty hard in
> practice.
>
> >
> >> Receiving the overlap data is not really the issue (though it would be
> nice to avoid). The point is that you cannot detect where to stop in the old
> stream without parsing down to the frame level. Which ties together the
> media player and the adaptive streamer in a way which is both unnecessary
> and not aligned with existing architectures.
> >
> > By existing architecture you mean player architecture or existing
> > adaptive streaming systems ?
>
> Both.
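To illustrate why a fragment index lets the streamer stop at the right place without frame-level parsing, here is a sketch assuming a per-variant index of (start time, byte offset, size) entries sorted by start time. WebM's Cues element carries similar time-to-position information, but the IndexEntry layout here is hypothetical.

```python
import bisect
from typing import List, NamedTuple, Tuple


class IndexEntry(NamedTuple):
    start_ms: int  # presentation time at which the fragment (a keyframe) starts
    offset: int    # byte offset of the fragment in the file
    size: int      # fragment size in bytes


def fragment_byte_range(index: List[IndexEntry], t_ms: int) -> Tuple[int, int]:
    """Inclusive byte range of the fragment whose start is the latest <= t_ms.

    Any time at or after the last fragment's start maps to the last fragment.
    The streamer stops at the fragment's last byte; no frame parsing needed.
    """
    starts = [e.start_ms for e in index]
    i = bisect.bisect_right(starts, t_ms) - 1
    if i < 0:
        raise ValueError("time precedes the first fragment")
    entry = index[i]
    return entry.offset, entry.offset + entry.size - 1
```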
>
> >
> > The idea of allowing non-aligned variants is not unnecessary IMO. I
> > think it should be carefully thought through before ruling it out. So far I
> > see drawbacks that balance the advantages, but no deal breaker. But
> > that's my opinion and I hope we can all reach a consensus on this.
>
> This has been quite carefully thought through in the DASH work over the
> last couple of years. DASH provides indicators to signal when the streams
> have certain alignment properties, and these indicators have been defined
> very carefully to capture the important properties in a
> media-type/format independent way.
>
> The basic on-demand profile requires "fragment alignment" for the reasons
> I've given, but other profiles could be defined which allow non-aligned
> variants.
>
> >
> >>> That's true and using a range request is surely nicer than playing
> >>> with the TCP window. But as shown above, playing with the TCP window
> >>> can still be useful when switching streams (server + DNS + TCP
> >>> latency). Also, using a "range" request has some drawbacks. It forces
> >>> the client to open a new connection for each fragment, even if the new
> >>> fragment directly follows the previous one.
> >>
> >> No, you re-use the same connection for the next request.
> >
> > I did not know that. Is it supported by all HTTP servers (handling
> > many requests on the same TCP connection)? In that case stopping a
> > range request in the middle of the stream could be possible too,
> > helping with switching to another variant in the middle.
>
> HTTP Keep-alive is widely supported and used by web browsers. You can wait
> for one request to finish before issuing the next one, or you can issue
> multiple requests at a time. The latter is called pipelining and is widely
> believed to run into problems with certain transparent proxies. It's
> disabled by default in most browsers, though not all. There's no way to
> cancel a request in HTTP except by closing the connection. It would be
> useful and could easily be added in a backwards-compatible way. Or you can
> use a new protocol like Google's SPDY.
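A minimal sketch of the non-pipelined keep-alive pattern using Python's http.client: requests are issued one at a time on the same connection, each response read in full before the next request. The host, path and ranges are placeholders, and the function assumes the server honours Range with 206 responses.

```python
import http.client
from typing import List, Tuple


def fetch_ranges(host: str, port: int, path: str,
                 ranges: List[Tuple[int, int]]) -> List[bytes]:
    """Fetch inclusive byte ranges sequentially over ONE persistent
    HTTP/1.1 connection (keep-alive, no pipelining)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    bodies: List[bytes] = []
    try:
        for first, last in ranges:
            conn.request("GET", path, headers={"Range": f"bytes={first}-{last}"})
            resp = conn.getresponse()
            # A response must be read in full before the connection can
            # carry the next request/response exchange.
            body = resp.read()
            if resp.status != 206:
                # A 200 would mean the server ignored Range and sent everything.
                raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
            bodies.append(body)
    finally:
        # Closing the connection is also the only way to abandon a request
        # that is still in flight.
        conn.close()
    return bodies
```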
>
> >
> >>> That results in
> >>> time and resources wasted to establish the TCP connection and the
> >>> server-side "session". If you use an "offset" request (a start offset
> >>> but no end) you avoid that issue. And you just need to adjust your TCP
> >>> window if you really don't want to waste a byte in the transmission.
> >>>
> >>> It has already been established that having fragments of one stream in
> >>> many files is not practical and will likely not be used (for on demand
> >>> at least). So maybe the next step should be to NOT use range requests
> >>> at all.
> >>
> >> What would you use then?
> >
> > A request specifying the offset to start reading from in the remote
> > stream, without specifying the end offset. I don't know if this is
> > also called a range request or not.
>
> Yes, it's an open-ended range request. You can certainly use those. I don't
> see where it would impact any specification though: purely an implementation
> matter.
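For completeness, a sketch of the open-ended variant: the request names only a start offset, and the server reports the range it actually sent in the Content-Range header of its 206 response. The helper names are hypothetical; the header formats are standard HTTP.

```python
import re
from typing import Optional, Tuple

_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def open_ended_range(offset: int) -> str:
    """Range header value requesting everything from `offset` to the end."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return f"bytes={offset}-"


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse e.g. 'bytes 500-999/1000' into (first, last, total).

    total is None when the server sends '*' (length unknown).
    """
    m = _CONTENT_RANGE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = m.groups()
    return int(first), int(last), None if total == "*" else int(total)
```

The client can simply stop reading once it has the bytes it needs; as noted above, actually cancelling the rest of the transfer still means closing the connection.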
>
> >
> > --
> > Steve Lhomme
> > Matroska association Chairman
> >
>
> _______________________________________________
> foms mailing list
> foms at lists.annodex.net
> http://lists.annodex.net/cgi-bin/mailman/listinfo/foms
>