[foms] WebM Manifest

Mon Mar 21 09:36:08 PDT 2011

On Mar 19, 2011, at 12:13 PM, Steve Lhomme wrote:

> On Sat, Mar 19, 2011 at 6:19 PM, Mark Watson <watsonm at netflix.com> wrote:
>> I'm not sure why you concluded that #2 was not an issue because the frames arrive in decode order. I did not mention anything about the order of frame arrival. The issue is duplicate decoding of a frame, which is an issue both from a decoder capability and computational load point of view.
> 
> Yes, decoding frame n and not displaying it is a waste of the
> resource. But it would have happened whether the switch is happening
> to an aligned or non-aligned fragment. By the time n-1 is displayed,
> the original n frame has been loaded and decoded anyway.

No, if you have aligned fragments and keyframe positions this cannot happen: frame n is a keyframe in both the old and new fragments and frame n-1 in the old stream does not depend on n or anything later. Frame n in the old stream is in a different fragment from frame n-1. I can feed a fragment from the old stream followed by the next fragment from the new stream into the media pipeline (before parsing the fragments) and the pipeline does not have any overlap to deal with.

Without alignment, the problem is that the media pipeline must discard the overlapping data, and most pipelines don't do this today. In the simplest case, it just needs to discard overlapping frames before decoding. But if there are interframe dependencies, it has to decode some of the overlapping frames first and discard them afterwards.
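
To make the simplest case concrete, here is a minimal sketch of discarding the overlap before decoding. It assumes frames are listed in presentation order and that no old-stream frame before the switch point depends on a later one; the Frame type and the splice function are made up for illustration and are not part of any existing pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    pts: int         # presentation timestamp, in arbitrary ticks
    keyframe: bool   # True if the frame can be decoded on its own


def splice(old: List[Frame], new: List[Frame]) -> List[Frame]:
    """Join the old stream to the new one at the new stream's first keyframe.

    Old-stream frames at or after the switch point are dropped before they
    reach the decoder, so no frame is decoded twice.
    """
    # Locate the first keyframe in the new fragment: the switch point.
    for index, frame in enumerate(new):
        if frame.keyframe:
            break
    else:
        raise ValueError("new fragment has no keyframe to switch on")

    switch_pts = new[index].pts
    kept_old = [f for f in old if f.pts < switch_pts]
    return kept_old + new[index:]
```

With aligned fragments the old fragment ends exactly where the new one starts, so nothing is dropped and the pipeline never needs this step. Note that even this simple version has to look at individual frames, which is the parsing the streamer would otherwise avoid.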

On some devices with hardware decoders the timing requirements for decoding are very precisely engineered and if a given frame needs to be decoded twice it can throw off this timing. Essentially you need a larger buffer between decoder and renderer to absorb this additional decoding jitter.

> 
>>> Now that's a good point in favor of using the TCP window more. It
>>> could be reduced while the decision is being made or start loading
>>> from another stream while the main one is still loading. With a window
>>> of (almost) 0 the TCP connection would then be established/ready to be
>>> used as soon as the bandwidth is available. When you know you are
>>> going to switch to a new stream, you can reduce the window gradually,
>>> with 0 happening at the exact byte end position of the fragment (or
>>> n-1 frame). That would minimize the bandwidth waste and latency time
>>> between reading 2 fragments.
>> 
>> Closing the receive window just pauses the transmission. The data you originally requested will still come later unless you close the connection.
> 
> Yes. The idea is to reduce the window gradually to 0 so the server
> stops sending more data before we tell it to close the connection (in
> fact both information could be sent in the same packet, but I'm not
> sure the way sockets work usually does that).

Ok, but there is a trade-off between receiving the additional overlapping data and having to open a new connection.

Plus this cross-layer coordination of the window would be pretty hard in practice.

> 
>> Receiving the overlap data is not really the issue (though it would be nice to avoid). The point is that you cannot detect where to stop in the old stream without parsing down to the frame level. Which ties together the media player and the adaptive streamer in a way which is both unnecessary and not aligned with existing architectures.
> 
> By existing architecture you mean player architecture or existing
> adaptive streaming systems ?

Both.

> 
> The idea of allowing non-aligned variants is not unnecessary IMO. I
> think it should be carefully thought before ruling it out. So far I
> see drawbacks that balance the advantages but no deal breaker. But
> that's my opinion and I hope we can all reach a consensus on this.

This has been quite carefully thought through in the DASH work over the last couple of years. DASH provides indicators to signal when the streams have certain alignment properties, and these indicators have been defined very carefully to capture the important properties in a media-type/format independent way.

The basic on-demand profile requires "fragment alignment" for the reasons I've given, but other profiles could be defined which allow non-aligned variants.

> 
>>> That's true and using a range request is surely nicer than playing
>>> with the TCP window. But as shown above, playing with the TCP window
>>> can still be useful when switching streams. (server + DNS + TCP
>>> latency). Also using a "range" request has some drawbacks. It forces
>>> to open a new connection for each fragment, even if the new fragment
>>> was exactly the following of the previous fragment.
>> 
>> No, you re-use the same connection for the next request.
> 
> I did not know that. Is it supported by all HTTP servers (handling
> many requests on the same TCP connection) ? In that case stopping a
> range-request could be possible too in the middle of the stream,
> helping the issue of switching to another variant in the middle.

HTTP Keep-alive is widely supported and used by web browsers. You can wait for one request to finish before issuing the next one, or you can issue multiple requests at a time. The latter is called pipelining and is widely believed to run into problems with certain transparent proxies. It's disabled by default in most browsers, though not all. There's no way to cancel a request in HTTP except by closing the connection. A way to cancel would be useful and could easily be added in a backwards-compatible way. Or you can use a new protocol like Google's SPDY.
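
As a rough illustration of the non-pipelined case, the sketch below issues range requests one after another on a single persistent connection using Python's standard http.client. The fetch_ranges name, the host and the path are hypothetical, and error handling (status checks, servers that close the connection) is left out.

```python
import http.client
from typing import Iterable, List, Tuple


def fetch_ranges(conn: http.client.HTTPConnection, path: str,
                 ranges: Iterable[Tuple[int, int]]) -> List[bytes]:
    """Fetch inclusive byte ranges sequentially over one connection."""
    bodies = []
    for first, last in ranges:
        conn.request("GET", path, headers={"Range": f"bytes={first}-{last}"})
        response = conn.getresponse()
        # The body must be read in full before the next request is sent;
        # this is the "wait for one request to finish" mode, not pipelining.
        bodies.append(response.read())
    return bodies


# Usage (hypothetical host and path):
# conn = http.client.HTTPConnection("media.example.com")
# first, second = fetch_ranges(conn, "/video.webm", [(0, 65535), (65536, 131071)])
# conn.close()
```

Both fragments travel over the same TCP connection, so the connection setup cost is paid once. Abandoning a response part-way through still means closing the connection.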

> 
>>> That result in
>>> time and resource wasted to establish the TCP connection and the
>>> server side "session". If you use an "offset" request (a start offset
>>> but no end) you avoid that issue. And you just need to adjust your TCP
>>> window if you really don't want to waste a byte in the transmission.
>>> 
>>> It has already been established that having fragments of one stream in
>>> many file is not practical and will likely not be used (for on demand
>>> at least). So maybe the next step should be to NOT use range requests
>>> at all.
>> 
>> What would you use then?
> 
> A request specifying the offset to start reading from in the remote
> stream, without specifying the end offset. I don't know if this is
> also called a range request or not.

Yes, it's an open-ended range request. You can certainly use those. I don't see where it would impact any specification though: it's purely an implementation matter.
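
For reference, the only difference on the wire is that the last byte position is left out of the Range header. A small sketch, with range_header as a made-up helper name:

```python
from typing import Optional


def range_header(start: int, end: Optional[int] = None) -> str:
    """Build an HTTP Range header value for a byte range.

    With end omitted the range is open-ended: the server sends everything
    from start to the end of the resource.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if end is None:
        return f"bytes={start}-"
    if end < start:
        raise ValueError("end must not be before start")
    # Both positions are inclusive byte offsets.
    return f"bytes={start}-{end}"
```

So range_header(1000) gives "bytes=1000-", while range_header(0, 499) gives "bytes=0-499" for the first 500 bytes.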

> 
> -- 
> Steve Lhomme
> Matroska association Chairman
>