[foms] Proposal: adaptive streaming using open codecs

Mark Watson watsonm at netflix.com
Tue Nov 16 13:11:28 PST 2010


On Nov 16, 2010, at 10:29 AM, Pierre-Yves KEREMBELLEC wrote:

>>> That might be what some advocate, but what I would advocate is having just one file for each bitrate of video and a separate one for each bitrate or language of audio, etc., and then providing clients with an index into each file so they can make byte-range requests for the pieces they need from each.
>>> 
>>> There does exist, in several CDNs, a simple server extension which enables a byte range to be embedded in a URL instead of in the Range header, and we do use this with Apple clients for our service to avoid the "millions of files" problem. But this is just a different way of communicating the byte range to the server, one which happened to exist already, is useful as a workaround, and is very much an application-independent capability. What I would suggest we avoid is any video-specific server extension, where servers are expected to understand the format of the video and audio files, re-multiplex them, etc.
>> 
>> So it seems there's a general consensus on splitting up audio and video into separate streams? Who's really against it, for which reasons? This has big implications for the dummy Stream.appendChunk() call we were brainstorming about.
>> Just appending chunks wouldn't work anymore; we'd basically have to create tracks and append chunks to tracks...
>> 
>> I'm also a little lost on how the files on the server would be structured. Would there be audio-only and video-only "plain" WebM files, or do we need to go to a "chained" format (range requests) or a "chunked" format (separate files)? In both
>> latter cases, we'd lose adaptive streaming support for current WebM files...
> 
> I think both Mark and Frank are totally right about separating the different tracks _before_ sending to the client, in order to minimize the
> different combinations sent on the wire and maximize internet/browser cache efficiency. If you think about it, this pattern has been around
> for years (if not decades) with RTSP, where audio and video are delivered separately within UDP RTP virtual streams (they may also be
> delivered interleaved within the TCP RTSP connection itself to circumvent firewall problems, but this is another story).
> 
> Whether tracks are stored as separate files or extracted from an interleaved (muxed) file using a server-side extension is outside the scope
> of this discussion IMHO, and only pertains to server-side performance.
> 
> That said, I think the best compromise would be for browsers to accept both interleaved and elementary streams: when there's only one
> video track and one audio track (which probably covers 99% of the video content out there), it makes sense not to add complexity for
> publishers by asking them to demux their videos (some probably don't know exactly what video internals are about anyway).
> 
> That's why I would propose the following :
> 
> - either interleaved or elementary streams sent from the server side

This is fine for me: I'm not arguing that interleaved streams should not be supported, just that separate elementary streams should be as well.

> 
> - multiple versions/qualities for the same content advertised using a "main" manifest file (not tied to the HTML markup because we
>  probably want this to work outside browsers)
> 
> - multiple audio/video/<you-name-it> tracks for the same content also advertised in this main manifest file
> 
> - main manifest files may refer to streams directly, or to "playlist" manifest files (in case the publisher willingly chooses to use fragments)
> 
> - playlist manifest files list all fragments with precise timestamps and duration (i.e. not "a-la-Apple-M3U8")

Even if you use separate chunks, you don't necessarily have to list them explicitly. There's usually a consistent naming scheme, so a simple template approach can spare you large playlists. In that case you can still put everything into one manifest (no need for a "main" one pointing to "playlist" ones).

And you don't really need precise timing in the manifest. Approximate timing is sufficient as long as you have precise timing in the files. For example, perhaps all files are 10s long according to the manifest, but the real boundaries are put at the nearest RAP to a multiple of 10s.
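To make the template idea concrete, here's a minimal sketch; the manifest fields and URL pattern are invented for illustration, since the thread doesn't define a schema:

```python
# Hypothetical sketch: deriving chunk URLs from a template-based manifest
# instead of an explicit playlist. All field names are invented for
# illustration; timing is approximate, as discussed above, with the real
# chunk boundaries sitting at the nearest RAP to a multiple of 10s.

manifest = {
    "template": "video_{bitrate}k/seg_{index:05d}.webm",
    "approx_segment_duration": 10.0,   # seconds (nominal, not exact)
    "total_duration": 3600.0,
    "bitrates": [300, 700, 1500],
}

def segment_url(manifest, bitrate, time_s):
    """Map a playback position to a chunk URL using approximate timing."""
    index = int(time_s // manifest["approx_segment_duration"])
    return manifest["template"].format(bitrate=bitrate, index=index)

print(segment_url(manifest, 700, 123.4))  # video_700k/seg_00012.webm
```

With a scheme like this, a one-hour stream needs a manifest of a few lines rather than a playlist of hundreds of entries per bitrate.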

> 
> - JSON for all manifest files (as it's easy to parse on any platform and heavily extensible)
> 
> - all interleaved and elementary streams - chunked or not - are independently playable (and start with a RAP for video)

Did you mean that individual chunks are independently playable, or that the concatenation of the chunks should be independently playable?

In most formats there are file/codec headers that you don't want to repeat in every chunk (seems to be true for mp4 and WebM).

> 
> - the chosen container is easily "streamable", and requires minimal header overhead to pass codec initialization info

I think formally what you want is that the headers contain only information which is about the whole stream (in time). Information which is about some part (in time) of the stream should be distributed at appropriate points in the file.

> 
> - RAP may not be aligned between versions (or at least it shouldn't be a strong requirement even if in practice it's often
>   the case), thus end-user experience with no-glitch stream-switching would depend on renderer and double-decoding/buffering
>   pipeline capabilities

I believe it is very unlikely that many kinds of devices will support the kind of double-decoding needed to splice at arbitrary points without any alignment between versions.

So, I think it's important that the manifest at least indicate whether the fragmentation has the nice alignment and RAP position properties needed to enable seamless switching without such media pipeline enhancements. Devices without that capability can then choose whether to switch with glitches, or not switch at all. Providers can decide whether they want to prepare content which switches seamlessly on all devices or just on some.

We defined the necessary alignment properties quite precisely in MPEG and included such an indicator.
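The client-side decision this implies might be sketched as follows; the manifest flag and device capability names are assumptions for illustration, not anything this thread or MPEG defines:

```python
# Hypothetical sketch of the switching policy described above: switch
# seamlessly only when the manifest advertises aligned, RAP-starting
# segments OR the device can double-decode across the splice point.
# All names are invented for illustration.

def switching_policy(manifest_flags, device):
    if manifest_flags.get("segments_aligned_and_rap_start"):
        return "seamless"    # plain splice at segment boundaries
    if device.get("supports_double_decode"):
        return "seamless"    # overlap-decode across the splice point
    if device.get("tolerates_glitches"):
        return "glitchy"     # switch anyway, accept a visible glitch
    return "no-switch"       # stay on one bitrate

print(switching_policy({"segments_aligned_and_rap_start": True}, {}))  # seamless
```

The point of the manifest indicator is the first branch: a simple device can get seamless switching purely from content properties, without any pipeline enhancement.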

...Mark



> 
> Thoughts?
> Pierre-Yves
> 


