[foms] Adaptive streaming

Wed Oct 27 12:57:28 PDT 2010

On Oct 26, 2010, at 1:25 AM, Jeroen Wijering wrote:

> The nice thing about also supporting separate chunks is that
> the barrier for getting started with adaptive HTTP is very low. One only needs
> an encoder or a segmenter tool; everything else is basic components. We see
> this now with Apple HTTP - it's the format that's getting traction where
> Flash/Silverlight are not. Especially on the live side.

I think that Apple HTTP Live Streaming (HLS) is getting traction
because it is the only way to stream video to the iOS family of products, which
are quite popular. It may be easy to implement a mediocre solution, but it is
very difficult to deliver a great 'living-room' quality of experience. Any good
streaming model should scale from handheld (HH) devices to living-room consumer
electronics (CE) in HD.

> the barrier for getting started with adaptive HTTP is very
> low. One only needs an encoder or a segmenter tool

The barrier is low for encoding/formatting, but high for clients.
Encoding/formatting complexity is easily addressed, and a robust set of
formatting tools might take 2 months to produce. The simplicity of HLS
encoding/formatting pushes unnecessary complexity onto the clients (e.g.
splicing video, mixing audio, etc.). I have not seen an HLS implementation
that can handle a seamless switch between two different audio profiles. IMHO,
it only works on paper.

Also, I argue that the HLS barrier on the encode/format side is not as low as
it might seem. To get true seamless switching when using muxed A/V streams
requires precise audio alignment across all bitrates, and this is not a simple
muxing task.
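
One way to picture the alignment requirement is a check that every rendition's audio segments end at exactly the same timestamps. This is only a sketch under stated assumptions: the function names are hypothetical, and the 1024 samples per frame and 48 kHz sample rate are illustrative values, not figures from this thread.

```python
from fractions import Fraction

def audio_boundaries(frame_counts, samples_per_frame, sample_rate):
    """Return the cumulative end time of each audio segment, in exact seconds."""
    elapsed = Fraction(0)
    boundaries = []
    for frames in frame_counts:
        elapsed += Fraction(frames * samples_per_frame, sample_rate)
        boundaries.append(elapsed)
    return boundaries

def renditions_aligned(renditions):
    """True if every rendition has identical audio segment boundaries."""
    all_boundaries = [audio_boundaries(*r) for r in renditions]
    return all(b == all_boundaries[0] for b in all_boundaries)

# Illustrative renditions: (audio frames per segment, samples per frame, sample rate)
low = ([94, 94], 1024, 48000)
high = ([94, 94], 1024, 48000)
drifted = ([94, 93], 1024, 48000)  # one audio frame short in the second segment

print(renditions_aligned([low, high]))     # True
print(renditions_aligned([low, drifted]))  # False
```

A single missing audio frame is enough to make the check fail, which is the kind of detail a muxer has to get right for every bitrate.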

> Yes, that is something we haven't talked about a lot. At the workshop, the idea was to allow for different audio bitrates / nr. of channels  per quality level, but still have A/V in one stream. This is not ideal for multi-language solutions.

It is not only troublesome for multi-language, but also for audio quality.
For mobile, we find 32k or 64k audio is usually sufficient. For CE (living
room), 96k is the minimum. Seamless audio switching is difficult, and user
tolerance for audio glitches is low. Therefore, to properly support adaptive
streaming with muxed A/V and provide the best quality audio (for any given use
case), every variation of audio requires another copy of the video. So if you
have 4 audio bitrates (32, 64, 96, & 192 5.1) and two alternate tracks, you
have 5 extra copies of the video stream. If you have a catalog of 40,000
titles, encoded at 8 video bitrates, you have 1.6M unnecessary video files on
your CDN (that average ~1GB per hour). That is with a single-file model; with
a chunked model the file count is several orders of magnitude higher. No CDN
can manage these numbers, and so we simply gave up and only stream 64k audio
to both HH and CE.
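
The file-count arithmetic above can be reproduced with a short sketch. The function and variable names are hypothetical, and counting the two alternate tracks as two further audio variations (six in total, so five beyond the first) is an assumed reading of the figures quoted above.

```python
def extra_muxed_video_files(titles, video_bitrates, audio_variations):
    """Video files a muxed layout stores beyond the one copy per bitrate
    that an un-muxed layout would need."""
    extra_copies = audio_variations - 1  # the first variation needs no extra copy
    return titles * video_bitrates * extra_copies

# Assumption: 4 audio bitrates + 2 alternate tracks = 6 audio variations.
audio_variations = 4 + 2
extra_files = extra_muxed_video_files(
    titles=40_000, video_bitrates=8, audio_variations=audio_variations
)
print(extra_files)  # 1600000
```
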
If A/V streams are un-muxed, the client will set up the audio decode pipeline once, and can freely switch video at will with no concerns about audio.  You cannot get any simpler than this, which speaks to my main point:  Complexity that can easily be handled at the encode/format step should never be pushed out to the client.
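
A minimal sketch of that un-muxed client behaviour, assuming a hypothetical `UnmuxedPlayer` class and made-up bitrates (none of these names come from HLS or any real player API): the audio pipeline is set up once, and only the video rendition changes as the bandwidth estimate moves.

```python
from dataclasses import dataclass

@dataclass
class UnmuxedPlayer:
    """Hypothetical client model: one fixed audio track, switchable video."""
    audio_track: str
    audio_kbps: int
    video_bitrates_kbps: list
    current_video_kbps: int = 0
    audio_pipeline_setups: int = 0

    def __post_init__(self):
        # The audio decode pipeline is configured exactly once, at startup.
        self.audio_pipeline_setups += 1
        self.current_video_kbps = min(self.video_bitrates_kbps)

    def on_bandwidth_estimate(self, available_kbps):
        """Pick the highest video bitrate that fits next to the audio stream."""
        budget = available_kbps - self.audio_kbps
        fitting = [b for b in self.video_bitrates_kbps if b <= budget]
        # Fall back to the lowest rendition when nothing fits the budget.
        self.current_video_kbps = max(fitting) if fitting else min(self.video_bitrates_kbps)
        return self.current_video_kbps

player = UnmuxedPlayer("en-96k", audio_kbps=96, video_bitrates_kbps=[500, 1500, 3000])
print(player.on_bandwidth_estimate(2000))  # 1500
print(player.on_bandwidth_estimate(400))   # 500
print(player.audio_pipeline_setups)        # 1
```

The video switches never touch the audio state, so there is no audio splice to get wrong.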
