[foms] Chunked/adaptative streaming at Dailymotion
pierre-yves.kerembellec at dailymotion.com
Tue Nov 2 10:49:28 PDT 2010
My name is Pierre-Yves Kerembellec and I'm responsible for the streaming architecture at Dailymotion.
Sylvia and I exchanged some emails on chunked streaming implementation, and she suggested I share this
with the FOMS list (thanks Sylvia for signing me up, the discussions here are very enlightening!).
_Where we came from_
For the past 4 years, we've been relying on a home-grown "streaming server" (basically a module built on
top of Apache), with seeking and bandwidth throttling support for FLV, MP4 and OGG containers. The key
feature has been the capability to deliver content as successive bursts of A/V data within a single HTTP
connection ("burst.com-ish", a technique also used by YT/Google). This has really been instrumental in
lowering our peak egress bandwidth level (because most users won't watch a video to its full extent,
so the remaining bytes never have to be sent for nothing).
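For illustration, the pacing loop behind this kind of burst delivery can be sketched as follows (a toy sketch only, not our Apache module: the `send` callable, the injectable `sleep` and all timing parameters are made up for the example):

```python
import time

def serve_in_bursts(send, data, bitrate_bps,
                    burst_secs=10, initial_burst_secs=30, sleep=time.sleep):
    """Deliver `data` as successive bursts over one HTTP connection.

    `send` is any callable writing bytes to the client; delivery is paced
    against the media bitrate so that if the viewer closes the connection
    early, the remaining bytes are simply never transferred.
    """
    bytes_per_sec = bitrate_bps // 8
    # Large initial burst so playback can start (and buffer) immediately.
    pos = min(len(data), initial_burst_secs * bytes_per_sec)
    send(data[:pos])
    # Then one burst's worth of media every burst_secs seconds.
    while pos < len(data):
        sleep(burst_secs)
        nxt = min(len(data), pos + burst_secs * bytes_per_sec)
        send(data[pos:nxt])
        pos = nxt
```

The `sleep` parameter is only there so the loop can be exercised without real waiting; a production server would of course be event-driven rather than blocking per connection.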
The only problem with this approach is the low cache-ability of content in the internet infrastructure
(ISP proxy-caches, browser cache, etc.), because each connection (and associated content) would be unique
to the end-user: a typical content URL would contain some (changing) security token, and part of the
content itself - like MP4 or FLV headers - would be rewritten on the fly, depending on the client request.
This tends to become problematic because of the amount of requests we get on popular (hot) content (these
requests are seen as unique and independent and won't benefit from any cache effect in the network), while
at the same time we have to deal with the long-tail (cold) content. This is costing us more than it should,
especially when thousands of users from the same ISP are watching the exact same content, and we know that
ISP has proxy-caches (or even an internal CDN) in place we could benefit from.
_What we are aiming at_
We decided a year ago to investigate _real_ chunked delivery, i.e. separate small physical files downloaded
(and paced) by the media players. We looked at different existing implementations (Microsoft Smooth Streaming,
Adobe HTTP Dynamic Streaming, Apple HTTP Live Streaming, ...) to see how these vendors did it, and finally
decided to rollout a new iteration of our own system.
This new version is a standalone event-driven HTTP server, with support for standard HTTP requests (regular
and byte-ranges), Flash player (json + FLV remuxing), Apple HTTP Live Streaming (m3u8 + MPEG2-TS re-muxing).
It supports the original containers from the previous version (MP4 and FLV), and we will probably support
MKV (+ VP8/Vorbis) if the WebM initiative gains enough momentum.
It supports the following clients:
+ "GET /content.mp4 HTTP/1.x" clients, such as IPTV STBs, or Flash players prior to 10.1 (where access to
the A/V pipeline is not available, thus defeating application-level A/V segment injection).
+ "let's-probe-the-moov-atom-using-byte-ranges" players, such as today's browsers HTML5 video tags and
most of the smart-phones video players implementations.
+ Flash players >= 10.1 (in chunked delivery mode, MP4 re-muxed into FLV for fast startup, see below).
+ "iDevices" in HTTP Live Streaming mode (i.e. m3u8 + MPEG-TS payloads, also working for desktop Safari).
and solves recurrent problems with misconfigured or strict proxies/firewalls in enterprise environments.
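The second class of clients above works by fetching the first few KB of the file with a Range request and walking the top-level MP4 boxes until it locates the moov atom (issuing another range request if the moov sits at the end of the file). A minimal sketch of that box-header walk, assuming the client already holds the first bytes of the file:

```python
import struct

def parse_top_level_atoms(header_bytes):
    """Walk the top-level MP4 box headers found in `header_bytes`
    (e.g. the first 4KB fetched with 'Range: bytes=0-4095').
    Returns (box_type, size) pairs until the data runs out."""
    atoms, pos = [], 0
    while pos + 8 <= len(header_bytes):
        size, kind = struct.unpack(">I4s", header_bytes[pos:pos + 8])
        if size == 1:  # 64-bit extended size follows the box type
            if pos + 16 > len(header_bytes):
                break
            size = struct.unpack(">Q", header_bytes[pos + 8:pos + 16])[0]
        if size < 8:   # size 0 ("to end of file") or corrupt box: stop here
            break
        atoms.append((kind.decode("ascii"), size))
        pos += size    # only headers are inspected, payloads are skipped
    return atoms
```

The sizes returned are what lets the player compute the byte offset of the moov atom (or of any media range to seek to) for its next Range request.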
The main objective is to continue cutting bandwidth costs and enhancing user's experience with video
delivery, while at the same time paving the road for live streaming and P2P distribution.
_Physical vs virtual chunks_
With 20M+ videos (each transcoded into 4 to 6 different formats, most of them being MP4/H264/AAC-based),
the question of re-encoding (or simply remuxing to create fragmented formats) didn't even arise: it would
just be a tremendous task to move those petabytes around, not to mention splitting them into billions of
small files! We also wanted to keep the original videos untouched so that simple clients (such as IPTV STB
or HTML5 video tags) could still play them properly.
This is why the origin streaming server is taking care of re-muxing original video files on the fly (it's
mostly H264/AAC samples from MP4 container remuxed into FLV/MPEG2-TS containers).
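To give an idea of what the FLV side of that remuxing involves, here is a sketch of the FLV encapsulation itself (the tag framing is from the public FLV format; everything else our server does - MP4 sample extraction, codec configuration tags, timestamps rescaling - is left out):

```python
import struct

# "FLV", version 1, audio+video flags (0x05), 9-byte header, PreviousTagSize0.
FLV_HEADER = b"FLV\x01\x05\x00\x00\x00\x09\x00\x00\x00\x00"

def flv_tag(tag_type, timestamp_ms, payload):
    """Wrap one A/V sample as an FLV tag (tag_type: 8 = audio, 9 = video).
    `payload` must already carry the codec-specific prefix FLV expects
    (AVC packet header for H264, AAC packet header for audio)."""
    header = struct.pack(
        ">B3s3sB3s",
        tag_type,
        len(payload).to_bytes(3, "big"),            # DataSize (24 bits)
        (timestamp_ms & 0xFFFFFF).to_bytes(3, "big"),  # lower timestamp bits
        (timestamp_ms >> 24) & 0xFF,                # TimestampExtended byte
        b"\x00\x00\x00",                            # StreamID, always 0
    )
    tag = header + payload
    return tag + struct.pack(">I", len(tag))        # PreviousTagSize trailer
```

Each tag costs a fixed 11-byte header plus a 4-byte back-pointer, which is why FLV is so cheap in container overhead.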
_Manifest files vs URL templating_
When we looked at the Smooth Streaming implementation, we discovered Microsoft was not using a playlist of
available fragments, but a URL templating system instead (where each fragment would be time-referenced,
a little bit like the start= and end= parameters in current implementations, but with a 100ns accuracy!).
While appealing, we preferred to go with the traditional "playlist" approach, where each fragment is
actually referenced with a URL in a manifest file. Fragment playlists for long video footage may not be
as large as one may think, because you may choose to include several closed GOPs inside a single fragment:
even with a 2-secs keyframe interval, one may choose to generate 10-secs fragments (i.e. having 5 keyframes
inside a single fragment is not necessarily a problem, as long as the fragment starts with a keyframe and
can be decoded independently).
Let's say we have a fragment every 10 seconds: a 2-hours movie would translate into about 720 URLs in the
playlist. Once gzip-ed by the HTTP transport layer, this would not be more than 10-20KB downloaded by the
player.
We consider that there are 2 levels of manifest files in the system:
+ one for tying different video formats together to form an ABR group.
+ one for describing the fragments inside a single video format (as introduced above).
Generating the first one is much more of an application-level matter, because most of the time the different
formats (multiple bitrates/framesize versions of the same content for instance) are stored in separate
files, and the storage sub-system may not know that they relate to the same content. This is why it is
generated by our web servers. The second one is generally tied to the keyframes layout inside a particular
format, and is generated by the streaming server directly.
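To make the two levels concrete, here is a hypothetical shape for both manifests (the actual Dailymotion schema is not published in this post; every field name, URL and identifier below is invented for the example):

```python
import json

# Level 1 (web tier): ties the formats of one video into an ABR group.
abr_manifest = {
    "video": "x1abcd",
    "formats": [
        {"bitrate": 400000,  "size": "512x288",  "manifest": "/x1abcd/h264-400k.json"},
        {"bitrate": 1200000, "size": "1280x720", "manifest": "/x1abcd/h264-1200k.json"},
    ],
}

# Level 2 (streaming-server tier): fragment playlist for one format,
# one URL per ~10-secs fragment, each fragment starting on a keyframe.
def fragment_manifest(base_url, duration_secs, fragment_secs=10):
    n = -(-duration_secs // fragment_secs)   # ceiling division
    return {
        "duration": duration_secs,
        "fragments": [
            {"start": i * fragment_secs, "url": f"{base_url}/frag-{i}.flv"}
            for i in range(n)
        ],
    }

playlist = json.dumps(fragment_manifest("/x1abcd/h264-400k", 7200))
```

A 2-hours movie with 10-secs fragments indeed yields 720 entries, matching the playlist-size estimate above.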
We chose to base both manifests on JSON, because it's easily parsable and virtually all platforms have
native support for it (especially Flash/ActionScript and JavaScript, which are both ECMAScript derivatives).
It's also easily extensible and not tied to any existing format (like Apple's M3U8, Microsoft/Adobe
manifest files, or DASH), so there is no fear of infringing some vendor IP.
I can provide more details on this format if someone is interested.
_A/V pipeline access_
Chunked streaming (and feeding) was made possible because Adobe and Microsoft gave access to the A/V pipeline
and to primitives to "feed" samples to the A/V decoders. In Flash, this is done by the appendBytes() call on
a NetStream object, and some actions may be performed to reset the decoders when a new fragment is
being injected.
This is why I second Jeroen's proposal: A/V pipeline access will be instrumental in experimenting with
bandwidth measurement and bit-rate switching techniques, because the fragments would be fetched at the
application level (using WebSockets or plain HTTP requests, for instance), allowing full control over
buffer management and switching algorithms (compared to a vendor-specific implementation that would
probably not behave the same way across browsers).
_A/V pipeline bitstream format_
We chose to re-encapsulate/re-mux the A/V samples into each client's relevant "container" at the server side.
For Flash, it's FLV because the appendBytes() primitive expects this format, and it's really efficient as far
as container overhead is concerned (we could have sent re-muxed MP4 fragments, but they are more complicated
to synthesize on-the-fly, and you also need an MP4->FLV demuxing/remuxing library in the player itself, which
is a little bit overkill IMHO (it's implemented in OSMF for instance)).
For Apple iDevices, it's MPEG2-TS. Apple chose this because it's a de-facto re-synchronizable stream of samples,
but its overhead is way higher than FLV's (it was designed with ATM cell sizes in mind, go figure ^_^).
As long as the bitstream format is a simple "stream of samples" (with just enough time-coded information and
decoders initialization to play those samples correctly and independently inside a particular fragment), I
think it doesn't really matter which one is chosen. I don't know enough about MKV to determine if it's a good
candidate or not. MP4 (even in its fragmented MOOF/TRAF version) is probably NOT, because of the way the samples
are referenced inside a global index: it doesn't serve any purpose in the streaming case.
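The FLV-vs-TS overhead gap is easy to quantify from the framing rules alone (a floor estimate only: PES/PSI headers and adaptation-field stuffing push real TS overhead higher still):

```python
def ts_overhead(payload_bytes, packet_size=188, header_bytes=4):
    """Minimum MPEG2-TS framing overhead: every 188-byte packet spends
    at least 4 bytes on its header, and the last packet is padded."""
    per_packet = packet_size - header_bytes
    packets = -(-payload_bytes // per_packet)   # ceiling division
    return packets * packet_size - payload_bytes

def flv_overhead(sample_sizes):
    """FLV framing overhead: 11-byte tag header + 4-byte PreviousTagSize
    per A/V sample, regardless of sample size."""
    return 15 * len(sample_sizes)
```

For a single 4000-byte video sample this gives 136 bytes of TS framing (about 3.4%) against 15 bytes for FLV (under 0.4%), which is the "way higher overhead" mentioned above.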
As an end note, I'd really like to open-source this new server code if it's ok with my employer: there are some
parts of the code that are Dailymotion-specific, such as origin/edge exchange protocol, progressive caching techniques,
etc., but otherwise it may be a nice chunked/ABR streaming server implementation to play with in browsers.
My 2 cents,