[foms] Chunked/adaptative streaming at Dailymotion

Pierre-Yves KEREMBELLEC pierre-yves.kerembellec at dailymotion.com
Tue Nov 9 05:54:08 PST 2010


This new version is a standalone event-driven HTTP server, with support for standard HTTP requests (regular
and byte-ranges), Flash player (json + FLV remuxing), Apple HTTP Live Streaming (m3u8 + MPEG2-TS re-muxing).
It supports the original containers from the previous version (MP4 and FLV), and we will probably support
MKV (+ VP8/Vorbis) if the WebM initiative gains enough momentum.

I know some server modules (Code-shop, Adobe) need their MP4 files preprocessed into fragmented MP4 before playout can be done. You seem to simply use the MP4 files and do it all on the fly?

Yes, because we had a huge amount of content (15M * 4 formats = 60M files) that we didn't want to remux. We chose
to build a "dynamic" server for this reason. I'm not saying it's the best choice out there, but it fit our needs best.

Is that fast enough, since I presume the server has to work pretty hard to extract a fragment from an MP4 file?

Not exactly. We build an index the first time the file is fetched from the storage servers and cached at the first layer
of streaming servers. The file is not even fetched entirely: instead, it is cached progressively (using sparse files),
because we only need a small portion of that file to build the index (namely the MOOV atom for MP4, whether it's at
the beginning or at the end of the file). That's also why the communication between the storage servers and the
streaming server layer uses HTTP byte ranges (the storage servers are just "dumb" HTTP 1.1 servers).
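
To illustrate why only a small portion of the file is needed, here is a minimal Python sketch (not the Dailymotion implementation) that locates the MOOV atom by reading box headers only. It relies on the standard MP4 box layout: a 32-bit big-endian size followed by a 4-character type, where a size of 1 means a 64-bit size follows and a size of 0 means the box runs to the end of the file. The read_range callable is hypothetical and stands in for an HTTP byte-range request to a storage server.

```python
import struct


def find_moov(read_range, file_size):
    """Walk top-level MP4 boxes and return (offset, size) of the 'moov' box.

    read_range(offset, length) is a hypothetical callable standing in for an
    HTTP byte-range request; it must return the bytes at that range.
    Returns None if no 'moov' box is found.
    """
    offset = 0
    while offset + 8 <= file_size:
        # Each box starts with a 32-bit size and a 4-character type.
        size, box_type = struct.unpack(">I4s", read_range(offset, 8))
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            size = struct.unpack(">Q", read_range(offset + 8, 8))[0]
            header_len = 16
        elif size == 0:
            # The box extends to the end of the file.
            size = file_size - offset
        if size < header_len:
            raise ValueError("corrupt box header at offset %d" % offset)
        if box_type == b"moov":
            return offset, size
        # Skip the payload entirely: only headers are ever read.
        offset += size
    return None
```

Whether the MOOV atom sits before or after the media data, the walk costs one small range request per top-level box, plus one larger request for the MOOV payload itself.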

Building the index takes less than 100ms on a busy server for a 1h-long content, and re-reading this index to build
manifests or fragments on the fly is negligible. Most CPU is spent in disk and network I/O, reading file blocks and
interleaving to deliver the final stream (we are using specific syscalls like splice() and vectored writes to optimize
this part).
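
As a rough illustration of the vectored-write idea (splice() is not shown), the following Python sketch hands several buffers to the kernel in a single writev() call instead of copying them into one contiguous buffer first. It is only a sketch for Unix-like systems: the function name and the IOV_LIMIT constant are assumptions for this example, and the real server is not written this way as far as this message says.

```python
import os

# Conservative cap on buffers per call (assumption; the real limit is
# platform-dependent and exposed as IOV_MAX).
IOV_LIMIT = 1024


def write_all_vectored(fd, buffers):
    """Write every buffer to fd with writev(), retrying after partial writes.

    Returns the total number of bytes written.
    """
    pending = [memoryview(b) for b in buffers if len(b)]
    total = 0
    while pending:
        written = os.writev(fd, pending[:IOV_LIMIT])
        total += written
        # Drop buffers that went out completely, trim the one that did not.
        while written and pending:
            head = pending[0]
            if written >= len(head):
                written -= len(head)
                pending.pop(0)
            else:
                pending[0] = head[written:]
                written = 0
    return total
```

In a remuxing server the buffers would typically be small synthesized container headers alternating with sample data read from the cached file.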

We chose to base both manifests on JSON, because it's easily parsable and virtually all platforms and
languages already have JSON parsers (it's a native format in JavaScript (browsers) and ActionScript (Flash),
which are both ECMAScript derivatives). It's also easily extensible and not tied to any existing format
(like Apple's M3U8, Microsoft/Adobe manifest files, DASH), so there is no fear of infringing some vendor IP.

Awesome! Do you have an example of both formats - overarching and single bitrate?

Sure, but it's nothing like rocket science: it just mimics Apple's M3U8 behavior with some small variations
(like precise timecodes instead of second-rounded ones, and an extensible format).

An example of a variant manifest fetched from http://server1/123/456/123456.mp4.manifest (tying all formats together;
all durations are in ms, live and security entries removed for brevity):

{
    "revision": "1.1",
    "base": "http://server2/123/456/123456",
    "versions":
    [
        {
            "title": "Low quality",
            "duration": "463240",
            "bitrate": "260",
            "videocodec": "H264 at 1.0",
            "framesize": "320x240",
            "audiocodec": "AAC at LC",
            "audiolang": "en",
            "default": "no",
            "src": "_mp4_h264_aac_ld.mp4.manifest"
        },
        {
            "title": "Standard quality",
            "duration": "463200",
            "bitrate": "480",
            "videocodec": "H264 at 3.0",
            "framesize": "512x384",
            "audiocodec": "AAC at LC",
            "audiolang": "en",
            "default": "no",
            "src": "_mp4_h264_aac.mp4.manifest"
        },
        {
            "title": "High quality",
            "duration": "463240",
            "bitrate": "870",
            "videocodec": "H264 at 3.1",
            "framesize": "848x480",
            "audiocodec": "AAC at LC",
            "audiolang": "en",
            "default": "yes",
            "src": "_mp4_h264_aac_hq_en.mp4.manifest"
        },
        {
            "title": "High quality",
            "duration": "463240",
            "bitrate": "870",
            "videocodec": "H264 at 3.1",
            "framesize": "848x480",
            "audiocodec": "AAC at LC",
            "audiolang": "fr",
            "default": "yes",
            "src": "_mp4_h264_aac_hq_fr.mp4.manifest"
        },
        {
            "title": "High definition",
            "duration": "463200",
            "bitrate": "1710",
            "videocodec": "H264 at 3.1",
            "framesize": "1280x720",
            "audiocodec": "AAC at LC",
            "audiolang": "en",
            "default": "no",
            "src": "_mp4_h264_aac_hd.mp4.manifest"
        }
    ]
}
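
A client could use such a variant manifest roughly as follows. This Python sketch picks the highest-bitrate version that fits a bandwidth budget for a given audio language and builds the URL of its format manifest. The function name and the selection policy are assumptions for illustration, and joining "base" and "src" by plain concatenation is inferred from the example URLs in this message.

```python
def pick_version(manifest, max_kbps, lang="en"):
    """Return the format-manifest URL of the best version within max_kbps.

    manifest is the parsed variant manifest (a dict). Numeric fields are
    strings in the example, so they are converted before comparing.
    Returns None when no version matches.
    """
    candidates = [
        v for v in manifest["versions"]
        if v["audiolang"] == lang and int(v["bitrate"]) <= max_kbps
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda v: int(v["bitrate"]))
    # Assumption: "base" and "src" are joined by simple concatenation.
    return manifest["base"] + best["src"]
```

A player would call this again with a new budget whenever its bandwidth estimate changes.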

An example of a "format" manifest fetched from http://server2/123/456/123456_mp4_h264_aac_hq_en.mp4.manifest
(providing the fragment list and random access points, RAP):

{
    "revision": "1.1",
    "base": "http://server2/123/456/123456_mp4_h264_aac_hq_en.mp4",
    "fragments":
    [
        [9920, ".f1"],
        [10040, ".f2"],
        [9960, ".f3"],
        <...>
        [3440, ".f47"]
    ]
}
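
Because each entry carries a precise duration in ms, a client can derive a timeline and map a seek position to a fragment. The Python sketch below shows one way to do that. The function names are hypothetical, and building a fragment URL by appending the suffix to "base" is an assumption by analogy with the variant manifest.

```python
from bisect import bisect_right
from itertools import accumulate


def fragment_timeline(manifest):
    """Return a list of (start_ms, duration_ms, url), one per fragment."""
    fragments = manifest["fragments"]
    durations = [duration for duration, _ in fragments]
    # Each fragment starts where the previous one ends.
    starts = [0] + list(accumulate(durations))[:-1]
    return [
        (start, duration, manifest["base"] + suffix)  # assumed URL scheme
        for start, (duration, suffix) in zip(starts, fragments)
    ]


def fragment_for_time(timeline, position_ms):
    """Return the timeline entry whose time span contains position_ms.

    Positions past the end map to the last fragment.
    """
    if not timeline:
        raise ValueError("empty timeline")
    starts = [start for start, _, _ in timeline]
    index = bisect_right(starts, position_ms) - 1
    return timeline[max(index, 0)]
```

With the three fragments listed above, a seek to 15000 ms lands in ".f2", which covers 9920 ms to 19960 ms.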

Note that these files are delivered gzipped to the clients, provided they support that type of transport ("Accept-Encoding: gzip").
Also, the format of the "base" and "src" parameters is completely implementation-dependent (it is linked to the way we store
the different versions of our files). ".manifest" may be replaced by ".m3u8" to get an Apple HLS-compliant manifest. The original
file is still accessible using regular HTTP GET + byte ranges if needed (in the example above, the file is accessible without any
transformation at http://server2/123/456/123456_mp4_h264_aac_hq_en.mp4).
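
On the client side, the gzip negotiation and the ".m3u8" substitution could look like the following Python sketch. The helper names are hypothetical, and decoding the JSON as UTF-8 is an assumption; nothing is sent over the network here, the request object is only built.

```python
import gzip
import json
import urllib.request


def build_manifest_request(url):
    """Build a GET request that advertises gzip support to the server."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_manifest(body, content_encoding):
    """Parse a manifest body, gunzipping it first if the server compressed it.

    content_encoding is the value of the Content-Encoding response header,
    or None when the header is absent.
    """
    if (content_encoding or "").lower() == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))  # UTF-8 is an assumption


def to_hls_url(manifest_url):
    """Swap a trailing '.manifest' for '.m3u8' to request the HLS variant."""
    suffix = ".manifest"
    if not manifest_url.endswith(suffix):
        raise ValueError("not a manifest URL: %r" % manifest_url)
    return manifest_url[: -len(suffix)] + ".m3u8"
```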

We chose to re-encapsulate/re-mux the A/V samples into each client's relevant "container" on the server side.
For Flash, it's FLV because the appendBytes() primitive expects this format, and it's really efficient as far as
container overhead is concerned. We could have sent re-muxed MP4 fragments, but they are more complicated to synthesize
on the fly, and you also need an MP4->FLV demuxing/remuxing library in the player itself, which is a little
bit overkill IMHO (it's implemented in OSMF for instance).

Getting chunks of FLV is so much easier in Flash than getting chunks of fragmented MP4 like OSMF does.

Agreed. It's more complicated on the server, but seamless in the player.

Do you have more info on what you generate - or perhaps a test stream?

It's a regular FLV stream with H264 and AAC samples interleaved, nothing fancy here.

And do you reset the pipe on a bitrate switch in Flash?

Yes, we reset the decoding pipe on seeking and bitrate switch. We use the provided events to signal the decoding
layer that potential new SPS/PPS (for H264) or decoding parameters (for AAC) are being sent (see [1]).

Second, I know e.g. Code-shop preferred Flash to read Fragmented MP4 over FLV because that's basically one less format to cache (better caching performance).
What do you think on this caching-versus-demuxing tradeoff?

They are right, except that it isn't really a problem today since 99% of our viewers are using... Flash Player (sorry for the shocking news ^_^).
And that's 50-60M views per day. iDevices, IPTV STBs and pure-MP4 phones together account for merely 1% of the total traffic.

As long as the bitstream format is a simple "stream of samples" (with just enough time-coded information and
decoder initialization data to play those samples correctly and independently inside a particular fragment), I
think it doesn't really matter which one is chosen. I don't know enough about MKV to determine if it's a good
candidate or not. MP4 (even in its fragmented MOOF/TRAF version) is probably NOT, because of the way the samples
are referenced inside a global index: it doesn't serve any purpose in the streaming case.

What parameters are different between the various bitrates of your videos? Video bitrate/dimensions/framerate/...? Audio bitrate/sample frequency/channels/...?

Basically yes. We have 4 base formats:

LD: MP4 / 240p H264 at 190kbps-15fps / AAC at 64kbps-22kHz
SD: MP4 / 384p H264 at 360kbps-30fps / AAC at 96kbps-44kHz
HQ: MP4 / 480p H264 at 700kbps-30fps / AAC at 128kbps-44kHz
HD: MP4 / 720p H264 at 1500kbps-30fps / AAC at 128kbps-44kHz

and derivatives for special purpose (like some IPTV STB for instance).

Pierre-Yves

[1] http://help.adobe.com/en_US/FlashPlatform/beta/reference/actionscript/3/flash/net/NetStreamAppendBytesAction.html#RESET_BEGIN
