[foms] Chunked/adaptative streaming at Dailymotion

Jeroen Wijering jeroen at longtailvideo.com
Tue Nov 9 11:23:05 PST 2010


Thanks!

That all makes total sense. Good to see a real-life setup.

- Jeroen



On Nov 9, 2010, at 2:54 PM, Pierre-Yves KEREMBELLEC wrote:

>>> This new version is a standalone event-driven HTTP server, with support for standard HTTP requests (regular
>>> and byte-ranges), Flash player (json + FLV remuxing), Apple HTTP Live Streaming (m3u8 + MPEG2-TS re-muxing).
>>> It supports the original containers from the previous version (MP4 and FLV), and we will probably support
>>> MKV (+ VP8/Vorbis) if the WebM initiative gains enough momentum.
>> I know some server modules (Code-shop, Adobe) need their MP4 files preprocessed into fragmented MP4 before playout can be done. You seem to simply use the MP4 files and do it all on the fly?
> 
> Yes, because we had this huge amount of content (15M * 4 formats = 60M files) that we didn't want to remux. We chose
> to build a "dynamic" server for this reason. I'm not saying it's the best choice out there, but it fitted our needs best.
> 
>> Is that fast enough, since I presume the server has to work pretty hard to extract a fragment from an MP4 file?
> 
> Not exactly. We build an index the first time the file is fetched from the storage servers and cached at the first layer
> of streaming servers. The file is not even fetched entirely: instead, it is cached progressively (using sparse files),
> because we only need a small portion of that file to build the index (namely the MOOV atom for MP4, whether it's at
> the beginning or at the end of the file). That's also why the communication between the storage servers and the
> streaming server layer uses HTTP byte ranges (the storage servers are just "dumb" HTTP 1.1 servers).
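
A minimal Python sketch of the idea described above: locating the MOOV atom by reading only box headers, so that just a small portion of the file needs to be fetched. This is an illustration, not the Dailymotion server code. The `read_range(offset, length)` callable is a hypothetical stand-in for an HTTP "Range: bytes=..." request to a storage server, and the fake file built by `box()` exists only for the demonstration.

```python
import struct

def find_top_level_box(read_range, file_size, wanted=b"moov"):
    """Walk top-level MP4 boxes, reading headers only, until `wanted` is found.

    read_range(offset, length) is a hypothetical callable returning bytes,
    for example backed by an HTTP byte-range request.
    Returns (offset, size) of the box, or None if it is not found.
    """
    offset = 0
    while offset + 8 <= file_size:
        size, box_type = struct.unpack(">I4s", read_range(offset, 8))
        header_len = 8
        if size == 1:    # a 64-bit size follows the type field
            size = struct.unpack(">Q", read_range(offset + 8, 8))[0]
            header_len = 16
        elif size == 0:  # the box extends to the end of the file
            size = file_size - offset
        if box_type == wanted:
            return offset, size
        if size < header_len:
            return None  # malformed box, stop instead of looping forever
        offset += size   # skip the payload without reading it
    return None

def box(box_type, payload):
    """Build a tiny fake box for the demonstration below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Fake file with the MOOV atom at the end, after the media data.
fake = (box(b"ftyp", b"isom\x00\x00\x00\x00")
        + box(b"mdat", b"\x00" * 100)
        + box(b"moov", b"demo"))

reads = []  # records every (offset, length) that was requested

def read_range(offset, length):
    reads.append((offset, length))
    return fake[offset:offset + length]

result = find_top_level_box(read_range, len(fake))
```

In this demonstration only the three 8-byte box headers are requested; the `mdat` payload is never read, which is the point of pairing byte ranges with a progressively filled cache.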
> 
> Building the index takes less than 100ms on a busy server for an hour-long video, and the cost of re-reading this index to build
> manifests or fragments on the fly is negligible. Most CPU is spent in disk and network I/O, reading file blocks and
> interleaving them to deliver the final stream (we use specific syscalls like splice() and vectored writes to optimize
> this part).
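
To illustrate what a vectored write buys, here is a small Python sketch (Unix only) that hands several separate buffers to the kernel in one `os.writev()` call instead of copying them into one contiguous buffer first. It is an assumption-laden illustration, not the server's implementation: the `send_interleaved` helper and the placeholder byte strings are hypothetical, and splice() is not shown.

```python
import os

def send_interleaved(fd, chunks):
    """Write a list of bytes-like chunks using vectored writes.

    Handles partial writes by dropping the buffers that were fully
    written and trimming the one that was only partly written.
    """
    pending = [memoryview(c) for c in chunks if len(c)]
    total = 0
    while pending:
        written = os.writev(fd, pending)
        total += written
        while pending and written >= len(pending[0]):
            written -= len(pending[0])
            pending.pop(0)
        if pending and written:
            pending[0] = pending[0][written:]
    return total

# Demonstration on a pipe; the chunk contents are placeholders.
r, w = os.pipe()
n = send_interleaved(w, [b"tag-header", b"video-sample",
                         b"tag-header", b"audio-sample"])
os.close(w)
data = os.read(r, 1024)
os.close(r)
```

The headers and samples stay in their own buffers (for instance, slices of a cached file), and the interleaving happens in the order of the list passed to the call.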
> 
>>> We chose to base both manifests on JSON, because it's easily parsable and virtually all platforms and
>>> languages already have JSON parsers (it's a native format in JavaScript (browsers) and ActionScript (Flash),
>>> which are both ECMAScript derivatives). It's also easily extensible and not tied to any existing format
>>> (like Apple's M3U8, Microsoft/Adobe manifest files, DASH), so there is no fear of infringing some vendor IP.
>> Awesome! Do you have an example of both formats - overarching and single bitrate?
> 
> Sure, but it's nothing like rocket science: it just mimics Apple's M3U8 behavior with some small variations
> (like precise timecodes instead of second-rounded ones, or an extensible format).
> 
> An example of a variant manifest fetched from http://server1/123/456/123456.mp4.manifest (tying all formats together;
> all durations are in ms; live and security entries removed for brevity):
> 
> {
>     "revision": "1.1",
>     "base": "http://server2/123/456/123456",
>     "versions":
>     [
>         {
>             "title": "Low quality",
>             "duration": "463240",
>             "bitrate": "260",
>             "videocodec": "H264 at 1.0",
>             "framesize": "320x240",
>             "audiocodec": "AAC at LC",
>             "audiolang": "en",
>             "default": "no",
>             "src": "_mp4_h264_aac_ld.mp4.manifest"
>         },
>         {
>             "title": "Standard quality",
>             "duration": "463200",
>             "bitrate": "480",
>             "videocodec": "H264 at 3.0",
>             "framesize": "512x384",
>             "audiocodec": "AAC at LC",
>             "audiolang": "en",
>             "default": "no",
>             "src": "_mp4_h264_aac.mp4.manifest"
>         },
>         {
>             "title": "High quality",
>             "duration": "463240",
>             "bitrate": "870",
>             "videocodec": "H264 at 3.1",
>             "framesize": "848x480",
>             "audiocodec": "AAC at LC",
>             "audiolang": "en",
>             "default": "yes",
>             "src": "_mp4_h264_aac_hq_en.mp4.manifest"
>         },
>         {
>             "title": "High quality",
>             "duration": "463240",
>             "bitrate": "870",
>             "videocodec": "H264 at 3.1",
>             "framesize": "848x480",
>             "audiocodec": "AAC at LC",
>             "audiolang": "fr",
>             "default": "yes",
>             "src": "_mp4_h264_aac_hq_fr.mp4.manifest"
>         },
>         {
>             "title": "High definition",
>             "duration": "463200",
>             "bitrate": "1710",
>             "videocodec": "H264 at 3.1",
>             "framesize": "1280x720",
>             "audiocodec": "AAC at LC",
>             "audiolang": "en",
>             "default": "no",
>             "src": "_mp4_h264_aac_hd.mp4.manifest"
>         }
>     ]
> }
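
A short Python sketch of how a client might consume the variant manifest above. It uses an abridged copy of the example. The selection policy in `pick_version` (highest bitrate that fits the measured bandwidth, filtered by audio language) is a hypothetical choice, and treating "bitrate" as kbps is an assumption. Joining "base" and "src" by plain concatenation matches the format manifest URL given in the next example.

```python
import json

# Abridged copy of the variant manifest shown above.
VARIANT_MANIFEST = """
{
    "revision": "1.1",
    "base": "http://server2/123/456/123456",
    "versions": [
        {"title": "Low quality", "bitrate": "260", "audiolang": "en",
         "default": "no", "src": "_mp4_h264_aac_ld.mp4.manifest"},
        {"title": "High quality", "bitrate": "870", "audiolang": "en",
         "default": "yes", "src": "_mp4_h264_aac_hq_en.mp4.manifest"},
        {"title": "High definition", "bitrate": "1710", "audiolang": "en",
         "default": "no", "src": "_mp4_h264_aac_hd.mp4.manifest"}
    ]
}
"""

def pick_version(manifest, bandwidth_kbps, lang="en"):
    """Hypothetical policy: best bitrate that fits, for the given language."""
    candidates = [v for v in manifest["versions"] if v["audiolang"] == lang]
    if not candidates:
        raise LookupError("no version for language " + lang)
    fitting = [v for v in candidates if int(v["bitrate"]) <= bandwidth_kbps]
    # Fall back to the lowest bitrate when nothing fits the bandwidth.
    pool = fitting or [min(candidates, key=lambda v: int(v["bitrate"]))]
    return max(pool, key=lambda v: int(v["bitrate"]))

def manifest_url(manifest, version):
    """The format manifest URL is "base" followed directly by "src"."""
    return manifest["base"] + version["src"]

manifest = json.loads(VARIANT_MANIFEST)
chosen = pick_version(manifest, bandwidth_kbps=1000)
url = manifest_url(manifest, chosen)
```

Note that all numeric values in the manifest are JSON strings, so they need an explicit `int()` before comparison.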
> 
> An example of a "format" manifest fetched from http://server2/123/456/123456_mp4_h264_aac_hq_en.mp4.manifest
> (providing fragments list and RAP):
> 
> {
>     "revision": "1.1",
>     "base": "http://server2/123/456/123456_mp4_h264_aac_hq_en.mp4",
>     "fragments":
>     [
>         [9920, ".f1"],
>         [10040, ".f2"],
>         [9960, ".f3"],
>         <...>
>         [3440, ".f47"]
>     ]
> }
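
Since the format manifest carries per-fragment durations in ms, a player can map a seek position to a fragment by accumulating them. The Python sketch below does this for the first three entries of the example only (the elided part of the list is left out). Building the fragment URL as "base" plus the suffix is an assumption made for illustration, as is the `fragment_for` helper name.

```python
import bisect
from itertools import accumulate

format_manifest = {
    "revision": "1.1",
    "base": "http://server2/123/456/123456_mp4_h264_aac_hq_en.mp4",
    # Only the first three entries of the example; the real list is longer.
    "fragments": [[9920, ".f1"], [10040, ".f2"], [9960, ".f3"]],
}

def fragment_for(manifest, position_ms):
    """Return (fragment URL, fragment start time in ms) for a seek position."""
    durations = [duration for duration, _ in manifest["fragments"]]
    ends = list(accumulate(durations))  # end time of each fragment
    if not 0 <= position_ms < ends[-1]:
        raise ValueError("position outside the listed fragments")
    i = bisect.bisect_right(ends, position_ms)
    start = ends[i] - durations[i]
    # Assumption: the fragment URL is "base" followed by the suffix.
    return manifest["base"] + manifest["fragments"][i][1], start

seek_url, seek_start = fragment_for(format_manifest, 15000)
```

A position of 15000 ms falls in the second fragment, which starts at 9920 ms, so the player would fetch that fragment and skip ahead inside it.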
> 
> Note that these files are delivered gzipped to the clients, provided they support that type of transport ("Accept-Encoding: gzip").
> Also, the format of the "base" and "src" parameters is completely implementation-dependent (it is tied to the way we store
> the different versions of our files). ".manifest" may be replaced by ".m3u8" to get an Apple HLS-compliant manifest. The original
> file is still accessible using a regular HTTP GET with byte ranges if needed (in the example above, the file is accessible without any
> transformation at http://server2/123/456/123456_mp4_h264_aac_hq_en.mp4).
> 
>>> We chose to re-encapsulate/re-mux the A/V samples into each client's relevant "container" on the server side.
>>> For Flash, it's FLV because the appendBytes() primitive expects this format, and it's really efficient as far as
>>> container overhead is concerned (we could have sent re-muxed MP4 fragments, but they are more complicated to synthesize
>>> on the fly, and you also need an MP4->FLV demuxing/remuxing library in the player itself, which is a little
>>> bit overkill IMHO (it's implemented in OSMF for instance)).
>> Getting chunks of FLV is so much easier in Flash than getting chunks of fragmented MP4 like OSMF does.
> 
> Agreed. It's more complicated on the server, but seamless in the player.
> 
>> Do you have more info on what you generate - or perhaps a test stream?
> 
> It's a regular FLV stream with H264 and AAC samples interleaved, nothing fancy here.
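
For readers unfamiliar with the container, the following Python sketch shows the generic FLV tag layout (an 11-byte tag header, the payload, then a 4-byte PreviousTagSize) and a timestamp-ordered interleaving of audio and video tags. It is not the server's muxer: the payloads are placeholders rather than real H264/AAC tag bodies, and the `flv_tag` and `interleave` helpers are hypothetical names.

```python
import struct

FLV_TAG_AUDIO, FLV_TAG_VIDEO = 8, 9

def flv_tag(tag_type, timestamp_ms, payload):
    """Wrap one payload in an FLV tag followed by its PreviousTagSize field."""
    header = (
        bytes([tag_type])
        + len(payload).to_bytes(3, "big")               # DataSize
        + (timestamp_ms & 0xFFFFFF).to_bytes(3, "big")  # lower 24 bits
        + bytes([(timestamp_ms >> 24) & 0xFF])          # TimestampExtended
        + b"\x00\x00\x00"                               # StreamID, always 0
    )
    tag = header + payload
    return tag + struct.pack(">I", len(tag))

def interleave(samples):
    """samples: iterable of (timestamp_ms, tag_type, payload), in any order."""
    ordered = sorted(samples, key=lambda s: s[0])  # stable sort by timestamp
    return b"".join(flv_tag(t, ts, p) for ts, t, p in ordered)

# Placeholder payloads; real ones would carry codec-specific tag bodies.
stream = interleave([
    (40, FLV_TAG_VIDEO, b"v1"),
    (0, FLV_TAG_AUDIO, b"a0"),
    (0, FLV_TAG_VIDEO, b"v0"),
])
```

Each tag costs 15 bytes of container overhead (11-byte header plus 4-byte trailer), which gives a sense of why FLV is cheap to synthesize per sample.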
> 
>> And do you reset the pipe on a bitrate switch in Flash?
> 
> Yes, we reset the decoding pipe on seeking and bitrate switch. We use the provided events to signal the decoding
> layer that potential new SPS/PPS (for H264) or decoding parameters (for AAC) are being sent (see [1]).
> 
>> Second, I know e.g. Code-shop preferred Flash to read Fragmented MP4 over FLV because that's basically one less format to cache (better caching performance).
>> What do you think of this caching-versus-demuxing tradeoff?
> 
> They are right, except that it isn't really a problem today since 99% of our viewers are using .. Flash Player (sorry for the shocking news ^_^).
> And that's 50-60M views per day. iDevices, IPTV STBs and pure-MP4 phones together account for a mere 1% of the total traffic.
> 
>>> As long as the bitstream format is a simple "stream of samples" (with just enough time-coded information and
>>> decoder initialization data to play those samples correctly and independently inside a particular fragment), I
>>> think it doesn't really matter which one is chosen. I don't know enough about MKV to determine if it's a good
>>> candidate or not. MP4 (even in its fragmented MOOF/TRAF version) is probably NOT, because of the way the samples
>>> are referenced inside a global index: it doesn't serve any purpose in the streaming case.
>> What parameters are different between the various bitrates of your videos? Video bitrate/dimensions/framerate/...? Audio bitrate/samplefrequency/channels/...?
> 
> Basically yes. We have 4 base formats:
> 
> LD: MP4 / 240p H264 at 190kbps-15fps / AAC at 64kbps-22khz
> SD: MP4 / 384p H264 at 360kbps-30fps / AAC at 96kbps-44khz
> HQ: MP4 / 480p H264 at 700kbps-30fps / AAC at 128kbps-44khz
> HD: MP4 / 720p H264 at 1500kbps-30fps / AAC at 128kbps-44khz
> 
> and derivatives for special purpose (like some IPTV STB for instance).
> 
> Pierre-Yves
> 
> [1] http://help.adobe.com/en_US/FlashPlatform/beta/reference/actionscript/3/flash/net/NetStreamAppendBytesAction.html#RESET_BEGIN