[foms] Proposal: adaptive streaming using open codecs

Jeroen Wijering jeroen at longtailvideo.com
Wed Oct 20 03:53:07 PDT 2010


On Oct 19, 2010, at 5:53 PM, Philip Jägenstedt wrote:

>> Here is a (rough and incomplete) proposal for doing adaptive streaming using open video formats. WebM is used as an example, but all points should apply to Ogg as well. Key components are:
>> 
>> * Videos are served as separate, small chunks.
>> * Accompanying manifest files provide metadata.
>> * The user-agent parses manifests and switches between stream levels.
>> * An API provides QOS metrics and enables custom switching logic.
>> 
>> What do you think of this approach - and its rationale? Any technical issues (especially on the container side) or non-technical objections?
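
To make the manifest side of this a bit more concrete, a strawman could look like the snippet below (written as a TypeScript literal purely for readability). Every field name, bitrate and URL pattern here is invented for illustration; the actual format and syntax are completely open questions.

// Hypothetical manifest describing one presentation at three quality levels.
// Nothing about the real wire format (XML, JSON, playlist-style) is implied.
const manifest = {
  duration: 120,        // total length in seconds
  chunkDuration: 4,     // nominal length of each chunk in seconds
  levels: [
    { bitrate: 400000,  width: 320,  height: 180, urlPattern: "low/chunk-{n}.webm" },
    { bitrate: 1200000, width: 640,  height: 360, urlPattern: "mid/chunk-{n}.webm" },
    { bitrate: 3500000, width: 1280, height: 720, urlPattern: "high/chunk-{n}.webm" },
  ],
};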
> 
> Thanks for writing this up, Jeroen. Before going into inline replies, I want to state the problem with chunking at a lower level. We have two blobs of audio/video data which we want to play back-to-back, gaplessly. From the point of view of a decoding pipeline, there are basically two options:
> 
> 1. Treat everything as an infinite stream in a single decoding pipeline, and have the demuxer handle chained Ogg or chained WebM.
> 2. Have each chunk be its own finite resource and set up a decoding pipeline for each one, having a super-pipeline coordinating those and handling audio mixing.
> 
> I believe that option 1 is a lot easier to integrate with existing media frameworks, while option 2 adds a lot of complexity. Opera doesn't only have to worry about working with GStreamer, but also about hardware devices with their own media stacks, where we can't easily fix things.
> 
> Going with option 1, we basically add the constraint that all chunks must use the same container format and that container format must be streamable and chainable. This is true of Ogg and can be made true for WebM. It's slightly less general, but probably a tradeoff worth doing.

Option 1 sounds easier indeed. This is the way Adobe/ActionScript handles decoding of dynamic streams - and it works really well. One basically initiates a decoding pipe, gives it an (FLV) header that tells it which codecs to expect, and then dumps the codec data into the pipe. Changes in dimensions, framerate, number of channels and sample frequency are no problem. The pipe will die if one starts injecting data with a different codec.
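
In rough TypeScript-flavoured pseudo-code, that pattern would look something like the sketch below. All of the API names (createPipe, appendBytes) are invented for illustration; no browser exposes anything like this today.

// Option 1 as a single pipeline that is told its codecs once and is then
// fed chunk data back-to-back. Every identifier here is hypothetical.
interface DecodingPipe {
  appendBytes(data: Uint8Array): void;  // demuxer treats the input as one chained stream
  close(): void;
}
declare function createPipe(mimeType: string): DecodingPipe;  // hypothetical factory

async function playAdaptive(chunkUrls: string[]): Promise<void> {
  // Comparable to handing the Flash pipeline an FLV header up front:
  // the pipe learns which container/codecs to expect, then just gets data.
  const pipe = createPipe('video/webm; codecs="vp8, vorbis"');
  for (const url of chunkUrls) {
    // Each chunk must be a valid continuation of the chained stream:
    // same codecs, keyframe-aligned; changes in dimensions are fine.
    const response = await fetch(url);
    pipe.appendBytes(new Uint8Array(await response.arrayBuffer()));
  }
  pipe.close();
}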

Conrad had related questions on the specification of chunks:

> For at least Vorbis and Theora tracks in Ogg, including the headers to
> make a valid file would require a complete copy of the codebooks,
> adding ~3-4kB of overhead per track to the start of each chunk. I
> assume this also applies to Vorbis in WebM (but not VP8?).
> 
> We would also need to specify where in the available chunks global
> information goes, such as Ogg Skeleton or Matroska Chapters, and how
> (or if) to handle seek tables, cueing data etc.
> 
> It might defeat some of the point of adaptive streaming to have
> repeated information at the start of each chunk. Perhaps it would be
> cheaper to just specify that chunks are a sequence of video frames
> (ie. a sequence of pages/clusters beginning with a keyframe)?

Perhaps somebody could set up a proposal for this side of the system? Are chained Ogg / WebM / ContainerX files indeed preferred? In the case of WebM, what would they look like? Would it still be possible for "smart servers" to parse a regular Ogg/WebM file and push out a chain-chunk on the fly? The latter is important for legacy content (no need to re-encode) and for offering simultaneous streaming and download of a single file.
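
To illustrate what I mean by a "smart server", here is a rough sketch (TypeScript again, with an invented WebmIndex helper; real EBML/Matroska parsing is of course more involved than this suggests):

// Carve a chain-chunk out of an existing WebM file on request.
// WebmIndex and indexWebm are placeholders, not a real library.
interface WebmIndex {
  headerBytes(): Uint8Array;  // EBML header + Segment info + Tracks, repeated per chunk
  clusterBytes(startTime: number, endTime: number): Uint8Array;  // whole clusters, the first one starting at a keyframe
}
declare function indexWebm(file: Uint8Array): WebmIndex;  // hypothetical

function sliceChunk(file: Uint8Array, startTime: number, endTime: number): Uint8Array {
  const webm = indexWebm(file);
  const header = webm.headerBytes();
  const clusters = webm.clusterBytes(startTime, endTime);
  // A chunk that repeats the stream header can start (or continue) a chained stream.
  const out = new Uint8Array(header.length + clusters.length);
  out.set(header, 0);
  out.set(clusters, header.length);
  return out;
}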

- Jeroen

