[foms] Proposal: adaptive streaming using open codecs

Frank Galligan fgalligan at google.com
Fri Nov 12 10:29:06 PST 2010


Hello all,

I think we need to take a step back and figure out how files should be
physically laid out before we define the API to render HTTP adaptive
content, i.e. chunked or not, interleaved or not.

With the API proposal below, we are implying that to do client-side
adaptive streaming the content creator must use interleaved chunks. I
think we talked about this before, and the consensus was that interleaved
chunks were not the preferred choice.


Here is how I view the file layout choices; let me know if something is wrong.

1. Chunks are discrete files with muxed streams of different types.
    Pros:
    - Should be fairly easy from a manifest and client API perspective.

    Cons:
    - Maintainability of files will be much harder.
    - Number of file combinations can grow very large with more streams.

2. Chunks are discrete files with separate files for all streams.
    Pros:
    - Should be fairly easy from a manifest and client API perspective.
    - Number of file combinations will equal number of streams.

    Cons:
    - Maintainability of files will be much harder.

3. Chunks are virtual within files with separate files for all streams.
    Pros:
    - Maintainability of files will be easier.
    - Number of file combinations will equal number of streams.

    Cons:
    - Manifest and client API perspective would most likely be harder.

4. Chunks are virtual within files with muxed streams of different types.
    Pros:
    - Maintainability of files will be easier.

    Cons:
    - Manifest and client API perspective would most likely be hardest.
    - Number of file combinations can grow very large with more streams


I was lumping chained files in with virtual chunks/full files. If someone
thinks they should have their own category, please let me know why.

I think we should be leaning towards #3. The streams should not all be
interleaved: even with only audio and video, the number of stream
combinations can grow very large once you start adding other languages
(for example, three video bitrates and four audio tracks already mean
twelve muxed files, versus seven separate per-stream files), and
eventually we will be adding other kinds of streams too. For chunked vs.
full files, I'm just going with what content creators on this list have
already said.
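
To make #3 concrete, here is a rough sketch of what a manifest could
describe. The format and field names below are made up purely for
illustration: each stream is a single file, and the virtual chunk
boundaries are byte offsets into that file, taken from its index.

var manifest = {
  duration: 600,  // seconds
  video: [
    { url: 'video_500k.webm',  bitrate: 500000,
      chunks: [ {time: 0, offset: 0}, {time: 10, offset: 612000} /* ... */ ] },
    { url: 'video_1500k.webm', bitrate: 1500000,
      chunks: [ {time: 0, offset: 0}, {time: 10, offset: 1830000} /* ... */ ] }
  ],
  audio: [
    { url: 'audio_en.webm', lang: 'en',
      chunks: [ {time: 0, offset: 0}, {time: 10, offset: 41000} /* ... */ ] }
  ]
};

With this layout, adding a language or a bitrate adds one file and one
manifest entry instead of multiplying the number of muxed combinations.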

Once we have a preliminary choice on how the files should be laid out, I
think we can work on the API.
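
Whatever shape that API takes, the client logic for #3 can stay fairly
simple. Here is a minimal sketch, assuming the hypothetical manifest above
and a client that is allowed to fetch virtual chunks itself with plain
byte-range requests (the helpers and the selection rule are made up):

function fetchChunk(stream, i, callback) {
  // Byte range of virtual chunk i, taken from the stream's index.
  var begin = stream.chunks[i].offset;
  var end = stream.chunks[i + 1].offset - 1;
  var xhr = new XMLHttpRequest();
  xhr.open('GET', stream.url, true);
  xhr.setRequestHeader('Range', 'bytes=' + begin + '-' + end);
  xhr.onload = function () { callback(xhr.response); };  // binary handling elided
  xhr.send();
}

// Pick the highest-bitrate video stream the measured bandwidth (bits/sec)
// can sustain (streams assumed sorted by ascending bitrate), falling back
// to the lowest one, then fetch its next virtual chunk.
function nextVideoChunk(manifest, chunkIndex, estimatedBps, callback) {
  var stream = manifest.video[0];
  for (var j = 1; j < manifest.video.length; j++) {
    if (manifest.video[j].bitrate < estimatedBps) stream = manifest.video[j];
  }
  fetchChunk(stream, chunkIndex, callback);
}

How those fetched chunks are then handed to the decoder is exactly the API
question that still needs answering.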

Frank








On Tue, Nov 9, 2010 at 4:22 AM, Philip Jägenstedt <philipj at opera.com> wrote:

> On Mon, 08 Nov 2010 15:39:59 +0100, Jeroen Wijering
> <jeroen at longtailvideo.com> wrote:
>
> > Hello all,
> >
> >>> I don't like the idea of an appendVideo() function which specifies a
> >>> byte range because:
> >>>     • I think that browsers wouldn't want to initiate a new byte range
> >>> request for every appendVideo() call anyway, they'd want to request
> >>> the entire media (probably with a startOffset to EOF BRR), in order to
> >>> reduce delays and overhead caused by setting up many smallish BRR.
> >>>     • If appendVideo() could specify a byte range, it could specify an
> >>> invalid range, or a range which doesn't chronologically follow the
> >>> previously appended range. Everyone who wants to do manual stream
> >>> switching has to get this right, but if it's implemented in-browser,
> >>> it only needs to be gotten right once.
> >>>     • It's easier for a browser to get the transition between two
> streams
> >>> consistently seamless than it is for user JS to, since we have more
> >>> information and more control over the network and decode.
> >>>
> >>> I'd much rather that the browser fetched the indexes of all streams on
> >>> startup (probably asynchronously) and played a default stream as per
> >>> normal. If manual switching was enabled, then upon a call to
> >>> setLevel(), the video would switch to the requested stream at an
> >>> appropriate time. We'd probably want to start rendering the new stream
> >>> as soon as possible, as we could be switching down due to dropping too
> >>> many frames due to the decode not keeping up with rendering.
> >>
> >> I agree there's a certain amount of ambiguity between lower level
> >> range-requests and the un-availability of seekpoints and decoder
> >> settings. What about a slightly modified version of appendVideo()?
> >>
> >> videoElement.appendVideo(videoURL,[startPosition,endPosition]);
> >>
> >> *) Both the startPosition and endPosition are in seconds and optional.
> >> If they are both omitted, this function is basically a way to build
> >> seamless playlist playback (as Silvia requested).
> >> *) Like with today's playback,  the browser decides to probe the header
> >> data if it doesn't have enough info (yet) on the file. In other words,
> >> when building a bitrate switching API with this, the browser fetches
> >> the index the first time (part of) this videoURL is requested.
> >> *) The javascript layer needs to take care of knowing all seekpoints if
> >> bitrate switching is built with this logic. The browser will merely
> >> translate the startPosition and endPosition to the nearest seekpoints
> >> (no partial GOP decode voodoo for now).
> >> *) As the appended video metadata is fetched and the chunk duration is
> >> known, the ''duration'' and the "seekable" properties of the stream
> >> change. A "durationchange" event is fired. The ''readyState'' is
> >> re-evaluated (it may e.g. jump from HAVE_FUTURE_DATA to
> >> HAVE_CURRENT_DATA)
> >> *) As playback rolls into the new videoURL, the ''videoWidth'' and
> >> ''videoHeight'' of the stream are updated (though they may remain the
> >> same).
> >> *) If the metadata of the appended video cannot be probed, the browser
> >> throws an error (MEDIA_ERR_SRC_NOT_SUPPORTED) and does not append the
> >> video. This means that, during append-metadata-fetching, the
> >> "readyState" of a video does not change.
> >> *) Appended videos should be in the same container format and A/V codec
> >> as the currently playing video. If not, the browser will throw an
> >> error and not append the video (MEDIA_ERR_APPEND_NOT_ALIGNED). This
> >> restriction is added to ensure appended content can be decoded within a
> >> single pipeline.
> >> *) Video stream bitrate, dimensions and framerate may vary. Audio stream
> >> bitrate, channels and sample frequency may also vary. A single decoding
> >> pipeline can handle these variations.
> >> *) Buffering and metadata availability will still happen inside the
> >> browser. When an appendVideo() is fired, the browser will typically
> >> rush to load the metadata ánd the media time range (as it does today).
> >>
> >> This seems to be a good tradeoff and starting point to experiment with
> >> adaptive streaming?
>
> If we are trying to make something simple to start experimenting with, I
> don't think we need to make all of these changes to the HTMLMediaElement
> API. Having the media backend be aware of several resources simultaneously
> complicates things and will be more difficult (or impossible) to support
> on platforms where we don't have full control over the media framework.
>
> I still strongly prefer to put the chunking+concatenating in a layer
> outside of <video>, by extending the suitably named Stream API, see
> <
> http://www.whatwg.org/specs/web-apps/current-work/multipage/commands.html#stream-api
> >.
>
> Here's how it might work:
>
> var s = new Stream();
> s.appendURL('chunk1.webm');
> s.appendURL('chunk2.webm');
> video.src = s.url;
>
> That's it. From the point of view of the video decoder, it's just an
> infinite stream. Properties:
>
> * Running out of chunks is seen as a network stall and notified via the
> stalled event.
>
> * Failure to build a correct stream is a decoding error, notified via an
> error event and MEDIA_ERR_DECODE, there's no need for
> MEDIA_ERR_APPEND_NOT_ALIGNED.
>
> * No startPosition or endPosition needed.
>
> * buffered, seekable, videoWidth, videoHeight, etc. are updated just as they
> would be with an infinite stream over HTTP.
>
> * It's up to the application if they want to have chained WebM/Ogg (with
> full headers in each chunk), or just do raw chunks.
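>
> As a very rough sketch (this is not part of the proposal above; the chunk
> naming and the bandwidth threshold are made up), a script driving bitrate
> switching on top of such a Stream would just decide which chunk URL to
> append next:
>
> var s = new Stream();
> var next = 1;
> function appendNext(estimatedKbps) {
>   // Queue the variant that the measured bandwidth can sustain.
>   var level = estimatedKbps > 800 ? 'hi' : 'lo';
>   s.appendURL('chunk' + (next++) + '-' + level + '.webm');
> }
> video.src = s.url;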
>
> Clearly this is not where we want to eventually end up, but I think it's a
> good starting point. When we continue building with manifests and whatnot,
> I think we should continue to make sure that the decoding side doesn't
> need to be too involved.
>
> > I'd also like to propose to reinstate the videoElement.PROGRESS event
> > along with its "total" and "loaded" values. What do people from
> > Opera/Firefox think of this? I know it's been removed from the spec
> > because it clashes with the buffered/played TimeRanges, but it ís exactly
> > what's needed to perform all kinds of bandwidth calculations. Perhaps a
> > tradeoff is to ping the offset/progress/total of the currently being /
> > last fetched timerange? That's what we do with JW Player in Flash as
> > well for HTTP pseudo-streaming.
>
> I don't want this to come back, the interface makes no sense. If the data
> must be made available, we should bring back the bufferedBytes property,
> which was removed from the spec by my own request since it didn't seem to
> fill any purpose, see
> <http://html5.org/tools/web-apps-tracker?from=2404&to=2405>.
>
> > I'll also wrap this up - together with the appendVideo() call - and send
> > an email to WhatWG. Hopefully one or more browser developers are
> > interested in taking a stab at this.
>
> So, I'm not a fan of appendVideo, I want the first generation of this API
> to be in the network layer, not in the <video> layer, especially not going
> all the way down to the decoding pipeline.
>
> --
> Philip Jägenstedt
> Core Developer
> Opera Software