[foms] Proposal: adaptive streaming using open codecs

Mon Nov 15 20:25:46 PST 2010

On Tue, Nov 16, 2010 at 3:07 PM, Frank Galligan <fgalligan at google.com> wrote:
>
>
> On Mon, Nov 15, 2010 at 5:22 PM, Silvia Pfeiffer <silviapfeiffer1 at gmail.com>
> wrote:
>>
>> On Tue, Nov 16, 2010 at 4:49 AM, Steve Lhomme <slhomme at matroska.org>
>> wrote:
>> > On Mon, Nov 15, 2010 at 6:48 PM, Steve Lhomme <slhomme at matroska.org>
>> > wrote:
>> >> Doesn't it lead to more sync issues when the files you receive are
>> >> not interleaved? The 2 streams may not load at the same speed (one
>> >> better cached than the other, for example). It also makes it harder to
>> >> estimate the current download speed... That's an edge case, but
>> >> precisely the kind of odd network behaviour that "adaptive"
>> >> streaming is meant to handle.
>> >>
>> >> One big pro for non-interleaved is that switching between languages
>> >> (or regular/commentary track) is a lot easier, and it is the only
>> >> reasonable way to handle it server side.
>> >
>> > PS: And also allows something not possible now: listen to music from
>> > video sites without having to load the video part. It's possible with
>> > RTP but the quality (on YouTube for ex) is just not there.
>>
>>
>> I believe we are optimizing for the wrong use cases by trying to
>> provide data to the Web browser in a non-interleaved manner. I would
>> not put that functionality into the adaptive HTTP streaming layer, but
>> into other technologies.
>>
>> Firstly, providing different language audio tracks to the Web browser
>> for a video can be handled at the markup level. There is work in
>> progress on this anyway because we will see video descriptions and
>> sign language video that will need to be delivered on demand in
>> parallel to the main video. I would prefer we do not try to solve this
>> problem through adaptive HTTP streaming - it seems the wrong layer to
>> get this sorted.
>
> I think this is fine. For this to work clients will have to
> take synchronized separate streams and render them at correct times. I don't
> see how this is different than rendering one video and one audio that came
> from separate streams. It shouldn't matter if the streams are referenced
> from a manifest or from the markup.

OK, it seems that at least two browser vendors would be OK with
interleaving tracks retrieved through separate HTTP connections for
synchronized playback.

Would it be preferable to solve all such multitrack needs for distinct
resources through a manifest file rather than through a specification
in HTML? And would it then also be possible to use that with files
coming from different servers?

I am concretely thinking about a video resource for which an audio
description exists on a different server and some Web developer wants
to publish it on a Web page in such a way that the browser will play
back the audio description in sync with the video. If we take this
away from the HTML, we may at least need a JavaScript API to
schedule/unschedule an external resource with a media resource for
playback.
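
To make the schedule/unschedule idea a bit more concrete, here is a
rough sketch of the bookkeeping such an API would imply. It is written
in Python only to keep it short; it is not a proposal for actual
browser API names. MediaTimeline, ExternalTrack, the offset and
tolerance parameters, and the other.example.org URL are all made up
for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalTrack:
    """An external resource (e.g. an audio description) slaved to a main resource."""
    url: str
    offset: float = 0.0    # hypothetical: seconds added to the main timeline
    position: float = 0.0  # current playback position of this track, in seconds


@dataclass
class MediaTimeline:
    """Hypothetical controller: the main resource's clock drives every scheduled track."""
    tolerance: float = 0.04  # assumed drift (seconds) allowed before forcing a seek
    tracks: dict = field(default_factory=dict)

    def schedule(self, track: ExternalTrack) -> None:
        # Attach an external resource to the main resource's timeline.
        self.tracks[track.url] = track

    def unschedule(self, url: str) -> None:
        # Detach it again; unknown URLs are ignored.
        self.tracks.pop(url, None)

    def tick(self, main_time: float) -> list:
        """Re-seek any track that drifted past the tolerance; return their URLs."""
        corrected = []
        for track in self.tracks.values():
            target = main_time + track.offset
            if abs(track.position - target) > self.tolerance:
                track.position = target
                corrected.append(track.url)
        return corrected
```

The point is only that the main resource owns the clock and every
external resource is corrected against it, regardless of which server
it came from.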

Silvia.

>
>>
>> Secondly, the use case of picking up only an audio track from a video
>> is also one that can be solved differently. It requires a process on
>> the server anyway to extract the audio data from the video and then it
>> would be a user request. So, it would probably come through a media
>> fragment URI such as http://example.com/video.ogv?track=audio which
>> would be processed by the server and an audio resource would be
>> delivered, if the service provider decides to offer such
>> functionality.
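
As a minimal sketch of the server side of this, assuming the track is
requested through a "track" query parameter as in the URL above, the
server would first need to pull the requested track names out of the
request. The function name requested_tracks is hypothetical; the
parsing itself uses Python's standard urllib.parse.

```python
from urllib.parse import parse_qs, urlsplit


def requested_tracks(url: str) -> list:
    """Return the values of every 'track' query parameter, in order of appearance."""
    query = urlsplit(url).query
    # parse_qs maps each parameter name to a list of its values.
    return parse_qs(query).get("track", [])


print(requested_tracks("http://example.com/video.ogv?track=audio"))
```

A request without the parameter yields an empty list, in which case
the server would deliver the full interleaved resource as it does now.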
>>
> I don't see the difference between an audio language track and N different
> audio bitrate streams. We could make it a lot easier on the
> adaptive streaming clients if we said we need server-side software to
> re-mux everything on the fly. This might be feasible for larger companies
> but I don't think it will be feasible for smaller companies. Having them
> build or even manage anything other than unmodified HTTP servers is probably
> a non-starter.
>
>
>> As I have understood adaptive HTTP streaming, it is supposed to be a
>> simple implementation where the player's only additional functionality
>> is in interpreting a manifest file and switching between resource
>> chunks rather than byte ranges of a single resource.
>
> I don't think physical chunked files would necessarily be harder for a
> client to deal with than byte ranges. Actually having the client keep A/V
> sync on chunk boundaries might be harder. The creation tools will also be a
> lot harder to build than what we have now.
>
>>
>> All the decoding
>> pipeline continues to stay intact and work as previously.
>
> The decoding pipeline shouldn't have to change that much for streams in
> separate files. At a high level you would be removing the demux step for
> streams in separate files. Worst case if the client was designed in such a
> way that it was really hard to remove the demux step you could add a simple
> muxer right before the demux step.
> I think for chunked files the decoding pipeline will have to change a good
> amount as there will have to be another step added to remove any of
> the extraneous data in the chunk to send one interleaved stream down into
> the decoding pipeline as before.
>>
>> I think we
>> should not touch the interleaved delivery functionality at this level.
>
> As I said above the decoding pipeline of a client is going to have to change
> no matter which way we go.
>
>>
>> It would cause the player to do too much synchronisation
>
> Actually with separate streams in files you would have to do just about the
> same synchronization as a client does now. The only difference is that if
> one of the streams gets too far ahead of the other, the client slows down
> the download of that stream until the gap is back to an acceptable level.
> As I said above I think the client could have a tougher time doing
> the synchronization across physical boundaries of chunks.
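
The rule described above for separate streams could be sketched
roughly as follows. This is only an illustration: the function name
streams_to_pause and the 2-second max_lead default are assumptions,
not values anyone in the thread has proposed.

```python
def streams_to_pause(buffered_until: dict, max_lead: float = 2.0) -> set:
    """Pick the streams whose download should be slowed down.

    buffered_until maps a stream name to the media time (in seconds)
    downloaded so far. A stream is returned when it is more than
    max_lead seconds ahead of the slowest stream.
    """
    if not buffered_until:
        return set()
    slowest = min(buffered_until.values())
    return {
        name
        for name, media_time in buffered_until.items()
        if media_time - slowest > max_lead
    }


print(streams_to_pause({"audio": 30.0, "video": 12.0}))
```

Here the audio download would be held back until video catches up to
within max_lead seconds; rendering sync itself is unchanged.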
>>
>> and network
>> delivery handling overhead work that should really not be created from
>> a single resource.
>
> I think there will be more network overhead with physical chunks than
> separate streams because you have file and stream overhead that needs to be
> replicated in each chunk.
> Frank
>>
>> Cheers,
>> Silvia.
>> _______________________________________________
>> foms mailing list
>> foms at lists.annodex.net
>> http://lists.annodex.net/cgi-bin/mailman/listinfo/foms