[foms] Proposal: adaptive streaming using open codecs

Tue Nov 16 10:13:46 PST 2010

On Tue, Nov 16, 2010 at 5:07 AM, Frank Galligan <fgalligan at google.com> wrote:
>
>
> On Mon, Nov 15, 2010 at 5:22 PM, Silvia Pfeiffer <silviapfeiffer1 at gmail.com>
> wrote:
>>
>> On Tue, Nov 16, 2010 at 4:49 AM, Steve Lhomme <slhomme at matroska.org>
>> wrote:
>> > On Mon, Nov 15, 2010 at 6:48 PM, Steve Lhomme <slhomme at matroska.org>
>> > wrote:
>> >> Doesn't it lead to more sync issues when the files you received are
>> >> not interleaved? The 2 streams may not load at the same speed (one
>> >> better cached than the other for example). It also makes it harder to
>> >> estimate the current download speed... That's an edge case, but
>> >> precisely the kind of odd network behaviour that "adaptative"
>> >> streaming is meant to handle.
>> >>
>> >> One big pro for non interleaved is that switching between languages
>> >> (or regular/commentary track) is a lot easier and the only reasonable
>> >> way to handle it server side.
>> >
>> > PS: And also allows something not possible now: listen to music from
>> > video sites without having to load the video part. It's possible with
>> > RTP but the quality (on YouTube for ex) is just not there.
>>
>>
>> I believe we are optimizing for the wrong use cases by trying to
>> provide data to the Web browser in a non-interleaved manner. I would
>> not put that functionality into the adaptive HTTP streaming layer, but
>> into other technologies.
>>
>> Firstly, providing different language audio tracks to the Web browser
>> for a video can be handled at the markup level. There is work in
>> progress on this anyway because we will see video descriptions and
>> sign language video that will need to be delivered on demand in
>> parallel to the main video. I would prefer we do not try to solve this
>> problem through adaptive HTTP streaming - it seems the wrong layer to
>> get this sorted.
>
> I think this is fine. For this to work clients will have to
> take synchronized separate streams and render them at correct times. I don't
> see how this is different than rendering one video and one audio that came
> from separate streams. It shouldn't matter if the streams are referenced
> from a manifest or from the markup.

I agree. The current pipeline in most (all?) video players is to split
the video and audio data read from the input stream, decode them
separately, and assemble and synchronize them at the end. Having
non-interleaved data only changes the fact that the input for each
independent decoder comes from a different stream. In the end there is
always buffering before the rendering, at least on the audio side. It
is also necessary on the video side for codecs that have B-frames
(like H.264). This is not the case for VP8 or Theora, so no video
buffering may be needed there at all.

Having interleaved data only guarantees that the matching audio and
video are somewhere in the local pipe. Having non-interleaved network
streams only removes that guarantee; the decoding pipe is hardly
changed. And since this is online playback, whether it's interleaved
or not, you should always be ready to handle the case where you are
waiting for data from the network to continue playback.
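
To make that last point concrete, here is a rough sketch in Python of
the assemble+synchronize step when audio and video arrive on separate
streams. All names (Track, render_ready) are made up for illustration
and not taken from any real player; it only assumes each decoder hands
over timestamped samples in order.

```python
from collections import deque


class Track:
    """Decoded, timestamped samples for one stream (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.buffer = deque()  # (timestamp_ms, payload), in arrival order
        self.last_ts = None    # newest timestamp received so far

    def push(self, timestamp_ms, payload):
        # Called by the decoder fed from this track's own network stream.
        self.buffer.append((timestamp_ms, payload))
        self.last_ts = timestamp_ms


def render_ready(tracks, clock_ms):
    """Return the samples due at clock_ms, sorted by timestamp.

    Returns None when any track has not yet received data up to
    clock_ms, meaning playback has to wait for the network.
    """
    for track in tracks:
        if track.last_ts is None or track.last_ts < clock_ms:
            return None  # this stream is behind: stall instead of desyncing

    due = []
    for track in tracks:
        while track.buffer and track.buffer[0][0] <= clock_ms:
            ts, payload = track.buffer.popleft()
            due.append((ts, track.name, payload))
    return sorted(due)
```

The same check applies to interleaved input; the only difference is
that each Track would be filled by a demuxer reading one stream
instead of by two independent downloads.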

Steve