[foms] Proposal: adaptive streaming using open codecs

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Mon Nov 15 14:22:29 PST 2010

On Tue, Nov 16, 2010 at 4:49 AM, Steve Lhomme <slhomme at matroska.org> wrote:
> On Mon, Nov 15, 2010 at 6:48 PM, Steve Lhomme <slhomme at matroska.org> wrote:
>> Doesn't it lead to more sync issues when the files you receive are
>> not interleaved? The 2 streams may not load at the same speed (one
>> better cached than the other, for example). It also makes it harder to
>> estimate the current download speed... That's an edge case, but
>> precisely the kind of odd network behaviour that "adaptive"
>> streaming is meant to handle.
>> One big pro for non-interleaved is that switching between languages
>> (or regular/commentary track) is a lot easier, and it is the only reasonable
>> way to handle it server side.
> PS: And it also allows something not possible now: listening to music from
> video sites without having to load the video part. It's possible with
> RTP, but the quality (on YouTube for example) is just not there.

I believe we are optimizing for the wrong use cases by trying to
provide data to the Web browser in a non-interleaved manner. I would
not put that functionality into the adaptive HTTP streaming layer, but
into other technologies.

Firstly, providing different-language audio tracks for a video to the
Web browser can be handled at the markup level. Work is in progress on
this anyway, because we will see video descriptions and sign-language
video that need to be delivered on demand in parallel with the main
video. I would prefer that we not try to solve this problem through
adaptive HTTP streaming - it seems the wrong layer in which to sort
this out.
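To illustrate the layering argument, here is a minimal sketch of track selection happening at the application/markup level, entirely outside the streaming layer. The track names and URLs are invented for the example:

```python
# Hypothetical illustration: choosing an alternative audio track at the
# markup/application level rather than inside adaptive HTTP streaming.
# Track labels and URLs are made up for this sketch.

AUDIO_TRACKS = {
    "en": "http://example.com/video-audio-en.oga",
    "fr": "http://example.com/video-audio-fr.oga",
    "commentary": "http://example.com/video-audio-commentary.oga",
}

def pick_audio_track(preferences, tracks=AUDIO_TRACKS, default="en"):
    """Return the URL of the first preferred track that is available."""
    for label in preferences:
        if label in tracks:
            return tracks[label]
    return tracks[default]

print(pick_audio_track(["de", "fr"]))  # no "de" track, so "fr" is chosen
```

The point is that the player only ever fetches one interleaved resource per selected track; the selection logic never touches the streaming layer.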

Secondly, the use case of picking up only the audio track from a video
can also be solved differently. It requires a server-side process to
extract the audio data from the video anyway, and it would be triggered
by a user request. So it would probably come through a media fragment
URI such as http://example.com/video.ogv?track=audio which the server
would process in order to deliver an audio resource, if the service
provider decides to offer such a service.
As I understand adaptive HTTP streaming, it is supposed to be a simple
implementation where the player's only additional functionality is
interpreting a manifest file and switching between resource chunks
rather than byte ranges of a single resource. The whole decoding
pipeline stays intact and works as before. I think we should not touch
the interleaved delivery functionality at this level: it would burden
the player with synchronisation and network-delivery bookkeeping that
should never arise from a single resource.
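The player-side logic described above can be sketched in a few lines: interpret a manifest of interleaved chunks at several quality levels, and pick the next chunk based on measured throughput. The manifest format, bitrates, and URLs here are invented for illustration, not taken from any spec:

```python
# Minimal sketch of manifest-driven chunk switching, as described above.
# Bitrates, headroom factor, and chunk URLs are hypothetical.

MANIFEST = {
    # bitrate (kbit/s) -> ordered list of interleaved chunk URLs
    250:  ["http://example.com/v/250/seg1.ogv",  "http://example.com/v/250/seg2.ogv"],
    500:  ["http://example.com/v/500/seg1.ogv",  "http://example.com/v/500/seg2.ogv"],
    1000: ["http://example.com/v/1000/seg1.ogv", "http://example.com/v/1000/seg2.ogv"],
}

def choose_bitrate(measured_kbps, manifest=MANIFEST, headroom=0.8):
    """Pick the highest bitrate that fits the measured bandwidth with some
    headroom; fall back to the lowest level if none fits."""
    affordable = [b for b in manifest if b <= measured_kbps * headroom]
    return max(affordable) if affordable else min(manifest)

def next_chunk(index, measured_kbps):
    """Return the URL of the next chunk at the chosen quality level."""
    return MANIFEST[choose_bitrate(measured_kbps)][index]

print(choose_bitrate(700))   # -> 500
print(next_chunk(0, 1500))   # -> the 1000 kbit/s first segment
```

Because every chunk is a complete interleaved resource, the decoder sees an ordinary stream and all audio/video synchronisation stays inside it; the switching logic never has to recombine separate streams.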
