[theora] HTTP adaptive streaming for HTML5
silviapfeiffer1 at gmail.com
Tue Dec 29 04:43:24 PST 2009
It would be great to get further input into the discussion from you -
if you have any suggestions for solutions (or add to the list of
issues) that would be great!
On Tue, Dec 29, 2009 at 4:50 AM, Richard Watts <rrw at kynesim.co.uk> wrote:
> Silvia Pfeiffer wrote:
>> On Mon, Dec 21, 2009 at 2:49 PM, Michael Dale <mdale at wikimedia.org> wrote:
>>> Conrad Parker wrote:
>>> Silvia did a good post outlining the major solutions to this problem of
>>> exposing the structure of a composite media source:
> Hello all,
> [ I did quite a lot of work on this problem for OTT video delivery
> for an STB manufacturer a couple of years ago and I think we have
> customers actually using the feature, so I'll wade in :-) ]
> Indeed; good post!
>> Not everyone is sold on that, but it is indeed a discussion that we
>> will need to continue to have at the WHATWG/W3C.
>> Also, I agree with Michael that we need to simplify ROE in that we
>> need to remove the "sequence" support. There will be a discussion
>> about SMIL vs ROE at some point and I really don't want SMIL or
>> anything similarly complex and unimplementable.
> My experience with trying to do this in Windows Media land is that
> you can't. I think your aims are laudable (I once shared them), but
> structured video support is in fact a crawling horror and I think
> you just have to live with it.
> MS's solution to this, which is actually the least worst I've come
> across, is to allow you to specify a structured video file as any
> video resource.
Are you talking about the smooth streaming solution for Silverlight
and IIS7? I actually also quite like it, since it doesn't require
changes to the way in which HTTP works and servers only need to add an
additional description file. Those SMIL-like files (ism files, see
http://msdn.microsoft.com/en-us/library/ee230810.aspx) look sane.
> That structured video file then specifies a fairly traditional DAG
> of video resources (in fact, MS's is a tree but there's no need for
> it to be), annotated with various information to help you choose
> which of the alternates to play.
DAG as in directed acyclic graphs?
> ASX (the format I was using) is a ghastly nightmare of a format,
> but the approach is, I think, broadly correct.
Smooth streaming seems to use the MPEG-4 file format as the basis
(as does Apple's approach). Are you talking about an older
specification of MS's?
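A DAG-shaped manifest along those lines might look like the sketch below. Every element and attribute name here is invented purely for illustration (the real ism format linked above differs); the point is that shared global URIs let two branches reference the same resource, which makes the structure a DAG rather than a tree:

```xml
<!-- Hypothetical manifest sketch, not any real format.
     Two bitrate alternates plus a shared segment that multiple
     streams can reference by its global URI. -->
<stream>
  <alternates>
    <variant bitrate="500000"  src="http://example.com/low/manifest"/>
    <variant bitrate="2000000" src="http://example.com/high/manifest"/>
  </alternates>
  <shared>
    <segment id="advert-x" src="http://ads.example.com/x.ogv"/>
  </shared>
</stream>
```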
> It has a few nice properties:
> * It allows the client to discover just enough about a video resource
> to play it.
> * It allows the client to buffer video segment switches across
> trickplay transitions (especially reverse trickplay)
> * It allows servers to load-balance or do ad carousel substitutions
> by forcing clients to re-request sub-playlists.
> * If you use a DAG with global URIs, clients can reuse previously
> buffered objects multiple times (so: only store advert X once, but
> play it in every YouTube stream)
> * The playlist has a fairly conventional DOM which can be walked by
> application-specific tags and URLs for particular web apps.
> * The informal separation is that:
> -> The browser reads the playlist.
> -> The video player reads the metadata in the video resource.
> -> The video decoder decodes the video.
> Video information flows down that list, and user actions
> (pause, play, reaching the end of a file) up it, so:
> * The browser decides which streams to play.
> * The video player decides which codecs it needs to play them.
> * The decoder decodes the video elements.
> * The user hits 'pause'.
> * Decoder pauses, notifies player.
> * Player pauses fetch. Tells browser.
> * Browser fires its JS events.
> The client-side processing for these playlists is _horrible_, but
> it is at least possible, in a way that it simply isn't with many other
> formats.
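The layered separation above (browser → player → decoder going down, notifications going up) can be sketched in a few lines of JavaScript. All class and method names here are invented for illustration; nothing is a real browser API:

```javascript
// Sketch of the control/notification layering described above.
// Control flows down (browser -> player -> decoder), notifications
// flow up (decoder -> player -> browser fires its JS events).
class Browser {
  constructor() { this.events = []; }
  onPlayerPaused() { this.events.push('pause'); } // fire the JS event
}
class Player {
  constructor(browser) { this.browser = browser; this.fetching = true; }
  onDecoderPaused() {
    this.fetching = false;          // player pauses its fetch
    this.browser.onPlayerPaused();  // ... and tells the browser
  }
}
class Decoder {
  constructor(player) { this.player = player; this.paused = false; }
  pause() {
    this.paused = true;             // decoder pauses
    this.player.onDecoderPaused();  // ... and notifies the player
  }
}

const browser = new Browser();
const player = new Player(browser);
const decoder = new Decoder(player);

// User hits 'pause':
decoder.pause();
```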
> Another thing you will want to do is to allow the _server_ to
> switch streams mid-delivery - this is a matter for the transport
> protocol, but could possibly be done by returning an
> X-Stream-URI: < .. .> header.
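Client handling of such a server-initiated switch could be as simple as the sketch below. The "X-Stream-URI" header name comes only from the suggestion above; it is not any real standard:

```javascript
// Hypothetical: if the server's response carries an X-Stream-URI
// header, the client should continue fetching from that URI instead.
// Header keys are assumed to be lower-cased, as e.g. Node.js does.
function nextStreamURI(headers, currentURI) {
  return headers['x-stream-uri'] || currentURI;
}

const kept = nextStreamURI({}, 'http://example.com/low.ogv');
const switched = nextStreamURI(
  { 'x-stream-uri': 'http://example.com/high.ogv' },
  'http://example.com/low.ogv'
);
```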
> As with all TCP streaming, for acceptable quality, it is absolutely
> vital to stream at exactly the right bitrate lest you be bitten
> by TCP's poor reaction to even momentary congestion. Some routers
> can be very unforgiving: you need both time of flight and bulk data
> measurements (indeed, smoothed bulk data measurements) to keep your
> buffers sized correctly - too much buffer and your channel change times
> go through the roof, too little and you jerk.
Yeah, I think that has been the problem with traditional bitrate based
switching approaches on HTTP.
Also, I am worried about the decoding setup time for the different
bitrate-encoded web resources - loading each for the first time will
create quite an overhead; subsequent switches should be faster.
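The "smoothed bulk data measurements" Richard mentions are essentially a moving average of observed throughput. A minimal sketch, assuming an exponentially weighted average with an arbitrary smoothing factor and headroom (both numbers are illustrative, not from any spec):

```javascript
// Smoothed throughput estimate for bitrate selection.
// alpha (0.2) and headroom (0.8) are arbitrary example values.
function makeEstimator(alpha = 0.2) {
  let estimate = null; // bits per second
  return {
    // Record one bulk-data measurement: bytes received over seconds.
    sample(bytes, seconds) {
      const bps = (bytes * 8) / seconds;
      estimate = estimate === null ? bps : alpha * bps + (1 - alpha) * estimate;
      return estimate;
    },
    // Pick the highest available bitrate safely below the estimate,
    // falling back to the lowest if nothing fits.
    pick(bitrates, headroom = 0.8) {
      const usable = estimate * headroom;
      const fits = bitrates.filter(b => b <= usable);
      return fits.length ? Math.max(...fits) : Math.min(...bitrates);
    },
  };
}

const est = makeEstimator();
est.sample(500000, 1); // one 4 Mbit/s measurement
est.sample(250000, 1); // a 2 Mbit/s measurement pulls the estimate down
const chosen = est.pick([500000, 1500000, 3000000]);
```

Smoothing matters exactly because of TCP's reaction to momentary congestion: a single bad sample should nudge the estimate, not trigger an immediate stream switch.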
>> The multitrack audio and video discussion has started happening at
>> W3C/WHATWG (can't remember which group it was now), but I have seen
>> huge push-back from browser developers, in particular where the files
>> come from different servers. It seems it's just too complex at this
>> point in time.
> Tell them to try harder :-). It's a bit nasty, but by no means
> impossible if you think about the problem in a fairly structured way.
> Particularly if you have both CPU and memory, which modern browser
> developers do.
There are at least two browser vendor developers on this list here. :)
>> Also, we need to be careful about mixing too many things into ROE: I
>> would advise against doing dynamically switched content *and*
>> specification of stream composition (text tracks and how they relate
>> to each other and to the a/v) through ROE
> So would I, but I think it's unavoidable. Apart from anything else,
> if I'm going to have my video communicate with my HTML, the only sane
> way to do it is to put an 'event track' on the video and I am going
> to need to know which event track goes with which bit of my video.
What do you call an "event track"? We have no such thing in Ogg.
> If HTML5 doesn't specify a simple way to do this, people will [...]
> and (b) immense webpages - apart from anything else, much of this
> discussion is about getting your timing right, and timing is one thing [...]
> (on which note, you will notice that getting video-embedded events [...]
> dimensions :-))
>> I am continuing to think
>> hard about what could be a solution for accessibility for HTML5 video,
>> because there are so many interests pushing in different directions. I
>> only know it has to be done in a really simple way, otherwise we won't
>> get it implemented.
> I think you should be able to get away with most of your accessibility
> in audio/subtitle tracks and stick anything else in event tracks and
> delegate the rest to JS?
>> As for dynamically switched content: What speaks against using Apple's
>> live streaming draft
> My first objection is the $2.50/unit I need to pay to the MPEG-LA
> for the MPEG 2 Systems licence to package my video as TS/PS.
Ah yes, that would be a big problem. Let's instead do something that
every codec can follow and that isn't covered by license fees yet.
> My second is that many hardware demuxes find it extremely
> difficult to cope with PATs and PMTs following each other directly
> and with them immediately preceding an I-frame. It's going to make
> life hard for STBs and anyone else who doesn't have the computing
> power to do everything in software. In practice, you will get a
> frame skip every time you go over a file transition.
Yeah - that's one reason why I think I like the Silverlight smooth
streaming approach better.
> My third is that it introduces yet another file format to a program
> that really doesn't need one (and it's not visible to JS either).
> What's wrong with XML?
M3U? Yeah, I suppose an XML-based one would be nicer, but OTOH M3U
files are simple and thus conversion/parsing isn't hard.
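To the "parsing isn't hard" point: a plain M3U playlist is just comment lines starting with '#' and one URL per remaining non-empty line, so a complete parser fits in a few lines:

```javascript
// Parse a plain M3U playlist: drop blank lines and '#' comment/tag
// lines (such as #EXTM3U and #EXTINF), keep everything else as a URL.
function parseM3U(text) {
  return text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line && !line.startsWith('#'));
}

const urls = parseM3U('#EXTM3U\n#EXTINF:10,\nseg1.ogv\nseg2.ogv\n');
// urls is ['seg1.ogv', 'seg2.ogv']
```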
> S6.1: dividing streams into trickplayable segments - bear in mind
> that it is an extremely difficult problem for H.264 streams. Best
> left to the client (or supply a separate track indicating the trickplay
> points) - since the client almost by definition has the code and the
> server probably doesn't.
> How do you name your alternate playlists?
> Other than that, it doesn't seem offensively bad. I'd invent
> something else if it became a standard though.
>> I also think we need a playlist file format for HTML5. It should be
>> acceptable as a resource sequence for the video or audio element -
i.e. a playlist should be either an audio playlist or a video playlist,
>> but not mixed. Also, I think it would be preferable if all the videos
>> in the playlist were encoded with the same codec, i.e. all Ogg
>> Theora/Vorbis or all MP4 H.264/AAC. Further, it would be preferable if
>> all the videos in a playlist had the same track composition. But this
>> is where it becomes really difficult and unrealistic. So, I worry
>> about how to expose the track composition of videos for a playlist.
>> Wouldn't really want to load a xspf playlist that requires a ROE file
>> for each video to be loaded to understand what's in the resource. That
>> would be a double loading need. Maybe the playlist functionality needs
>> to become a native part of HTML5, too?
> Again, laudable aims, but given the multiplicity of video formats out
> there and what people will actually do with them I seriously doubt that
> anyone will keep to that kind of a spec - the extensions will then
> become de-facto standards and you might as well not have burdened them
> with the original standard.
So, what is your suggestion instead?
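At least the sequencing half of the playlist idea is straightforward to do client-side today over a single video element. A minimal sketch - the pure sequencing function is testable on its own, and the DOM wiring in the comment uses only the standard 'ended' event:

```javascript
// Advance through a playlist of same-codec resources; returns the
// next index, or -1 when the playlist is finished.
function nextIndex(playlist, current) {
  return current + 1 < playlist.length ? current + 1 : -1;
}

const playlist = ['intro.ogv', 'main.ogv', 'credits.ogv'];

// In a browser this would be wired to the media element, e.g.:
//   let current = 0;
//   video.addEventListener('ended', () => {
//     current = nextIndex(playlist, current);
//     if (current !== -1) { video.src = playlist[current]; video.play(); }
//   });

const after = nextIndex(playlist, 0); // advances to index 1
const done = nextIndex(playlist, 2);  // last entry: returns -1
```

The drawback, as discussed above, is exactly the per-resource setup cost at each src swap - which is why native playlist support (gapless, with known track composition) would be preferable.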