[theora-dev] My issues with ogg and directshow...

Mon May 10 10:37:52 PDT 2004

On Sun, May 09, 2004 at 01:58:17PM +0800, illiminable wrote:
> 
> ----- Original Message -----
> From: "Timothy B. Terriberry" <tterribe at xiph.org>
> To: <theora-dev at xiph.org>
> Sent: Sunday, May 09, 2004 4:58 AM
> Subject: Re: [theora-dev] My issues with ogg and directshow...
> 
> 
> > > Ok, so given that the graph has to be built before data is passed
> > > downstream, there is a problem. How can the demuxer know what filters to
> > > connect to (ie what the streams are) ? The demux needs to read ahead
> > > enough to find the BOS pages. Now we know how many streams there are. How
> > > does it know what kind of streams they are ? It has to be able to
> > > recognise the capture patterns of every possible codec. So a "codec
> > > oblivious" demux is already out of the question.
> >
> > This is an issue of where the separation line is drawn, not whether or not
> > separation can exist. The Ogg abstraction has a richer interaction between
> > codec and muxer than the DS framework mandates. But this doesn't prevent
> > you from defining an "Ogg codec" interface as a richer instance of the
> > general DS codec interface, adding such things as generic functions to
> > answer questions like, "Given this initial packet, can you decode this
> > stream?" or, "What are the DS media parameters corresponding to this
> > complete set of header packets?" or, "What is the time associated with
> > this granule position?" The muxer can still rely wholly on the codecs to
> > answer these questions, it just needs a richer codec API than the DS
> > framework in general has. New codecs can still be added without
> > modifications to the demux so long as they implement this extended API.
> >
> 
> Which is how it currently works.
> http://svn.xiph.org/trunk/oggdsf/src/lib/core/directshow/dsfOggDemux/OggStreamFactory.cpp
> 
> But it still makes every codec to some extent dependent on ogg.

It doesn't have to be this way. That is the whole point of a framework. It
decouples these dependencies.

> 
> But when you say a "codec api" this will be out of the realm of directshow
> interfaces. Assuming directshow's model of a decoder as a filter, how can a
> demux know what filter to ask if it can't determine a unique GUID to
> identify the media type? It could try to ask all 3000+ installed filters if
> they support a particular "ogg codecs" interface, then when it finds one,
> ask it if it knows this codec. It would work, but this is not practical.
> This is why the media type GUID is important, as it narrows the number of
> filters it has to try to a much smaller number.
> 
> So the other alternative is to have "out of directshow" calls to DLLs. In
> other words, require codecs to provide not only directshow filter and pin
> interfaces, but also some other api external to directshow. Which again is
> possible and looks like the route I will end up going, but it puts another
> requirement on codec developers to implement another API which is not really
> part of directshow, which kind of breaks the whole idea of an automatically
> buildable filter graph in directshow.
> 

I don't think either of these is a good solution. When people say
"ask the codec" I don't really think that is what they mean. Yes, certain
pieces of information are only known by the codec code, but this
could easily be encapsulated in a library that doesn't actually instantiate
all of the stuff needed for decoding. It just knows enough to unpack the
headers and generate a 4cc, guid, mimetype, or whatever. It also knows how to
convert granulepos to time for each codec. You may also choose to have it
provide common info like sample rate, channels, image size, frame rate,
bitrate, TAC, and a few other values. These are commonly used by media
frameworks to hook all the pieces together.
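
As a rough sketch of the "knows enough to unpack the headers" part (Python
only for brevity; the function name, the table, and the MIME-style labels are
made up for illustration and are not an existing xiph API), identification can
be a prefix match on the first packet of each stream. The leading bytes listed
are the ones the Vorbis, Theora and Speex headers start with:

```python
from typing import Optional

# Leading bytes of the first (BOS) packet for a few xiph codecs.
# The identifiers have different lengths, so match by prefix
# instead of reading a fixed number of bytes.
# The labels on the right are illustrative only (an assumption).
_SIGNATURES = [
    (b"\x01vorbis", "audio/x-vorbis"),
    (b"\x80theora", "video/x-theora"),
    (b"Speex   ", "audio/x-speex"),
]


def identify_codec(first_packet: bytes) -> Optional[str]:
    """Return a label for the codec that owns this BOS packet, or None."""
    for magic, label in _SIGNATURES:
        if first_packet.startswith(magic):
            return label
    return None
```

Nothing here instantiates a decoder, and adding a codec means adding a table
entry (or a plugin that registers one) rather than changing the demux logic.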

> And it also makes integrating existing codecs which know nothing of ogg a
> problem, as they are already implemented; they already know how to work the
> directshow way.  Let's say someone has written a directshow filter for
> dirac (http://www.bbc.co.uk/rd/projects/dirac/overview.shtml), for
> example, and they have implemented the requirements directshow needs. This
> filter should be able to work with any encapsulation format. However in
> order for it to work in ogg it will need to provide some other dll interface
> or the ogg demux won't be able to recognise it. Or even if the filters can
> be narrowed by guid, it will still need to offer some "ogg codecs" interface
> that the developer of that codec may not have implemented, nor want to
> implement, nor really should need to implement.

No it doesn't. The demux just has to present the dirac data the same way that
other demux filters do. There is a contract between the demux and codec that
all demux objects must honor.

> 
> Certainly it will work but it defeats the whole point of using a structured
> media framework if you force every codec (which for the most part doesn't
> care about its encapsulation format) to provide information to a particular
> encapsulation format. If all ogg is interested in is encapsulating its
> codecs (theora, vorbis, speex and flac)... then there is no issue, you can
> make these codecs as dependent on ogg as desired, but as soon as you try and
> integrate other codecs, you can't expect every codec to include a special
> interface so it can be ogg encapsulated.

I don't believe this is true. If you want to carry other codecs in an ogg file
the demux MUST present the information about the codec in the same way other
demux objects would. That is the whole point. When you add a codec to a
framework you set up a particular set of parameters and bitstream format. Once
that is determined then all a demux has to do is provide the data in that same
format. You don't need special interfaces on all the codecs.

> 
> The solution that every other format I can find that is implemented on
> directshow has chosen is to make a blind mapping of some identifier to a
> guid... be it a numeric identifier or a fourcc code. Basically what is done is
> (GUIDs are 128 bits) a new guid is created... in fourcc's case the first 32
> bits become a mask... so the last 96 bits identify a fourcc guid, and any
> demuxer, without knowing anything about the codec, can create a unique fully
> formed guid by masking the 32 bits of the fourcc code into the first 32
> bits. And furthermore they have fixed headers providing the key information
> for setting up streams, i.e. some kind of data rate and frame size information.

I don't see why the ogg demux can't do the same thing. Just create 4cc codes
for the xiph codecs.
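
For reference, the mapping being described is small enough to sketch. This
assumes the usual DirectShow fourcc template of
XXXXXXXX-0000-0010-8000-00AA00389B71, with the fourcc read as a little-endian
32-bit value; the function name is made up, and Python's uuid module stands in
for a real GUID struct:

```python
import uuid


def fourcc_to_guid(fourcc: bytes) -> uuid.UUID:
    """Place a 4-byte fourcc in the first 32 bits of the fixed template."""
    if len(fourcc) != 4:
        raise ValueError("a fourcc is exactly 4 bytes")
    # The fourcc is stored as a little-endian DWORD in the first GUID field.
    data1 = int.from_bytes(fourcc, "little")
    # The remaining 96 bits are the same for every fourcc-derived GUID.
    return uuid.UUID(fields=(data1, 0x0000, 0x0010, 0x80, 0x00, 0x00AA00389B71))
```

With this, fourcc_to_guid(b"YUY2") gives 32595559-0000-0010-8000-00aa00389b71.
A code such as b"theo" for Theora would be a hypothetical choice made by the
demux; nothing in the Ogg stream itself carries it.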

> 
> Which is the ideal solution that ogg should use in directshow... the problem
> is all the other formats have fixed length identifiers and partially fixed
> headers. So they know, for example, that somewhere in the header in a fixed
> place is exactly 32 (or however many) bits of identifier that they can use
> blindly to create a fully formed guid.
> 

Ogg is not the only container format that needs to know limited info about the
codecs contained in it. .MP4/.3GP files have similar needs. There is a
codec-specific data chunk in the files that needs to be parsed to provide info
to media frameworks. The demux needs to translate this into something that the
framework understands. Yes this is a little more complicated than just looking
at byte 42 and reading a 4cc, but it allows better control over what parameters
are specified and allows easy extension for the future.

> Which leaves a problem: if the codec identifier in ogg is not fixed length
> or even bounded, how can you be sure that you can create a unique fully
> formed guid from it? You could just say... ok we'll only look at the first x
> bytes... but what happens to a codec whose identifier is less than x bytes?
> Variable parts of the header will be in those x bytes, making creating a
> unique id impossible. Similarly, what if two identifiers share a common first
> x bytes?

It is the demux's responsibility to map headers in the file to 4cc codes or
whatever the framework expects. What the actual header length is in the file
doesn't matter as long as the demux knows how to do the proper conversions.
The demux doesn't just demultiplex a format. It may also translate the data 
into a form that the framework or components in the framework expect.

> 
> > And as an aside, please don't use the phrase "granule rate"... it implies,
> > incorrectly, that the granule position->time mapping can be accomplished
> > by multiplying by a simple scale factor, and this is NOT true in general.
> > In particular, it is not true for Theora.
> >
> 
> Maybe the terminology is not appropriate but the principle is the same...
> the rate at which some unit of data is presented. Be it frames, samples or
> other. It doesn't necessarily mean a strict multiplication. Maybe data rate
> or sample rate is more appropriate.

Data rate or sample rate implies a form like (x / rate = time) or
(x / rate - offset = time). This is not necessarily how the granulepos is
converted to time.

> 
> And for theora you could just consider it to be in a funny number base,
> where the least significant "digit" is unitary and the most significant
> "digit" represents a factor of 2^granuleshift.

That is not how the Theora timestamps work. It is something more like

((granulepos >> granuleshift) + 
 (((1 << granuleshift) - 1) & granulepos)) / fps = time

I don't know about you, but when I hear data rate or sample rate I don't think
of a transformation like this to get time.
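
Spelled out as code, the expression above looks roughly like this. It is only
a sketch of that formula: the function name is made up, the frame rate is
taken as the numerator/denominator pair from the Theora header, and the exact
frame-index convention (off-by-one, start vs. end of frame) is not handled:

```python
from fractions import Fraction


def theora_granule_to_seconds(granulepos: int, granuleshift: int,
                              fps_num: int, fps_den: int) -> Fraction:
    """(keyframe index + frames since that keyframe) / fps."""
    # High bits: frame index of the last keyframe.
    keyframe = granulepos >> granuleshift
    # Low bits: number of frames since that keyframe.
    offset = granulepos & ((1 << granuleshift) - 1)
    return Fraction(keyframe + offset) * Fraction(fps_den, fps_num)
```

With granuleshift = 6 and 25 fps, granulepos (100 << 6) | 5 and granulepos
(105 << 6) both map to frame 105, i.e. the same time, even though the raw
values differ. That is why no single scale factor can do the conversion.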

> 
> > > Directshow works in UNITS of 1/10,000,000 of a second, it knows nothing
> > > of granule pos. When something like media player requests a seek or a
> > > position request it wants these units. So the seek request comes into the
> > > graph. It
> >
> > Generally one seeks to a time, not a granule position. The granule
> > position->time mapping is unique, but the reverse does not have to be. So
> > when dealing with multiple codecs, you convert everything to a time in
> > order to be able to compare values among them. It's unfortunate that DS
> > does not let one work in the native time base of the streams, but units of
> > 100 nanoseconds should be accurate enough for most purposes.
> >
> 
> It's not so much accuracy, but mediaplayer, say, wants to seek to 5 seconds
> (50,000,000 units)... so what the demux does is say here's a codec, how do I
> turn its granule pos into time (again this requires all existing codecs to
> implement some kind of way to make this conversion, even if they are not
> naturally ogg based codecs)... then compare the desired time to the
> converted granule pos, rinse and repeat.

Any timestamps that move through the framework should be in the framework's 
units. That is part of the contract made between a filter and the framework.
To do otherwise is just asking for trouble. The demux should
be the one responsible for converting granulepos to framework timestamps. If
you need to convey extra timestamp data through the framework then you need
to "packetize" that info with the data that you are sending through the filter
graph. Filters that understand that packetization format will do the right
thing. You can think of it as container bitstream or whatever. Doing this could
also help you get around the fixed buffer size requirement and allow you to
handle chained files.
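
To make the "convert in the demux" step concrete, here is a small sketch for
the simple Vorbis case, where the granulepos is a PCM sample count and time is
granulepos / sample rate. The function names are made up; the only fixed fact
is that the framework unit is 100 ns:

```python
from fractions import Fraction

UNITS_PER_SECOND = 10_000_000  # one framework unit is 100 ns


def seconds_to_units(seconds: Fraction) -> int:
    """Round an exact time in seconds to the nearest 100 ns unit."""
    return round(seconds * UNITS_PER_SECOND)


def vorbis_granule_to_units(granulepos: int, sample_rate: int) -> int:
    """Vorbis granulepos counts PCM samples, so time = granulepos / rate."""
    return seconds_to_units(Fraction(granulepos, sample_rate))
```

For a 44100 Hz stream, granulepos 220500 comes out as 50,000,000 units, i.e.
5 seconds. Everything downstream of the demux then only ever sees framework
units and never needs to know what a granulepos is.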

> 
> It's not a matter of DS not allowing you to use another
> timescheme... you can to some extent... in fact the time stamps that emerge
> from the demux are currently granule pos's. Which means that any decoder
> that connects to them must understand granule pos (which is not really
> desirable). But once they leave the decoder to the renderer (or any other
> ogg-unaware filter) or before they are passed to any internal decoder (the
> actual decoder not the filter wrapper) that doesn't know about ogg they need
> to be UNITS.

That doesn't seem like a big problem. The demux and decoders both have enough
information to satisfy this requirement.

> 
> It's more a matter of, the user doesn't care about granule pos, the user
> knows seconds. At some point there has to be a conversion taking place, and
> currently it's the demux that has to make all those conversions.

I agree. The demux is the proper place to do these conversions in my opinion.
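
A sketch of what that looks like for a seek, under simplifying assumptions: a
single logical stream, an in-memory list of (byte offset, granulepos) pairs
instead of bisecting the file itself, and a per-codec conversion function
handed to the demux. All names here are hypothetical:

```python
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple


def find_seek_page(pages: Sequence[Tuple[int, int]],
                   to_units: Callable[[int], int],
                   target_units: int) -> int:
    """Return the byte offset of the last page at or before the target time.

    pages holds (byte_offset, granulepos) in file order. to_units converts
    a granulepos of this stream into 100 ns framework units.
    """
    if not pages:
        raise ValueError("no pages to search")
    # Convert once so the comparison happens entirely in framework units.
    times: List[int] = [to_units(gp) for _, gp in pages]
    index = bisect_right(times, target_units) - 1
    # Before the first page: start from the beginning of the stream.
    return pages[max(index, 0)][0]
```

Landing at or slightly before the target and discarding a few frames is
simpler than risking a start after it. The caller only ever passes in a time
in framework units; the granulepos never leaves the demux.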

> 
> Zen.
> 
> > --- >8 ----
> > List archives:  http://www.xiph.org/archives/
> > Ogg project homepage: http://www.xiph.org/ogg/
> > To unsubscribe from this list, send a message to
> > 'theora-dev-request at xiph.org'
> > containing only the word 'unsubscribe' in the body.  No subject is needed.
> > Unsubscribe messages sent to the list will be ignored/filtered.
> >
> >
> 
> 