[theora-dev] My issues with ogg and directshow...

illiminable ogg at illiminable.com
Sat May 8 22:58:17 PDT 2004



----- Original Message -----
From: "Timothy B. Terriberry" <tterribe at xiph.org>
To: <theora-dev at xiph.org>
Sent: Sunday, May 09, 2004 4:58 AM
Subject: Re: [theora-dev] My issues with ogg and directshow...

> > Ok, so given that the graph has to be built before data is passed
> > downstream, there is a problem. How can the demuxer know what filters to
> > connect to (ie what the streams are) ? The demux needs to read ahead
> > enough to find the BOS pages. Now we know how many streams there are.
> > How does it know what kind of streams they are ? It has to be able to
> > recognise the capture patterns of every possible codec. So a "codec
> > oblivious" demux is already out of the question.
>
> This is an issue of where the separation line is drawn, not whether or not
> separation can exist. The Ogg abstraction has a richer interaction between
> codec and muxer than the DS framework mandates. But this doesn't prevent
> you from defining an "Ogg codec" interface as a richer instance of the
> general DS codec interface, adding such things as generic functions to
> answer questions like, "Given this initial packet, can you decode this
> stream?" or, "What are the DS media parameters corresponding to this
> complete set of header packets?" or, "What is the time associated with
> this granule position?" The muxer can still rely wholly on the codecs to
> answer these questions, it just needs a richer codec API than the DS
> framework in general has. New codecs can still be added without
> modifications to the demux so long as they implement this extended API.
>

Which is how it currently works.
http://svn.xiph.org/trunk/oggdsf/src/lib/core/directshow/dsfOggDemux/OggStreamFactory.cpp

But it still makes every codec to some extent dependent on ogg.
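
As a rough sketch of the kind of extended codec interface being discussed
(every name here is hypothetical, it is not the DirectShow API and not the
oggdsf code), the demux only ever asks registered handlers whether they
recognise a first packet:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical extended codec interface; these names are illustrative only.
struct OggCodecHandler {
    virtual ~OggCodecHandler() = default;
    // "Given this initial packet, can you decode this stream?"
    virtual bool canDecode(const std::vector<uint8_t>& firstPacket) const = 0;
    // Used here only to show which handler claimed the stream.
    virtual std::string name() const = 0;
};

// Example handler that claims any packet starting with a fixed signature.
class SignatureHandler : public OggCodecHandler {
public:
    SignatureHandler(std::string name, std::vector<uint8_t> signature)
        : name_(std::move(name)), signature_(std::move(signature)) {}

    bool canDecode(const std::vector<uint8_t>& p) const override {
        return p.size() >= signature_.size() &&
               std::equal(signature_.begin(), signature_.end(), p.begin());
    }
    std::string name() const override { return name_; }

private:
    std::string name_;
    std::vector<uint8_t> signature_;
};

// The demux asks each registered handler in turn and never inspects the
// packet contents itself.
const OggCodecHandler* findHandler(
    const std::vector<std::unique_ptr<OggCodecHandler>>& handlers,
    const std::vector<uint8_t>& firstPacket) {
    for (const auto& h : handlers) {
        if (h->canDecode(firstPacket)) return h.get();
    }
    return nullptr;  // no handler recognises this stream
}
```

The demux stays codec oblivious, but every codec has to ship a handler like
this, which is exactly the ogg dependency mentioned above.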

But a "codec API" like this is outside the realm of DirectShow interfaces.
Assuming DirectShow's model of a decoder as a filter, how can a demux know
which filter to ask if it can't determine a unique GUID to identify the
media type? It could ask all 3000+ installed filters whether they support a
particular "ogg codecs" interface and then, when it finds one, ask whether
it knows this codec. That would work, but it is not practical. This is why
the media type GUID is important: it narrows the filters the demux has to
try to a much smaller number.

So the other alternative is to have "out of DirectShow" calls to DLLs. In
other words, require codecs to provide not only DirectShow filter and pin
interfaces, but also some other API external to DirectShow. That again is
possible and looks like the route I will end up taking, but it puts another
requirement on codec developers: implementing an API which is not really
part of DirectShow. That kind of breaks the whole idea of an automatically
buildable filter graph in DirectShow.

It also makes integrating existing codecs which know nothing of ogg a
problem, as they are already implemented and already know how to work the
DirectShow way. Let's say someone has written a DirectShow filter for dirac
(http://www.bbc.co.uk/rd/projects/dirac/overview.shtml) and has implemented
everything DirectShow requires. This filter should be able to work with any
encapsulation format. However, in order for it to work in ogg it will need
to provide some other DLL interface, or the ogg demux won't be able to
recognise it. Even if the filters can be narrowed by GUID, it will still
need to offer some "ogg codecs" interface that the developer of that codec
may not have implemented, may not want to implement, and really shouldn't
need to implement.

Certainly it will work, but it defeats the whole point of using a structured
media framework if you force every codec (which for the most part doesn't
care about its encapsulation format) to provide information to a particular
encapsulation format. If all ogg is interested in is encapsulating its own
codecs (theora, vorbis, speex and flac), then there is no issue: you can
make these codecs as dependent on ogg as desired. But as soon as you try to
integrate other codecs, you can't expect every codec to include a special
interface so it can be ogg encapsulated.

The solution chosen by every other format I can find that is implemented on
DirectShow is a blind mapping of some identifier to a GUID, be it a numeric
identifier or a fourcc code. Basically (GUIDs are 128 bits) a template GUID
is created. In fourcc's case the first 32 bits are left open and the last 96
bits identify it as a fourcc GUID, so any demuxer, without knowing anything
about the codec, can create a unique, fully formed GUID by placing the 32
bits of the fourcc code into the first 32 bits. Furthermore, these formats
have fixed headers providing the key information for setting up streams, ie
some kind of data rate and frame size information.
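
To make the fourcc mapping concrete, here is a small portable sketch. The
Guid struct and the function names are stand-ins written for this example
(on Windows you would use the real GUID type); the fixed tail
xxxxxxxx-0000-0010-8000-00AA00389B71 is the one DirectShow uses for fourcc
media subtypes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Portable stand-in for the Windows GUID struct (same field layout).
struct Guid {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
};

// Pack four characters the way the MAKEFOURCC macro does: the first
// character lands in the least significant byte.
constexpr uint32_t makeFourcc(char a, char b, char c, char d) {
    return  static_cast<uint32_t>(static_cast<uint8_t>(a))        |
           (static_cast<uint32_t>(static_cast<uint8_t>(b)) << 8)  |
           (static_cast<uint32_t>(static_cast<uint8_t>(c)) << 16) |
           (static_cast<uint32_t>(static_cast<uint8_t>(d)) << 24);
}

// The last 96 bits are fixed; only the first 32 bits come from the stream.
Guid guidFromFourcc(uint32_t fourcc) {
    return Guid{fourcc, 0x0000, 0x0010,
                {0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71}};
}

// Format as the usual 8-4-4-4-12 hex string (36 characters plus NUL).
std::string toString(const Guid& g) {
    char buf[37];
    std::snprintf(buf, sizeof buf,
                  "%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
                  static_cast<unsigned>(g.data1),
                  static_cast<unsigned>(g.data2),
                  static_cast<unsigned>(g.data3),
                  g.data4[0], g.data4[1], g.data4[2], g.data4[3],
                  g.data4[4], g.data4[5], g.data4[6], g.data4[7]);
    return buf;
}
```

For example, toString(guidFromFourcc(makeFourcc('Y','U','Y','2'))) gives
32595559-0000-0010-8000-00AA00389B71. The demuxer needs no knowledge of the
codec to build it, only the 4 bytes found at a fixed place in the header.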

That is the ideal solution for ogg to use in DirectShow. The problem is that
all the other formats have fixed length identifiers and partially fixed
headers. So they know, for example, that at a fixed place in the header
there are exactly 32 (or however many) bits of identifier that they can use
blindly to create a fully formed GUID.

Which leaves a problem: if the codec identifier in ogg is not fixed length
or even bounded, how can you be sure that you can create a unique, fully
formed GUID from it? You could just say "ok, we'll only look at the first x
bytes", but what happens to a codec whose identifier is shorter than x
bytes? Variable parts of the header will fall within those x bytes, making
it impossible to create a unique id. Similarly, what if two identifiers
share a common first x bytes?
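
Both failure cases are easy to show with a toy example. The packet contents
in the note below are made up for illustration and are not real codec
headers; prefixKey is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive scheme: take the first x bytes of a stream's first packet as the
// codec identifier, without knowing where the real identifier ends.
std::string prefixKey(const std::string& firstPacket, std::size_t x) {
    return firstPacket.substr(0, x);
}

// Two streams are treated as the same codec when their keys match.
bool sameCodecByPrefix(const std::string& a, const std::string& b,
                       std::size_t x) {
    return prefixKey(a, x) == prefixKey(b, x);
}
```

With x = 8, two streams of an imaginary codec "abc" whose headers start
"abc|2ch|44100" and "abc|1ch|8000" get different keys, because header fields
leak into the key. Two different imaginary codecs, "longcodecA" and
"longcodecB", get the same key "longcode".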

> And as an aside, please don't use the phrase "granule rate"... it implies,
> incorrectly, that the granule position->time mapping can be accomplished
> by multiplying by a simple scale factor, and this is NOT true in general.
> In particular, it is not true for Theora.
>

Maybe the terminology is not appropriate but the principle is the same...
the rate at which some unit of data is presented. Be it frames, samples or
other. It doesn't necessarily mean a strict multiplication. Maybe data rate
or sample rate is more appropriate.

And for theora you could just consider it to be in a funny number base,
where the least significant "digit" is unitary and the most significant
"digit" represents a factor of 2^granuleshift.

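As a sketch of that "funny number base" view: the granule pos splits into a
high field (frame number of the last keyframe) and a low field (frames since
that keyframe). The function names are made up for this example, and the
exact off-by-one convention for the frame count is an assumption that
depends on the bitstream version.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kUnitsPerSecond = 10000000;  // DirectShow UNITS: 100 ns

// The two "digits" of a Theora-style granule pos.
struct GranuleFields {
    int64_t keyframe;  // high bits: frame number of the last keyframe
    int64_t offset;    // low bits: frames since that keyframe
};

GranuleFields splitGranulePos(int64_t granulePos, int granuleShift) {
    const int64_t lowMask = (int64_t{1} << granuleShift) - 1;
    return {granulePos >> granuleShift, granulePos & lowMask};
}

// Convert to UNITS. The absolute frame number is the SUM of the two fields,
// which is why multiplying the raw granule pos by a scale factor is wrong.
int64_t theoraGranuleToUnits(int64_t granulePos, int granuleShift,
                             uint32_t fpsNumerator, uint32_t fpsDenominator) {
    const GranuleFields f = splitGranulePos(granulePos, granuleShift);
    const int64_t frame = f.keyframe + f.offset;
    return frame * kUnitsPerSecond * fpsDenominator / fpsNumerator;
}
```

With a shift of 6, keyframe 100 and offset 5 give the raw value 6405 but
frame 105, which at 25 fps is 4.2 seconds (42,000,000 UNITS).
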
> > Directshow works in UNITS of 1/10,000,000 of a second, it knows nothing
> > of granule pos. When something like media player requests a seek or a
> > position request it wants these units. So the seek request comes into
> > the graph. It
>
> Generally one seeks to a time, not a granule position. The granule
> position->time mapping is unique, but the reverse does not have to be. So
> when dealing with multiple codecs, you convert everything to a time in
> order to be able to compare values among them. It's unfortunate that DS
> does not let one work in the native time base of the streams, but units of
> 100 nanoseconds should be accurate enough for most purposes.
>

It's not so much accuracy. Say mediaplayer wants to seek to 5 seconds
(50,000,000 units). What the demux does is say: here's a codec, how do I
turn its granule pos into time? (Again this requires all existing codecs to
implement some way to make this conversion, even if they are not naturally
ogg based codecs.) Then it compares the desired time to the converted
granule pos, rinse and repeat.
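
The "rinse and repeat" loop is essentially a bisection. Below is a
simplified sketch over an in-memory list of page granule positions, with
the codec supplying the conversion as a callback. A real demux would bisect
over byte offsets in the file instead, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Codec-supplied conversion from granule pos to DirectShow UNITS (100 ns).
using GranuleToUnits = std::function<int64_t(int64_t)>;

// Returns the index of the first page whose converted time is at or after
// targetUnits, or pageGranulePos.size() if the target is past the end.
// Assumes the granule positions are in increasing order.
std::size_t findSeekPage(const std::vector<int64_t>& pageGranulePos,
                         const GranuleToUnits& toUnits,
                         int64_t targetUnits) {
    std::size_t lo = 0;
    std::size_t hi = pageGranulePos.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (toUnits(pageGranulePos[mid]) < targetUnits) {
            lo = mid + 1;  // page ends too early, look later
        } else {
            hi = mid;      // page is late enough, look earlier
        }
    }
    return lo;
}
```

The demux never interprets the granule pos itself; it only compares the
converted values, so the conversion callback is the one piece every codec
has to supply.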

It's not a matter of DS not allowing you to use another time scheme. You can
to some extent; in fact the time stamps that emerge from the demux are
currently granule pos's. That means any decoder that connects to it must
understand granule pos (which is not really desirable). But once they leave
the decoder for the renderer (or any other ogg-unaware filter), or before
they are passed to any internal decoder (the actual decoder, not the filter
wrapper) that doesn't know about ogg, they need to be UNITS.

It's more a matter of this: the user doesn't care about granule pos, the
user knows seconds. At some point a conversion has to take place, and
currently it's the demux that has to make all those conversions.

Zen.

> --- >8 ----
> List archives:  http://www.xiph.org/archives/
> Ogg project homepage: http://www.xiph.org/ogg/
> To unsubscribe from this list, send a message to
'theora-dev-request at xiph.org'
> containing only the word 'unsubscribe' in the body.  No subject is needed.
> Unsubscribe messages sent to the list will be ignored/filtered.
>
>



