[theora-dev] My issues with ogg and directshow...

Mon May 10 16:43:56 PDT 2004

----- Original Message -----
From: "Aaron Colwell" <acolwell at real.com>
To: <theora-dev at xiph.org>
Sent: Tuesday, May 11, 2004 1:37 AM
Subject: Re: [theora-dev] My issues with ogg and directshow...

> On Sun, May 09, 2004 at 01:58:17PM +0800, illiminable wrote:
> >
> > ----- Original Message -----
> > From: "Timothy B. Terriberry" <tterribe at xiph.org>
> > To: <theora-dev at xiph.org>
> > Sent: Sunday, May 09, 2004 4:58 AM
> > Subject: Re: [theora-dev] My issues with ogg and directshow...
> >
> >
> > > > Ok, so given that the graph has to be built before data is passed
> > > > downstream, there is a problem. How can the demuxer know what filters
> > > > to connect to (ie what the streams are)? The demux needs to read
> > > > ahead enough to find the BOS pages. Now we know how many streams there
> > > > are. How does it know what kind of streams they are? It has to be
> > > > able to recognise the capture patterns of every possible codec. So a
> > > > "codec oblivious" demux is already out of the question.
> > >
> > > This is an issue of where the separation line is drawn, not whether or
> > > not separation can exist. The Ogg abstraction has a richer interaction
> > > between codec and muxer than the DS framework mandates. But this doesn't
> > > prevent you from defining an "Ogg codec" interface as a richer instance
> > > of the general DS codec interface, adding such things as generic
> > > functions to answer questions like, "Given this initial packet, can you
> > > decode this stream?" or, "What are the DS media parameters corresponding
> > > to this complete set of header packets?" or, "What is the time
> > > associated with this granule position?" The muxer can still rely wholly
> > > on the codecs to answer these questions, it just needs a richer codec
> > > API than the DS framework in general has. New codecs can still be added
> > > without modifications to the demux so long as they implement this
> > > extended API.
> > >
> >
> > Which is how it currently works.
> >
> > http://svn.xiph.org/trunk/oggdsf/src/lib/core/directshow/dsfOggDemux/OggStreamFactory.cpp
> >
> > But it still makes every codec to some extent dependent on ogg.
>
> It doesn't have to be this way. That is the whole point of a framework. It
> decouples these dependencies.
>
> >
> > But when you say a "codec api" this will be out of the realm of
> > directshow interfaces. Assuming directshow's model of a decoder as a
> > filter, how can a demux know what filter to ask if it can't determine a
> > unique GUID to identify the media type? It could try to ask all 3000+
> > installed filters if they support a particular "ogg codecs" interface,
> > then when they find one, ask them if they know this codec. It would work
> > but this is not practical. This is why the media type GUID is important,
> > as it narrows the number of filters it has to try to a much smaller
> > number.
> >
> > So the other alternative is to have "out of directshow" calls to dll's.
> > In other words, require codecs to provide not only directshow filter and
> > pin interfaces, but also some other api external to directshow. Which
> > again is possible and looks like the route I will end up going, but it
> > puts another requirement on codec developers to implement another API
> > which is not really part of directshow, which kind of breaks the whole
> > idea of an automatically buildable filter graph in directshow.
> >
>
> I don't think either of these are a good solution. When people say
> "ask the codec" I don't really think that is what they mean. Yes certain
> pieces of information are only known by the codec code, but this
> could easily be encapsulated in a library that doesn't actually instantiate

Which is the second alternative, isn't it? Providing some external API (not
necessarily in the same dll) that provides all these extra functions. Though
yes it's true... it needn't be implemented by the original codec author...
it's more of an "importer" library.

> all of the stuff needed for decoding. It just knows enough to unpack the
> headers and generate a 4cc, guid, mimetype, or whatever. It also knows how
> to convert granulepos to time for each codec. You may also choose to have
> it provide common info like sample rate, channels, image size, frame rate,
> bitrate, TAC, and a few other pieces of common info. These values are
> commonly used by media frameworks to hook all the pieces together.
>

This is what I mean by alternative 2. This is how it currently works... I
have an abstraction called an oggstream, and the oggstreamfactory tests the
header identifiers and returns a class derived from oggstream that has
specific knowledge of the codec. Just at the moment all this is embedded
directly in the demux code.
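
To make the factory idea concrete, here is a rough sketch of the pattern. It
is not the actual oggdsf code: the class names and createStream() are made up
for illustration. Only the leading bytes tested for vorbis, theora and speex
are the ones those codecs put at the start of their first header packet.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical base class: one subclass per codec the demux recognises.
class OggStream {
public:
    virtual ~OggStream() = default;
    virtual std::string codecName() const = 0;
};

class VorbisStream : public OggStream {
public:
    std::string codecName() const override { return "vorbis"; }
};

class TheoraStream : public OggStream {
public:
    std::string codecName() const override { return "theora"; }
};

class SpeexStream : public OggStream {
public:
    std::string codecName() const override { return "speex"; }
};

// True if the packet begins with the given byte pattern.
static bool startsWith(const unsigned char* packet, std::size_t packetLen,
                       const char* pattern, std::size_t patternLen) {
    return packetLen >= patternLen &&
           std::memcmp(packet, pattern, patternLen) == 0;
}

// Inspect the first (BOS) packet of a stream and return a codec-specific
// stream object, or nullptr when none of the built-in patterns match.
std::unique_ptr<OggStream> createStream(const unsigned char* packet,
                                        std::size_t len) {
    if (startsWith(packet, len, "\x01vorbis", 7))
        return std::make_unique<VorbisStream>();
    if (startsWith(packet, len, "\x80theora", 7))
        return std::make_unique<TheoraStream>();
    if (startsWith(packet, len, "Speex   ", 8))
        return std::make_unique<SpeexStream>();
    return nullptr;  // unknown codec: this is where a helper dll would be asked
}
```

Every new codec means another branch and another derived class, which is the
coupling being discussed here.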

What I was suggesting is that when registering new codecs, as well as
registering a filter with directshow, there will need to be, say, a registry
key with a list of codec identifier libraries, so when the demux can't find a
match in its hard coded list of codecs, it tries one by one all the listed
dll's until one recognises it or all have been tested.

The big problem with doing it robustly is the variable length codec
identifier... Because it makes the outcome dependent on the order in which
the libraries are called in the case where one codec identifier is prefixed
by another.

What I don't see is why it would be any great disadvantage to, for example,
say that codec identifiers must be null terminated and that codec
identifiers can't contain the null character. This eliminates the
namespacing problem and guarantees that all unique identifiers can be
determined to be unique, rather than the current system "that should work
most of the time".
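
A small sketch of the ordering problem, under made-up assumptions: the
identifiers "foo" and "foobar" and both function names are hypothetical and
not taken from any real codec or from oggdsf. The first function mimics
"ask each library in turn whether the header starts with its identifier";
the second reads an identifier under the proposed null terminated rule.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Prefix matching: returns the first identifier in `known` that the header
// starts with, or "" if none does. The answer depends on the list order
// whenever one identifier is a prefix of another.
std::string matchByPrefix(const std::vector<std::string>& known,
                          const unsigned char* header, std::size_t len) {
    for (const std::string& id : known) {
        if (len >= id.size() &&
            std::memcmp(header, id.data(), id.size()) == 0) {
            return id;
        }
    }
    return "";
}

// Proposed rule: the identifier is everything up to the first null byte.
// Returns "" if the header contains no terminator at all.
std::string readTerminatedId(const unsigned char* header, std::size_t len) {
    const void* end = std::memchr(header, 0, len);
    if (end == nullptr) {
        return "";
    }
    const unsigned char* terminator = static_cast<const unsigned char*>(end);
    return std::string(reinterpret_cast<const char*>(header),
                       static_cast<std::size_t>(terminator - header));
}
```

With a header beginning "foobar\0", matchByPrefix gives "foo" when "foo" is
listed first and "foobar" when "foobar" is listed first, whereas
readTerminatedId gives "foobar" regardless of which libraries are installed.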

>
> > And it also makes integrating existing codecs which know nothing of ogg
> > a problem, as they are already implemented, they already know how to
> > work the directshow way. Let's say, for example, someone has written a
> > directshow filter for dirac
> > (http://www.bbc.co.uk/rd/projects/dirac/overview.shtml), and they have
> > implemented the requirements directshow needs. This filter should be
> > able to work with any encapsulation format. However, in order for it to
> > work in ogg it will need to provide some other dll interface or the ogg
> > demux won't be able to recognise it. Or even if the filters can be
> > narrowed by guid, it will still need to offer some "ogg codecs"
> > interface that the developer of that codec may not have implemented, nor
> > want to implement, nor really should need to implement.
>
> No it doesn't. The demux just has to present the dirac data the same way
> that other demux filters do. There is a contract between the demux and
> codec that all demux objects must honor.
>

That's true, but in order to be able to present the data that way, there
still needs to be some kind of helper library.

> >
> > Certainly it will work but it defeats the whole point of using a
> > structured media framework if you force every codec (which for the most
> > part doesn't care about its encapsulation format) to provide information
> > to a particular encapsulation format. If all ogg is interested in is
> > encapsulating its codecs (theora, vorbis, speex and flac)... then there
> > is no issue, you can make these codecs as dependent on ogg as desired,
> > but as soon as you try and integrate other codecs, you can't expect
> > every codec to include a special interface so it can be ogg encapsulated.
>
> I don't believe this is true. If you want to carry other codecs in an ogg
> file the demux MUST present the information about the codec in the same
> way other demux objects would. That is the whole point. When you add a
> codec to a framework you setup a particular set of parameters and
> bitstream format. Once that is determined then all a demux has to do is
> provide the data in that same format. You don't need special interfaces on
> all the codecs.
>
> >
> > The solution that every other format I can find that is implemented on
> > directshow has chosen is to make a blind mapping of some identifier to a
> > guid... be it a numeric identifier or a fourcc code. Basically what is
> > done is (GUIDs are 128 bits) a new guid is created... in fourcc's case
> > the first 32 bits become a mask... so the last 96 bits identify a fourcc
> > guid, and any demuxer, without knowing anything about the codec, can
> > create a unique fully formed guid by masking the 32 bits of the fourcc
> > code into the first 32 bits. And furthermore they have fixed headers
> > providing the key information for setting up streams, ie some kind of
> > data rate and frame size information.
>
> I don't see why the ogg demux can't do the same thing. Just create 4cc
> codes for the xiph codecs.
>

Well if you mean why can't the ogg demux create some unique identifier of
some sort... it does, it assigns a media type and format type guid to each
codec... the difference is this is hard coded totally arbitrarily. Whereas
fourcc information is found in the header and is used in a very specific way
to create a guid. I mentioned previously the reasons why an identifier of
unknown length can't be used for this.
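
For reference, the "very specific way" looks roughly like the sketch below.
The Guid struct is a portable stand-in for the Windows GUID type and the
function names are made up for illustration. The fixed tail
(0000-0010-8000-00AA00389B71) is, as far as I know, the standard directshow
base for fourcc media subtypes.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Portable stand-in for the Windows GUID struct (same field layout).
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t data4[8];
};

// Pack four characters little-endian, first character in the low byte.
constexpr std::uint32_t makeFourCC(char a, char b, char c, char d) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(a)) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Build a fully formed guid: the fourcc fills the first 32 bits and the
// remaining 96 bits are the fixed pattern that marks it as a fourcc guid.
Guid guidFromFourCC(std::uint32_t fourcc) {
    return Guid{fourcc, 0x0000, 0x0010,
                {0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71}};
}

// Format in the usual registry style, without braces.
std::string toString(const Guid& g) {
    char buf[37];
    std::snprintf(buf, sizeof buf,
                  "%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
                  static_cast<unsigned>(g.data1),
                  static_cast<unsigned>(g.data2),
                  static_cast<unsigned>(g.data3),
                  static_cast<unsigned>(g.data4[0]),
                  static_cast<unsigned>(g.data4[1]),
                  static_cast<unsigned>(g.data4[2]),
                  static_cast<unsigned>(g.data4[3]),
                  static_cast<unsigned>(g.data4[4]),
                  static_cast<unsigned>(g.data4[5]),
                  static_cast<unsigned>(g.data4[6]),
                  static_cast<unsigned>(g.data4[7]));
    return buf;
}
```

The demux needs no knowledge of the codec to do this, only four bytes at a
known place in the header. For the fourcc 'XVID' the result is
44495658-0000-0010-8000-00AA00389B71.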

Just as an example, there is specific code for each codec abstracted by an
oggstream, which knows how to get key information to make a filter
connection out of a header packet. There is one of these for each codec.

ie one each for speex, vorbis, theora and flac. All roughly the same amount
of code, doing basically exactly the same thing but in a slightly different
way.

In contrast, using the fixed header of ogm video, with less code than any of
the other streams, I can embed any video format recognised by ffdshow... I
think like 15 or more different codecs. Plus... it will still recognise any
new codec ffdshow can recognise by a fourcc code with no modification from
me, no need for special helper libraries. If ffdshow added another 50 codecs
tomorrow I wouldn't have to change a thing. If 50 more codecs with the
header style of vorbis, speex etc were added to ogg, I'd be coding header
parsing routines well into next month.

And just to clarify, this header does not replace the native header of the
codec, it just prefixes it... only the demux sees it to set up its graph,
the codec never sees it, the first thing the codec sees is its native
header. The difference being that the header parsing of the key information
was done initially at mux time, not required to be done dynamically at every
load time.

To me this makes sense, muxing is when you need to know how to map a codec
into the encapsulation format, all the information is at hand, and is
required to be at hand in order to do the mux. If we accept that the mux is
the more complex of the two steps, and also the fact that the mux is only
done once while the demux happens every time you play the file, it makes
sense for the mux to carry as much of the burden as possible and the demux
the least. This method allows the demux to stay less complex and have no
need to be able to understand the codec's native header format.

> >
> > Which is the ideal solution that ogg should use in directshow... the
> > problem is all the other formats have fixed length identifiers and a
> > partially fixed header. So they know, for example, that somewhere in the
> > header in a fixed place is exactly 32 (or however many) bits of
> > identifier that they can use blindly to create a fully formed guid.
> >
>
> Ogg is not the only container format that needs to know limited info about
> the codecs contained in it. .MP4/.3GP files have similar needs. There is a
> codec specific data chunk in the files that needs to be parsed to provide
> info to media frameworks. The demux needs to translate these into
> something that the framework understands. Yes this is a little more
> complicated than just looking at byte 42 and reading a 4cc, but it allows
> better control over what parameters are specified and allows easy
> extension for the future.
>
> > Which leaves a problem, if the codec identifier in ogg is not fixed
> > length or even bounded, how can you be sure that you can create a unique
> > fully formed guid from it? You could just say... ok we'll only look at
> > the first x bytes... but what happens to a codec whose identifier is
> > less than x bytes? Variable parts of the header will be in those x
> > bytes, making creating a unique id impossible. Similarly, what if two
> > identifiers share a common first x bytes?
>
> It is the demux's responsibility to map 4cc codes or whatever the
> framework expects to headers in the file. What the actual header length is
> in the file doesn't matter as long as the demux knows how to do the proper
> conversions. The demux doesn't just demultiplex a format. It may also
> translate the data into a form that the framework or components in the
> framework expect.
>
> >
> > > And as an aside, please don't use the phrase "granule rate"... it
> > > implies, incorrectly, that the granule position->time mapping can be
> > > accomplished by multiplying by a simple scale factor, and this is NOT
> > > true in general. In particular, it is not true for Theora.
> > >
> >
> > Maybe the terminology is not appropriate but the principle is the
> > same... the rate at which some unit of data is presented. Be it frames,
> > samples or other. It doesn't necessarily mean a strict multiplication.
> > Maybe data rate or sample rate is more appropriate.
>
> Data rate or sample rate implies (x / rate = time) or
> (x / rate - offset = time) form. This is not necessarily how the
> granulepos is converted to time.
>
> >
> > And for theora you could just consider it to be in a funny number base,
> > where the least significant "digit" is unitary and the most significant
> > "digit" represents a factor of 2^granuleshift.
>
> That is not how the Theora timestamps work. It is something more like
>
> ((granulepos >> granuleshift) +
>  (((1 << granuleshift) - 1) & granulepos)) / fps = time
>

Yeah... I couldn't remember the exact code.
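
Written out as code, the formula quoted above comes to something like the
sketch below, taken straight through to directshow's 100 ns UNITS. The
function names are made up, the frame rate is assumed to be given as a
numerator/denominator pair, and granulepos is assumed to be non-negative. It
follows the formula as quoted in this thread, not any particular library.

```cpp
#include <cstdint>

// directshow reference time: 10,000,000 units per second (100 ns each).
constexpr std::int64_t kUnitsPerSecond = 10000000;

// Frame count from a theora granulepos: the high bits hold the frame
// number of the last keyframe, the low `granuleshift` bits hold the number
// of frames since that keyframe.
std::int64_t theoraFrameCount(std::int64_t granulepos, int granuleshift) {
    const std::int64_t keyframe = granulepos >> granuleshift;
    const std::int64_t mask = (std::int64_t{1} << granuleshift) - 1;
    const std::int64_t sinceKeyframe = granulepos & mask;
    return keyframe + sinceKeyframe;
}

// Time in 100 ns units: frames / fps, with fps = fpsNum / fpsDen.
std::int64_t theoraGranuleToUnits(std::int64_t granulepos, int granuleshift,
                                  std::uint32_t fpsNum, std::uint32_t fpsDen) {
    return theoraFrameCount(granulepos, granuleshift) * kUnitsPerSecond *
           fpsDen / fpsNum;
}
```

For example, with granuleshift 6, a keyframe at frame 100 and 25 frames since
it, granulepos is (100 << 6) | 25 = 6425. At 25 fps that is frame 125, ie 5
seconds or 50,000,000 units, whereas dividing 6425 by the frame rate directly
would give 257 seconds.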

> I don't know about you, but when I hear data rate or sample rate I don't
> think of a transformation like this to get time.
>

It depends how you think about data rate... if you try to correlate it to
granule pos, perhaps it doesn't make sense. But if you think of data rate
more abstractly as the rate at which some unit of data is presented with the
granule pos just providing a means to identify which unit it is, then it
does.

> >
> > > > Directshow works in UNITS of 1/10,000,000 of a second, it knows
> > > > nothing of granule pos. When something like media player requests a
> > > > seek or a position request it wants these units. So the seek request
> > > > comes into the graph. It
> > >
> > > Generally one seeks to a time, not a granule position. The granule
> > > position->time mapping is unique, but the reverse does not have to be.
> > > So when dealing with multiple codecs, you convert everything to a time
> > > in order to be able to compare values among them. It's unfortunate that
> > > DS does not let one work in the native time base of the streams, but
> > > units of 100 nanoseconds should be accurate enough for most purposes.
> > >
> >
> > It's not so much accuracy, but mediaplayer, say, wants to seek to 5
> > seconds (50,000,000 units)... so what the demux does is say here's a
> > codec, how do I turn its granule pos into time (again this requires all
> > existing codecs to implement some kind of way to make this conversion,
> > even if they are not naturally ogg based codecs)... then compare the
> > desired time to the converted granule pos, rinse and repeat.
>
> Any timestamps that move through the framework should be in the
> framework's units. That is part of the contract made between a filter and
> the framework. To do otherwise is just asking for trouble. The demux
> should be the one responsible for converting granulepos to framework
> timestamps. If you need to convey extra timestamp data through the
> framework then you need to "packetize" that info with the data that you
> are sending through the filter graph. Filters that understand that
> packetization format will do the right

But filters that don't will be oblivious to it, will still believe they can
decode it, and will still accept a connection on which the data, when it
comes to them, will appear garbled. The whole idea of filters is that they
can be unplugged and plugged into others. If you start to put extra
information in the codec samples then you basically limit the filters that
can connect to only those that are ogg-aware, which is an unnecessary
restriction.

> thing. You can think of it as container bitstream or whatever. Doing this
> could also help you get around the fixed buffer size requirement and allow
> you to handle chained files.
>

Chained files can already be handled by a dynamic disconnection and
reconnection of the graph... it's a bit of a pain to do but it can be done.
Though it won't make any difference to the fixed buffers. These are set up
before any filter other than the demux has seen data, and information in the
packets comes too late to do this. If a buffer is too small and a filter
tries to put data in it, the graph will terminate. Or even in the best case,
flush all data, disconnect all filters, reconnect all filters with bigger
buffers and carry on, almost guaranteeing that the data that was in the
graph is lost.

> >
> > It's not a matter of DS not allowing you to use another
> > timescheme... you can to some extent... in fact the time stamps that
> > emerge from the demux are currently granule pos's. Which means that any
> > decoder that connects to them must understand granule pos (which is not
> > really desirable). But once they leave the decoder to the renderer (or
> > any other ogg-unaware filter), or before they are passed to any internal
> > decoder (the actual decoder, not the filter wrapper) that doesn't know
> > about ogg, they need to be UNITS.
>
> That doesn't seem like a big problem. The demux and decoders both have
> enough information to satisfy this requirement.
>
> >
> > It's more a matter of, the user doesn't care about granule pos, the user
> > knows seconds. At some point there has to be a conversion taking place,
> > and currently it's the demux that has to make all those conversions.
>
> I agree. The demux is the proper place to do these conversions in my
> opinion.
>
> >
> > Zen.
> >
> > > --- >8 ----
> > > List archives:  http://www.xiph.org/archives/
> > > Ogg project homepage: http://www.xiph.org/ogg/
> > > To unsubscribe from this list, send a message to
> > > 'theora-dev-request at xiph.org' containing only the word 'unsubscribe'
> > > in the body.  No subject is needed.
> > > Unsubscribe messages sent to the list will be ignored/filtered.
> > >
> > >
> >
> >
