[theora-dev] My issues with ogg and directshow...

Timothy B. Terriberry tterribe at vt.edu
Mon May 10 12:32:44 PDT 2004

> Which is how it currently works.
> http://svn.xiph.org/trunk/oggdsf/src/lib/core/directshow/dsfOggDemux/OggStreamFactory.cpp

Good, you had me worried, but I hadn't read the code.

> But it still makes every codec to some extent dependent on ogg.

Yes, some dependency is unavoidable. Every codec that wishes to be 
embedded in Ogg must define a mapping for how that is done. It must 
define what format its headers take, how things are split into packets, 
what the granule position means, etc. You can't have a codec that is 
unaware of Ogg embedded in it.

> But when you say a "codec api" this will be out of the realm of directshow
> interfaces. Assuming directshow's model of a decoder as a filter, how can a
> demux know what filter to ask if it can't determine a unique GUID to
> identify the media type? It could try to ask all 3000+ installed filters if
> they support a particular "ogg codecs" interface, then when they find one,
> ask them if they know this codec, it would work but this is not practical.
> This is why the media type GUID is important as it narrows the number of
> filters it has to try to a much smaller number.

Asking all the codecs if they support an Ogg interface may be 
unnecessarily burdensome, but there's no reason they can't put their 
GUID in a list in the registry somewhere, specifically for codecs that 
can be embedded in Ogg.

As for querying each one if it knows a particular codec, while this 
sounds inefficient on the face of it, a simple MRU scheme will cut out 
most of the overhead in practice. The number of codecs in general use is 
small, and the number embeddable in Ogg even smaller. And I give an even 
better suggestion below.
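
To make the MRU idea concrete, here is a minimal sketch of a move-to-front list. The names (mru_list, mru_touch, MRU_MAX) and the use of plain ints as codec handles are my own assumptions for illustration; nothing like this exists in oggdsf as far as I know. The demuxer would query codecs in list order and call mru_touch() on whichever one accepts the stream, so the codecs that actually get used drift to the front.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MRU_MAX 16  /* assumed cap; the number of Ogg codecs is small */

/* Hypothetical MRU list of codec handles, most recently used first. */
typedef struct {
    int ids[MRU_MAX];
    size_t count;
} mru_list;

/* Move `id` to the front, inserting it if absent. When the list is full,
   the least recently used entry (the last one) is dropped. */
static void mru_touch(mru_list *m, int id)
{
    size_t i;
    assert(m != NULL);
    /* Find the current position of id, or stop at the end of the list. */
    for (i = 0; i < m->count && m->ids[i] != id; i++) {
    }
    if (i == m->count) {
        if (m->count < MRU_MAX)
            m->count++;          /* grow by one slot */
        else
            i = MRU_MAX - 1;     /* overwrite the oldest entry */
    }
    /* Shift entries 0..i-1 down by one and put id in front. */
    memmove(&m->ids[1], &m->ids[0], i * sizeof m->ids[0]);
    m->ids[0] = id;
}
```

With a list like this, the common case costs one query to the codec that handled the last file.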

And finally, as Aaron said, that GUID DOESN'T need to refer to the same 
codec .dll that does the full decoding. A much smaller one that does 
only the minimal amount needed by the demuxer is possible, to reduce 
loading time.

> So the other alternative is to have "out of directshow" calls to dll's. In
> other words require codecs to provide not only directshow filter and pin
> interfaces, but also some other api external to directshow. Which again is
> possible and looks like the route i will end up going, but it puts another
> requirement on codec developers to implement another API which is not really
> part of directshow, which kind of breaks the whole idea of an automatically
> buildable filter graph in directshow.

This is precisely what I was suggesting. I don't see how this breaks 
automatically buildable filter graphs. You can automatically query 
codecs to see if they support this extended API (or even better, have 
them register somewhere like I suggested above).

> And it also makes integrating existing codecs which know nothing of ogg a
> problem as they are already implemented, they already know how to work the

I don't see how you're going to get around this problem. If a codec 
knows nothing about Ogg, then there is no Ogg-layer mapping defined for 
it, and it can't be embedded in Ogg.

> The solution that every other format i can find that is implemented on
> directshow has chosen, is to make a blind mapping of some identifier to a
> guid... be it a numeric identifier or a fourcc code. Basically what is done is

This doesn't really solve the problem either... someone still has to 
have defined what identifier is used for a particular codec in a 
particular container. The FOURCC route, while it looks simple on the 
surface, is inadequate. For example, consider MJPEG embedded in an AVI. 
There are currently two decoder filters available which use the same 
FOURCC, but exactly reverse the order of any interlaced fields (there 
are APP1 markers stored in the JPEGs to indicate which field is which, 
but these two codecs swap the interpretation of what those markers 
mean). How do you know which decoder should be used for a file? Looking 
at the FOURCC alone, you can't. I won't even go into all the subtle 
differences between the various MPEG4 codecs, which all have different 
bugs, different interpretations of the spec, which change from version 
to version of the same encoder, and all of which use each other's FOURCC 
codes all the time.

All current Ogg codecs DO contain a codec identifier, but the location 
and length of the identifier varies from codec to codec. It's a simple 
matter to extend the registry list described above with an identifier 
string, offset, and length to narrow down the codecs queried even more. 
Simply pass the initial packet to every codec which passes this simple 
memcmp() test. It's even future-proof against codecs which can't be 
identified in such a simple manner, since they can always register a 
length-0 identifier (thus forcing the demuxer to always query that codec). 
This scheme is efficient where efficiency is possible, and flexible 
where it is not.
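
As a rough sketch of that test (not code from any existing demuxer): the struct layout, the function names, and the "opaque" entry are hypothetical, and the table is hard-coded here where the real thing would be read from the registry. The offsets and strings for Vorbis, Theora, and Speex reflect my understanding of their initial headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: where a codec's identifier sits in the
   initial packet. length == 0 means "no cheap test, always ask me". */
typedef struct {
    const char *name;          /* label for illustration only */
    size_t offset;             /* byte offset of the identifier */
    size_t length;             /* identifier length in bytes */
    const unsigned char *id;   /* identifier bytes (unused if length == 0) */
} ogg_codec_entry;

/* Example table; a real demuxer would load this from the registry. */
static const ogg_codec_entry example_table[] = {
    { "vorbis", 1, 6, (const unsigned char *)"vorbis"   },
    { "theora", 1, 6, (const unsigned char *)"theora"   },
    { "speex",  0, 8, (const unsigned char *)"Speex   " },
    { "opaque", 0, 0, NULL },  /* made-up codec with no fixed identifier */
};

/* Returns 1 if this codec should be handed the initial packet. */
static int entry_matches(const ogg_codec_entry *e,
                         const unsigned char *packet, size_t packet_len)
{
    assert(e != NULL && packet != NULL);
    if (e->length == 0)
        return 1;  /* length-0 identifier: always query this codec */
    if (e->offset > packet_len || e->length > packet_len - e->offset)
        return 0;  /* packet too short to contain the identifier */
    return memcmp(packet + e->offset, e->id, e->length) == 0;
}

/* Writes the indices of all candidate entries to out; returns how many. */
static size_t find_candidates(const ogg_codec_entry *table, size_t n,
                              const unsigned char *packet, size_t packet_len,
                              size_t *out, size_t out_cap)
{
    size_t i, found = 0;
    for (i = 0; i < n && found < out_cap; i++) {
        if (entry_matches(&table[i], packet, packet_len))
            out[found++] = i;
    }
    return found;
}
```

Only the codecs returned by find_candidates() need to be loaded and handed the packet; for a typical file that is one codec plus any that registered a length-0 identifier.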

As for the fixed header formats... imposing those leads to unnecessary 
restrictions on the codecs embedded in the container. Look at all the 
problems AVIs have with B frames, their inability to store variable 
frame-rate video, or even variable bit-rate audio, etc.

"For every complex problem there is an answer that is clear, simple, and 
wrong."

> Maybe the terminology is not appropriate but the principle is the same...
> the rate at which some unit of data is presented. Be it frames, samples or
> other. It doesn't necessarily mean a strict multiplication. Maybe data rate
> or sample rate is more appropriate.

The problem there is that this may not even be constant: Theora happens 
to have a fixed frame rate, but that was a design decision made for 
simplicity. It is NOT something required of a video codec embedded in Ogg.

You could use the term "time base", i.e., the units time is measured in. 
Time is often measured in frames or samples for fixed-rate codecs, but 
even if a codec has a variable frame rate or sample rate, it is 
reasonable to expect that it measures durations in terms of some 
constant unit (milliseconds or 100s of nanoseconds, for example).

> It's not a matter of DS not allowing you to use another
> timescheme... you can to some extent... in fact the time stamps that emerge
> from the demux are currently granule pos's. Which means that any decoder
> that connects to them must understand granule pos (which is not really
> desirable) But once they leave the decoder to the renderer (or any other
> ogg-unaware filter) or before they are passed to any internal decoder (the
> actual decoder not the filter wrapper) that doesn't know about ogg they need
> to be UNITS.

I just meant it was unfortunate in the sense that, for example in a 
video editor, you often want to do something like, "seek to frame 
32145," and having to translate that into a time in units that don't 
evenly divide the frame rate (or time base with a variable frame rate) 
means you have to be careful.
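
To illustrate the kind of care I mean, here is a small sketch assuming 100 ns time units and an NTSC-style rate of 30000/1001 frames per second (my choice of example rate; the function names are made up). A frame lasts 333666.67 units, so most frame start times are not integers. If you round the start time down, converting back lands on the previous frame; rounding up avoids that as long as a frame lasts at least one unit.

```c
#include <assert.h>
#include <stdint.h>

#define UNITS_PER_SEC 10000000LL  /* 100 ns units per second */

/* Start time of `frame` in 100 ns units, rounded UP, for a frame rate of
   fps_num/fps_den. No overflow checking: fine for a sketch, not for
   production code with huge frame counts. */
static int64_t frame_to_reftime(int64_t frame, int64_t fps_num, int64_t fps_den)
{
    assert(frame >= 0 && fps_num > 0 && fps_den > 0);
    return (frame * fps_den * UNITS_PER_SEC + fps_num - 1) / fps_num;
}

/* Index of the frame being displayed at time t (rounds down). */
static int64_t reftime_to_frame(int64_t t, int64_t fps_num, int64_t fps_den)
{
    assert(t >= 0 && fps_num > 0 && fps_den > 0);
    return (t * fps_num) / (fps_den * UNITS_PER_SEC);
}
```

For frame 32146 the exact start time is 10726048666.67 units. frame_to_reftime() returns 10726048667, which maps back to frame 32146, whereas the truncated value 10726048666 maps back to frame 32145.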