[theora-dev] My issues with ogg and directshow...

Ralph Giles giles at xiph.org
Mon May 10 09:21:58 PDT 2004



On Sun, May 09, 2004 at 03:14:37AM +0800, illiminable wrote:

> Listening to the meeting on granule pos tonight/today it became clear that
> the issues everyone is concerned with for the most part don't affect my
> implementations and the issues i have pretty much don't affect anyone
> else... and in the cases where they overlap, the reasoning seems to be
> different. And since everyone else has had a lot more time to consider all
> these issues and i'm pretty new to this, it's a lot harder for me to make a
> cogent argument on the fly. So i figure i'd spell out all the things i've
> come across in my implementation, just to put them out there.

Thanks for putting this together, Zen. It's really nice to have a solid
introduction to the issues from someone experienced with the framework.

> Allocator pools exist between the connection of any two pins. An allocator
> pool is a fixed number of fixed size samples.

I can see how this works for fixed-bitrate codecs (and most uncompressed media,
of course). Does one just use 'really big buffers' for VBR data?

> Directshow requires start and end times for all samples.

And you've succeeded in calculating this for all our codecs?

> Ok, so given that the graph has to be built before data is passed
> downstream, there is a problem. How can the demuxer know what filters to
> connect to (ie what the streams are) ? The demux needs to read ahead enough
> to find the BOS pages. Now we know how many streams there are. How does it
> know what kind of streams they are ? It has to be able to recognise the
> capture patterns of every possible codec. So a "codec oblivious" demux is
> already out of the question.
> 
> Let's look further downstream for the moment... we'll assume we have a vorbis
> only stream. Now the directsound audio renderer won't connect to any decoder
> unless it tells it the audio parameters, number of channels, sample rate etc
> etc. Now if no data can flow in the graph yet, how can the decoder have seen
> the header pages to know this ? It can't. This information is considered
> part of the setup data. Hence the media parameters have to come from the
> demux when it connected to the decoder, ie the media type the demux offers
> is (Audio/Vorbis 2 channel 44100) for example.
> 
> So the demux has to be able to parse the BOS page headers to offer a useful
> media type. So now the demux has to be able to not only identify the streams
> but also know how to get at least the key information out of them. ie The
> demux has to know how to parse the header of every possible codec header
> format it will offer.
> 
> Now, why isn't this an issue with every other codec i assume you are
> thinking ?

To clarify here, it's my understanding that format parameter lookup is a
feature of the AVI and ogm container formats (and asf, presumably) not of
any of the specific codecs. Is this correct?

That's why lookup of this information is always possible there, and not 
for ogg, even if we provide a convenience library that can do the header
parse for all the codec embeddings it knows about, as I think derf was
suggesting.
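
As a rough illustration of what such a header parse involves for one codec,
here is a minimal sketch in Python that pulls the channel count and sample rate
out of a Vorbis identification packet. The function name and the returned dict
are assumptions made for this example, not part of any existing library; the
field layout follows the Vorbis I specification.

```python
import struct

def parse_vorbis_id_header(packet: bytes) -> dict:
    """Extract the parameters a demux would need to offer a media type."""
    # Layout: packet type (1 byte, value 1), the ASCII string "vorbis",
    # then little-endian version (u32), channels (u8), sample rate (u32),
    # max/nominal/min bitrate (i32 each), a blocksize byte, a framing byte.
    if len(packet) < 30 or packet[:7] != b"\x01vorbis":
        raise ValueError("not a Vorbis identification header")
    version, channels, rate, _br_max, br_nominal, _br_min = struct.unpack_from(
        "<IBIiii", packet, 7
    )
    if version != 0 or channels == 0 or rate == 0:
        raise ValueError("invalid Vorbis identification header")
    return {"channels": channels, "sample_rate": rate, "nominal_bitrate": br_nominal}
```

A demux would need one such routine per codec embedding it wants to offer a
media type for, which is the cost being discussed above.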

Practically speaking, I think this can be dealt with. After all, being able
to identify a codec by FOURCC doesn't help if you can't find an implementing
dll. From the point of view of DirectShow, it's just a limitation of this 
particular container format.

Not knowing anything about them, I'd guess that QuickTime can optionally
provide a table with this information, and that MPEG program streams, like
ogg, don't provide much beyond the packet types. How does DirectShow handle
those containers?

> The related issue is that of identifying streams... the codec identifier has
> no bounds, there is no way to say this is the end of the identifier, and
> this is the rest of the header. In other words \001vorbis is pretty much
> indistinguishable from \001vorbis2. How can you tell if the 2 is part of the
> identifier or the rest of the header ?

Yes. It's well defined in specific codec specs, but more flexible in general.
Just looking file-magic style at some of the initial bytes should always
work.
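
A minimal sketch of that file-magic approach, in Python. The table and the
function name are assumptions for illustration and the list is not exhaustive;
the byte patterns are the ones the Vorbis, Theora and Speex header packets
start with. Longer patterns are tried first, so a codec whose identifier
extends another's would still be matched correctly if it were added.

```python
# Leading bytes of the first packet in a BOS page, mapped to a codec name.
# Illustrative only: a real demux would carry one entry per supported codec.
MAGIC = [
    (b"\x01vorbis", "vorbis"),
    (b"\x80theora", "theora"),
    (b"Speex   ", "speex"),  # "Speex" padded with spaces to 8 bytes
]

def identify_codec(first_packet: bytes):
    """Return the codec name for a BOS packet, or None if it is unknown."""
    # Try longer patterns first so a longer identifier wins over its prefix.
    for magic, name in sorted(MAGIC, key=lambda m: len(m[0]), reverse=True):
        if first_packet.startswith(magic):
            return name
    return None
```

An unknown stream simply yields None, which a demux could treat as a stream
it declines to expose.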

> Using the start stamp scheme we can resync as we hit a page. As we get a
> page we know what time this page starts at, and we then have a reference
> point to determine start and end times of every subsequent sample in that
> stream. This means less seek back.

This is another good example of problems with the end-time granule. Thanks.
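
To make the difference concrete, here is a rough sketch in Python of stamping
packets under each scheme. It assumes an audio stream whose granule position
counts samples, and that the per-packet sample counts are already known (for
Vorbis that itself takes some decoding); the function names and inputs are
hypothetical.

```python
def times_from_start_granule(start_granule, packet_samples, sample_rate):
    """Stamp packets going forward from the granule at the start of a page."""
    times = []
    start = start_granule
    for n in packet_samples:
        # Each packet can be stamped as soon as it arrives.
        times.append((start / sample_rate, (start + n) / sample_rate))
        start += n
    return times

def times_from_end_granule(end_granule, packet_samples, sample_rate):
    """Stamp packets by walking back from the granule at the end of a page."""
    times = []
    end = end_granule
    # Nothing can be stamped until every packet on the page has been seen.
    for n in reversed(packet_samples):
        times.append(((end - n) / sample_rate, end / sample_rate))
        end -= n
    times.reverse()
    return times
```

Both give the same answer for a whole page; the end-time version just cannot
produce the first sample's times until the last packet's length is known.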

> As for stream duration, i see no problem with having an empty EOS page which
> has the end time in it.

The only problem here is that you can't rely on the page being there (the
stream might be truncated, and in fact may explicitly be so in Ogg Vorbis). So
it's sugar, not something that's 'built-in' to the format design.

> But from the sounds of it, this isn't the general consensus.

Dunno. Sounded like Aaron was on your side. :)

Cheers,
 -r