[theora-dev] My issues with ogg and directshow...

Mon May 10 21:43:07 PDT 2004

----- Original Message -----
From: "Timothy B. Terriberry" <tterribe at vt.edu>
To: <theora-dev at xiph.org>
Sent: Tuesday, May 11, 2004 11:07 AM
Subject: Re: [theora-dev] My issues with ogg and directshow...

> This got long, as new mail kept coming in as I was writing it. It
> replies to all of it simultaneously. Hopefully it remains coherent.
>
>
>
> > As i mentioned in the email reply a few minutes ago, this is all fine if
> > you accept that every time you have a new codec, you need a new helper
> > library.
>
> And it's easy to dynamically load modules to do this, so everything
> still works "automatically".
>
> > It means that if you have an older version of the demux and it doesn't
> > recognise the header, you basically have nothing. You have no way to
> > know if it's a damaged/invalid stream or if you just don't know how to
> > parse it.
>
> But if you don't know what it is, you couldn't play it anyway. All I get
> out of Windows Media Player for damaged streams is, "I don't know what
> format this is," so you really can't make the distinction even WITH a
> fixed identifier and header format.
>

You've never seen Media Player go "Contacting codec server"? It is
possible, just not everyone does it, and Microsoft doesn't supply all
possible codecs, which is a problem, hence the "Error downloading codec"
message.

> > However if you have a GUID (globally unique identifier), you can
> > automagically download it, install it and then you can play it.
>
> But you have the initial header, which is "a single, small 'initial
> header' packet that includes sufficient information to identify the
> exact CODEC type and media requirements of the logical bitstream"... in
> other words, instead of sending the GUID to the server you're going to
> automagically download the codec implementation from, send the whole
> header packet. Because of the "media requirements" part it's a little
> bit more data, but only a few more bytes.
>

Yes... i agree this is a reasonable compromise... and one that is also
suitable for all platforms.

> > At least if the header was null terminated or fixed size, this is not an
> > issue and i don't see how that restriction imposes any great issues. Or in
>
> "null terminated or fixed sized" doesn't solve anything that "specifying
> the length and offset per-codec" doesn't already solve.
>

It does, because if the demux doesn't know the codec, how does it know where
to look and for how long? But it seems we do have some agreement on a
reasonable compromise to this, so no use dwelling on this point.
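To make the two positions concrete, here is a minimal Python sketch. The signature table, both function names and the NUL-terminated convention are made up for illustration and are not in any Ogg spec; only the \001vorbis and \200theora magic bytes are real. The first function is the "offset and length per codec" approach and can only answer for codecs it already knows. The second assumes a terminated identifier and can at least extract a name from a header it has never seen.

```python
# Hypothetical per-codec signature table: (offset, magic bytes) -> codec name.
KNOWN_SIGNATURES = {
    (0, b"\x01vorbis"): "vorbis",
    (0, b"\x80theora"): "theora",
}

def identify_by_table(packet):
    """Per-codec offset/length check: only works for codecs in the table."""
    for (offset, magic), name in KNOWN_SIGNATURES.items():
        if packet[offset:offset + len(magic)] == magic:
            return name
    return None

def identify_by_terminator(packet, max_len=32):
    """Assumed convention: the identifier runs up to the first NUL byte."""
    end = packet.find(b"\x00", 0, max_len)
    if end <= 0:
        return None  # no terminator found, or an empty identifier
    return packet[:end]
```

With an unknown header such as b"\x01newcodec\x00...", the table lookup returns None while the terminator scan still yields b"\x01newcodec", which is something a demux could hand to a lookup service.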

> > the reverse i don't see what advantage you get by having arbitrary
> > identifiers. The whole purpose of the identifier is to identify, ideally
>
> Because those identifiers already exist and are in use.
>

In the current spec yes... that doesn't mean they must be forever... but as
above... there is a reasonable solution to this.

> > uniquely. So why not enforce it rather than rely on people to
> > *hopefully* create identifiers that don't cause conflicts.
>
> Well, the spec mandates that it be sufficient to identify, "the exact
> codec type." If it causes conflicts, it's not obeying the spec.
>

That's easier said than done though, and it's not all that specific... i
agree it works most of the time, but relying on people to do this isn't
necessarily the best policy.

>
>
>
>  > The muxer must know all this, the demux doesn't necessarily... it just
>  > needs answers to a few simple questions. Which the muxer can offer in a
>  > standard way, meaning the demuxer doesn't have to figure it out for
>  > itself every time the file is played.
>
> The questions are simple, but deriving the ANSWERS may not be. In this
> case "standard" is just a synonym for "constrained". What happens when I
>   want to stick something that's not audio, not video, not still images,
> not MIDI, not text into an Ogg container? You would have me wait to
> define a new "standard" that can process whatever it is, instead of

No, you just use the generic format. I'm not saying it should all be
constrained, just that it would make extensibility easier on all platforms
to have some consensus on specific cases. For example, ogm again is not
really a standard, but its format is a common way to carry just about all
the major types of codec... in cases where it is too constrained, just
don't use it.

> defining a demuxer flexible enough to handle it regardless, possibly
> mapping it into one of the types your framework is capable of handling,
> but retaining the flexibility to use a completely different mapping in
> another framework.
>
> For example, consider GPS data embedded in an Ogg stream. In the
> DirectShow framework, this could potentially be mapped to an overlay
> that just displays the lat/lon on top of the video, but a digital
> library could interpret it as what it actually is, and use it for
> indexing.
>

There are always special cases. i'm not advocating getting rid of the
variable codec-specific header, because it is obviously useful for such
cases, but they are by and large the exception and not the rule.

>  > True which is what will end up happening. But seeing as the codec
>  > identifier is of variable length, in the case where one id is prefixed
>  > by the other it depends on which one is checked first.
>
> Such a case doesn't yet exist. My suggestion would of course be, "don't
> do that." But even if you did... using longest match first as a
> tie-breaker is just as good as explicit null termination for any
> practical case.
>

Not always... what if for some reason the bytes that follow the shorter
identifier, which are not part of it, happen to coincide with bytes that
are part of the longer one? For example if there is a \001vorbis and a
\001vorbis2... the tie breaker is all good until vorbis's (\001vorbis)
version number becomes 2, then it is indistinguishable.

Granted these are rare cases and ideally, codec developers should try and
use unique names... but it's hardly watertight, and a fixed length or
terminated ident would enforce that. I'm just saying it might be something
to think about for future revisions.
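A small Python sketch of that collision, for what it's worth. The \001vorbis2 identifier is hypothetical, as is the longest_match helper. One caveat on the example itself: in Vorbis I the field after the identifier is a 32-bit little-endian version number, so the byte that would have to collide is 0x32 (ASCII "2") rather than a literal version 2. The sketch simply assumes that coincidence happens.

```python
# The second identifier is hypothetical; only \x01vorbis exists today.
IDENTS = [b"\x01vorbis", b"\x01vorbis2"]

def longest_match(packet, idents=IDENTS):
    """Return the longest registered identifier that prefixes the packet."""
    matches = [ident for ident in idents if packet.startswith(ident)]
    return max(matches, key=len) if matches else None

# A packet from the shorter codec whose next byte (not part of its
# identifier) happens to be 0x32, the ASCII code for "2".
short_codec_packet = b"\x01vorbis" + b"\x32\x00\x00\x00"

# A genuine packet from the hypothetical longer codec.
long_codec_packet = b"\x01vorbis2" + b"\x00\x00\x00\x00"
```

Both packets come back as b"\x01vorbis2" from longest_match, so the tie-breaker alone cannot tell them apart. A fixed-length or terminated identifier would not have this problem.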

>  > In my previous example if one id is \001vorbis and another is
>  > \001vorbis2 ... if the first is checked first it will incorrectly
>  > identify the second,
>
> The initial header must contain sufficient information to, "identify the
> exact CODEC type." If the second example gets mis-identified as Vorbis,
> then it is not obeying this restriction. What would ACTUALLY happen is
> that after passing the simple identifier check, the Vorbis I header
> parser would then continue on verifying version numbers, and THAT would
> fail for the second example, so it would not pose any serious problems.
> All you will have done is waste some small, insignificant processing time.
>

Not necessarily true... as i say these are rare cases, but worth thinking
about... it's entirely possible for a different header to still match some
of those parameters and even be the same length.
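As a rough illustration of what "continue on verifying" could look like, here is a Python sketch that checks the fields of a Vorbis I identification header after the magic bytes match. The field layout follows the Vorbis I spec as i understand it (32-bit version, 8-bit channel count, 32-bit sample rate, three 32-bit bitrates, two 4-bit blocksize exponents, a framing bit). The function name and the returned dict are my own, and this is not how libvorbis does it.

```python
import struct

def parse_vorbis_ident(packet):
    """Return header fields if the packet passes Vorbis I sanity checks, else None."""
    # 7 identifier bytes plus 23 bytes of fields = 30 bytes in total.
    if len(packet) < 30 or not packet.startswith(b"\x01vorbis"):
        return None
    (version, channels, rate, br_max, br_nominal, br_min,
     blocksizes, framing) = struct.unpack_from("<IBIiiiBB", packet, 7)
    # Low nibble is the blocksize_0 exponent, high nibble is blocksize_1.
    exp0, exp1 = blocksizes & 0x0F, blocksizes >> 4
    if version != 0 or channels == 0 or rate == 0:
        return None
    # Blocksizes must be 64..8192 with blocksize_0 <= blocksize_1.
    if not (6 <= exp0 <= exp1 <= 13) or not (framing & 1):
        return None
    return {"channels": channels, "rate": rate,
            "blocksize_0": 1 << exp0, "blocksize_1": 1 << exp1}
```

This rejects the "version byte is 0x32" case, which supports the point being made against me. But the checks are loose: any 30-byte packet that starts with the same magic and happens to satisfy them is accepted, which is the rare case i'm worried about.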

>  > Also, only the muxer needs to know this. As is evidenced by the fact
>  > that i implemented all ffdshow video codecs with a single ogm header.
>  > If they add 10 new codecs tomorrow, my demux won't change and nor will
>  > i need helper libraries to identify the codecs. My demux has no idea
>  > what's in a divx
>
> PROVIDED they obey the restrictions imposed by the OGM header. The point
> is that Ogg tries to minimize the restrictions it imposes, while you are
> trying to force the restrictions of your media framework down into the
> file format, and thus preserve them for all time.

No, no-one is forced to use the ogm header. But it may be an idea to
document its structure, as people are going to use it regardless. The ogm
header is no different to any other codec's header.

>  > How would that be any different than if i created theora files that
>  > produce upside down video but still identify themselves as theora. This
>  > is a generic problem.
>
> Because we have the entire initial header packet, you can do more
> complex checking than just against a simple FOURCC. Though I will admit
> that because we didn't bump the version number during the alphas, you'd
> actually have to look in the vendor string, which is in the comment
> header, not the initial packet, to tell.
>

We've got some consensus on a remote identification system, so ignoring
that part... but if you are going to have such a system that identifies
based on the initial packet, then at least that must be enforced. As for the
alphas, no big deal... but once in real use, it really needs to be enforced
that identification must be possible from the identification header.

>  > Ideally specs shouldn't be open to that kind of interpretation. And
>  > where they are they should be modified to bring everything back to
>  > alignment. Bugs on the other hand are always going to be a fact of life.
>
> The MPEG4 spec has been amended in places, thus the, "bugs changing from
> version to version of the same encoder."
>
>  > But the fixed header is not there to replace the variable header,
>  > merely to supplement it.
>
> Duplicating information in a file format in multiple ways in the hopes
> that a decoder will be able to understand at least one of them is a good
> way to shoot yourself in the foot. What happens when the fixed header
> says one thing, and the codec-specific header says something else

That's a bug ! That's just as likely to happen as the wrong data being put
in the native header.

> entirely? The answer is your simple demuxer does the wrong thing. Some
> MPEG4 encoders set aspect ratios in the MPEG headers, but not in the AVI
> header... the result is they get ignored, and your video is displayed
> with the wrong aspect. The reality is I often see tons of MPEG-4 or
> mp3-specific code in what are otherwise supposed to be "generic"
> demuxers and muxers.
>
>
>
>
>  > a) Bugs : Not much you can do about this except encourage people to
>  > rectify them
>  > b) Same codec different features: User can specify their preference.
>
> Right, not that there is any "standard" way to do this. ffdshow provides
> a simple to use GUI which configures which codecs IT will handle, but
> not everyone else does.
>

But DirectShow also provides a merit system where filters say how likely
they are to be the correct choice... ie renderers always get preference over
encoders to avoid the building of a graph that goes
decode->encode->decode->encode.

You are right it's not perfect. But the user can always deregister filters
that don't play ball.
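For anyone not familiar with the merit idea, this Python sketch models the ordering only. It does not call any DirectShow API. The four MERIT_ values are the ones i recall from the DirectShow headers, and the filter names and the candidate_order function are invented for the example.

```python
# Merit levels as defined in the DirectShow headers (values quoted from memory).
MERIT_PREFERRED = 0x800000
MERIT_NORMAL = 0x600000
MERIT_UNLIKELY = 0x400000
MERIT_DO_NOT_USE = 0x200000

def candidate_order(filters):
    """Given (name, merit) pairs, return names in the order they would be tried.

    Filters at or below MERIT_DO_NOT_USE are never considered automatically.
    """
    usable = [f for f in filters if f[1] > MERIT_DO_NOT_USE]
    ranked = sorted(usable, key=lambda f: f[1], reverse=True)
    return [name for name, _merit in ranked]

# Hypothetical registered filters.
registered = [
    ("some encoder", MERIT_DO_NOT_USE),
    ("fallback decoder", MERIT_UNLIKELY),
    ("video renderer", MERIT_PREFERRED),
    ("usual decoder", MERIT_NORMAL),
]
```

Here candidate_order(registered) puts the renderer first, then the two decoders, and leaves the encoder out entirely. Deregistering a filter, or dropping its merit, is the user's way to take a misbehaving one out of the running.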

> With the simple registry list scheme, you'd just let the user order them
> by priority (somehow), and check them in that order.
>
>  > c) Same codec different outputs: Rectify the spec and invalidate the
>  > incorrect ones.
>
>  > Yes, that's a good point. But that assumes the spec is fixed for all
>  > eternity.
>
> Well, the hardware manufacturers would certainly like this to be true!
> Of course a newer version of Vorbis is planned, and it will bump the
> version number, and older decoders will properly say they can't support
> the newer files. Modulo bugs, you can assume the Vorbis I spec is fixed.
>

Well, we'd all love for everything to be perfect first time round... in
reality it's rarely the case. All i'm saying is if the spec is being upped
in such a way to make breaking changes, why not use the opportunity to make
other changes that have arisen since the original spec which couldn't be
done because they would break existing implementations.

>  > I was not suggesting just go and change it, i was more trying to
>  > understand what the rationale was in the first place !
>
> Well, as for the variable-length identifiers, most likely the rationale
> was, "We need to do something; the spec doesn't mandate anything; this
> seems good enough." In some ways this is good: if everyone follows a
> convention that is not mandated in the spec, then people will start to
> assume the convention IS mandated (or simply won't care), and everyone's
> code will break the first time someone tries to do something different,
> even though that something different may be "the right thing" for their
> particular situation.
>

But that is how interpretations come about... because people will not be
breaking the spec, but it won't work on a supposedly conforming
implementation.

> Take embedding non-Xiph codecs in Ogg, for example, especially
> pre-existing ones. None of _them_ will have a 6-byte (or 7 if you
> include the packet type marker) identifier in their native headers. If
> all the Xiph codecs had mimicked Vorbis so exactly, people would surely
> be relying on this, and have to do major re-design work to incorporate
> the new flexibility required to support these other codecs. You're being
> forced to do this design work up front, hopefully so that it only needs
> to be done once.
>

But still something has to be done to use these codecs... and the most valid
choice is currently ogm. Otherwise everyone is going to have their own way
to put mpeg into ogg and that's how you end up with tons of files which to
the letter of the spec are conformant (they all create their own magic
header), but differing implementations either will/won't or partially play
them.

>  >
>  > I can't think of any audio codec that doesn't have a pcm sample rate
>  > and a number of channels ?
>
> Sure you can't. But what happens when I want to make an audio codec that
> uses a variable sample rate to account for drift in capture cards
> without requiring resampling by the encoder (since capture must be done
> in real time)?
>

Then you use the generic header format... the two concepts aren't mutually
exclusive.

>  > In specific cases... ie audio and video (which are also the most common
>  > cases), why not be specific, in other cases be generic. If an audio or
>  > video codec comes along that makes using the specific codec impossible,
>  > then it will use the generic one.
>
> And everyone who's written software that only supports the specific will
> be unable to use your generic!  By forcing implementors to be generic up
> front, you ensure that they can handle it when the time comes.
>

If the generic is part of the spec... then no-one has an excuse not to
implement the generic as well. As i said it's not mutually exclusive.

>  > I think that's a bit of a harsh judgement ! I'm not trying to impose or
>  > force anyone to do anything, i'm just putting out there the particular
>  > problems that exist under the particular framework that i'm
>  > implementing under, in the hope that some of these issues will at least
>  > allow future decisions to be made with as much information about issues
>  > specific to this platform as possible, and that hopefully issues that
>  > relate to this platform get at least some weight, rather than just
>  > issues that occur under the majority of developers preferred platform.
>
> Well, automatic codec identification isn't really a problem specific to
> your platform, though previous solutions certainly have been. Pretty
> much everything I've suggested has been suggested by someone else
> working for another platform.
>
> The seeking problems you've come up with are at least similar to those
> on other platforms. The communication restrictions DirectShow imposes do
> make your situation somewhat unique, and that should be taken into
> careful consideration when finalizing the mux spec.
>
>  > If the decisions are made that don't suit my platform i'll just
>  > continue to hack around them as i have been.
>
> That's usually the way things work. Our goal should be that there is at
> least one reasonably "correct" way to do things. Sometimes that way
> might just be, "This media framework does not support the features used
> by this file. I give up." I'll admit I was kind of expecting that with
> relation to chained bitstreams, and am more than pleasantly surprised
> there's actually a way to hack up some kind of support for them.
>

But why make the "I give up" choice even a possibility when you can factor
that into the initial decision making process where possible. In order for
anything to gain popularity and widespread use across multiple platforms,
compromises will always have to be made to accommodate the conventions and
quirks of each platform.

You can take the "well we don't care about that platform if they can't do it
our way" approach, but that does the cause no good at all. And in the case
where the developers are primarily from one platform, then inevitably more
concessions will be made to that platform than others. Whether that's a good
long term policy is another matter. Because the reality is, if these codecs
do gain widespread use, a great majority of end users will be using
windows. So the way i see it, who are you developing for if not the end
users.

Even though 80% of current developers may be developing for Unix based
platforms, if the codecs gain acceptance, which i assume is the goal of
everyone here, like it or not probably 80% of end users are going to be
using windows.

Now that's not me trying to convince anyone that you should all do it the
windows way, far from it. Just something to think about.

>
>
>  > If xiph wants to set up such a system on their server such that i can
>  > send it a packet and it will tell me where to download and install a
>  > codec from, it's probably possible with no intervention from microsoft.
>  > But that forces the issue of remote intervention rather than making it a
>  > last resort.
>
> And the beauty of open source is that anyone could then set up such a
> server, unlike the current situation with Microsoft. Xiph _would_ be
> unable to distribute patent-encumbered codecs (and I have no idea about
> the legality of even linking to ones hosted remotely), though. But if
> you want to set up your own server in a software-patent free region,
> you'd be able to.
>

Problem is, how does the demux know which servers to contact ? That is the
benefit of a single point of contact. Unless you embed the host name where
the codec is in the header in some specific way, you have no way to know who
you should be asking. But that's back to the original problem.

Zen.
