[theora-dev] My issues with ogg and directshow...

Mon May 10 20:07:35 PDT 2004

This got long, as new mail kept coming in as I was writing it. It 
replies to all of it simultaneously. Hopefully it remains coherent.

> As i mentioned in the email reply a few minutes ago, this is all fine if you
> accept that every time you have a new codec, you need a new helper library.

And it's easy to dynamically load modules to do this, so everything 
still works "automatically".

> It means that if you have an older version of the demux and it doesn't
> recognise the header, you basically have nothing. You have no way to know if
> its a damaged/invalid stream or if you just don't know how to parse it.

But if you don't know what it is, you couldn't play it anyway. All I get 
out of Windows Media Player for damaged streams is, "I don't know what 
format this is," so you really can't make the distinction even WITH a 
fixed identifier and header format.

> However if you have a GUID (globally unique identifier), you can
> automagically download it, install it and then you can play it.

But you have the initial header, which is "a single, small 'initial 
header' packet that includes sufficient information to identify the 
exact CODEC type and media requirements of the logical bitstream"... in 
other words, instead of sending the GUID to the server you're going to 
automagically download the codec implementation from, send the whole 
header packet. Because of the "media requirements" part it's a little 
bit more data, but only a few more bytes.

> At least if the header was null terminated or fixed size, this is not an
> issue and i don't see how that restriction imposes any great issues. Or in

"null terminated or fixed sized" doesn't solve anything that "specifying 
the length and offset per-codec" doesn't already solve.

> the reverse i don't see what advantage you get by having arbitrary
> identifiers. The whole purpose of the identifier is to identify, ideally

Because those identifiers already exist and are in use.

> uniquely. So why not enforce it rather than rely on people to *hopefully*
> create identifiers that don't cause conflicts.

Well, the spec mandates that it be sufficient to identify, "the exact 
codec type." If it causes conflicts, it's not obeying the spec.

 > The muxer must know all this, the demux doesn't necessarily... it just needs
 > answers to a few simple questions. Which the muxer can offer in a standard
 > way, meaning the demuxer doesn't have to figure it out for itself every time
 > the file is played.

The questions are simple, but deriving the ANSWERS may not be. In this 
case "standard" is just a synonym for "constrained". What happens when I 
want to stick something that's not audio, not video, not still images, 
not MIDI, not text into an Ogg container? You would have me wait to 
define a new "standard" that can process whatever it is, instead of 
defining a demuxer flexible enough to handle it regardless, possibly 
mapping it into one of the types your framework is capable of handling, 
but retaining the flexibility to use a completely different mapping in 
another framework.

For example, consider GPS data embedded in an Ogg stream. In the 
DirectShow framework, this could potentially be mapped to an overlay 
that just displays the lat/lon on top of the video, but a digital 
library could interpret it as what it actually is, and use it for indexing.

 > True which is what will end up happening. But seeing as the codec identifier
 > is of variable length, in the case where one id is prefixed by the other it
 > depends on which one is checked first.

Such a case doesn't yet exist. My suggestion would of course be, "don't 
do that." But even if you did... using longest match first as a 
tie-breaker is just as good as explicit null termination for any 
practical case.

 > In my previous example if one id is \001vorbis and another is \001vorbis2
 > ... if the first is checked first it will incorrectly identify the second,

The initial header must contain sufficient information to, "identify the 
exact CODEC type." If the second example gets mis-identified as Vorbis, 
then it is not obeying this restriction. What would ACTUALLY happen is 
that after passing the simple identifier check, the Vorbis I header 
parser would then continue on verifying version numbers, and THAT would 
fail for the second example, so it would not pose any serious problems. 
All you will have done is waste some small, insignificant processing time.
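
A minimal sketch of that two-stage check, in Python. The registry, the "vorbis2" entry and all function names are made up for illustration. The only thing taken from the Vorbis I spec is the start of the identification header: packet type 0x01, the string "vorbis", then a 32-bit little-endian version field that must be 0.

```python
import struct

def verify_vorbis1(packet: bytes) -> bool:
    # Vorbis I: 0x01 "vorbis", then a 32-bit little-endian version that must be 0.
    if len(packet) < 11:
        return False
    (version,) = struct.unpack_from("<I", packet, 7)
    return version == 0

def verify_vorbis2(packet: bytes) -> bool:
    # Hypothetical codec, for illustration only: accept anything carrying its id.
    return len(packet) >= 8

# Hypothetical registry: identifier prefix -> (name, second-stage verifier).
REGISTRY = {
    b"\x01vorbis": ("Vorbis I", verify_vorbis1),
    b"\x01vorbis2": ("hypothetical vorbis2", verify_vorbis2),
}

def identify(packet: bytes, registry=REGISTRY):
    # First pass: try the longest identifiers first, so a longer id wins a tie.
    for ident in sorted(registry, key=len, reverse=True):
        if packet.startswith(ident):
            name, verify = registry[ident]
            # Second pass: the codec's own header checks have the final say.
            if verify(packet):
                return name
    return None
```

With an older registry that only knows `\001vorbis`, a `\001vorbis2` packet passes the prefix test but fails the version check (the "2" lands in the version field), so `identify` returns `None` instead of mis-identifying it.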

 > Also, only the muxer needs to know this. As is evidenced by the fact that i
 > implemented all ffdshow video codecs with a single ogm header. If they add
 > 10 new codecs tomorrow, my demux won't change and nor will i need helper
 > libraries to identify the codecs. My demux has no idea whats in a divx

PROVIDED they obey the restrictions imposed by the OGM header. The point 
is that Ogg tries to minimize the restrictions it imposes, while you are 
trying to force the restrictions of your media framework down into the 
file format, and thus preserve them for all time.

 > How would that be any different than if i created theora files that produce
 > upside down video but still identify themselves as theora. This is a generic
 > problem.

Because we have the entire initial header packet, you can do more 
complex checking than just against a simple FOURCC. Though I will admit 
that because we didn't bump the version number during the alphas, you'd 
actually have to look in the vendor string, which is in the comment 
header, not the initial packet, to tell.

 > Ideally specs shouldn't be open to that kind of interpretation. And where
 > they are they should be modified to bring everything back to alignment. Bugs
 > on the other hand are always going to be a fact of life.

The MPEG4 spec has been amended in places, thus the, "bugs changing from 
version to version of the same encoder."

 > But the fixed header is not there to replace the variable header, merely to
 > supplement it.

Duplicating information in a file format in multiple ways in the hopes 
that a decoder will be able to understand at least one of them is a good 
way to shoot yourself in the foot. What happens when the fixed header 
says one thing, and the codec-specific header says something else 
entirely? The answer is that your simple demuxer does the wrong thing. 
Some MPEG-4 encoders set aspect ratios in the MPEG headers, but not in 
the AVI header... the result is they get ignored, and your video is 
displayed with the wrong aspect. The reality is I often see tons of 
MPEG-4 or mp3-specific code in what are otherwise supposed to be 
"generic" demuxers and muxers.
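
To make the conflict concrete, here is a small Python sketch. The function name and the policy of trusting the codec-specific header are my assumptions, not anything a spec mandates. The point is only that once the same value lives in two places, the demuxer has to compare them and pick one.

```python
from fractions import Fraction
from typing import Optional, Tuple

def resolve_aspect(container: Optional[Fraction],
                   codec: Optional[Fraction]) -> Tuple[Optional[Fraction], bool]:
    """Return (aspect ratio to use, whether the two headers disagreed).

    Assumed policy: the codec-specific header wins when both are present.
    """
    if container is None:
        return codec, False
    if codec is None:
        return container, False
    return codec, container != codec
```

A demuxer that only ever reads the container-level value never notices the second return value, which is exactly the AVI failure described above.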

 > a) Bugs : Not much you can do about this except encourage people to rectify
 > them
 > b) Same codec different features: User can specify their preference.

Right, not that there is any "standard" way to do this. ffdshow provides 
a simple to use GUI which configures which codecs IT will handle, but 
not everyone else does.

With the simple registry list scheme, you'd just let the user order them 
by priority (somehow), and check them in that order.
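
A rough Python sketch of such a priority-ordered list. The names and the shape of the registry (decoder name mapped to a "can I handle this packet?" callable) are hypothetical.

```python
def pick_decoder(packet, decoders, priority):
    """Return the name of the first decoder, in user order, that accepts packet.

    decoders: dict mapping a decoder name to a callable(packet) -> bool.
    priority: list of decoder names in the user's preferred order.
    Decoders missing from the priority list are tried last, in registry order.
    """
    rank = {name: i for i, name in enumerate(priority)}
    ordered = sorted(decoders, key=lambda name: rank.get(name, len(priority)))
    for name in ordered:
        if decoders[name](packet):
            return name
    return None
```

Because `sorted` is stable, decoders the user never ranked keep their registration order.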

 > c) Same codec different outputs: Rectify the spec and invalidate the
 > incorrect ones.

 > Yes, that's a good point. But that assumes the spec is fixed for all
 > eternity.

Well, the hardware manufacturers would certainly like this to be true! 
Of course a newer version of Vorbis is planned, and it will bump the 
version number, and older decoders will properly say they can't support 
the newer files. Modulo bugs, you can assume the Vorbis I spec is fixed.

 > I was not suggesting just go and change it, i was more trying to understand
 > what the rationale was in the first place !

Well, as for the variable-length identifiers, most likely the rationale 
was, "We need to do something; the spec doesn't mandate anything; this 
seems good enough." In some ways this is good: if everyone follows a 
convention that is not mandated in the spec, then people will start to 
assume the convention IS mandated (or simply won't care), and everyone's 
code will break the first time someone tries to do something different, 
even though that something different may be "the right thing" for their 
particular situation.

Take embedding non-Xiph codecs in Ogg, for example, especially 
pre-existing ones. None of _them_ will have a 6-byte (or 7 if you 
include the packet type marker) identifier in their native headers. If 
all the Xiph codecs had mimicked Vorbis so exactly, people would surely 
be relying on this, and have to do major re-design work to incorporate 
the new flexibility required to support these other codecs. You're being 
forced to do this design work up front, hopefully so that it only needs 
to be done once.

 > As i mentioned... a tie goes to the longest known to the demuxer, not
 > necessarily the longest that may be in a muxed file created some time later.

Right, and once that demuxer starts checking version numbers and such, 
it should be able to positively identify whether it can actually decode 
the stream. The fixed identifier is a first-pass check, not the final one.

 > Exactly my point, the mux only needs to know, the mux is the more complex
 > component, and the mux only occurs once.

If you go open up the libavcodec example code for encoding, and then for 
playing, I'll argue which one is more complex...
...and the mux may only occur once for a particular file, but that 
doesn't stop people from creating lots of muxer implementations.

 > I don't see how it's maximum flexibility, if all files were done this way
 > and part of an ogg header format, then all frameworks can utilise it.

I'll say it again, because this is important.

"This only forces a codec to conform to the restrictions of a given 
media framework when being used by that framework; it does NOT force it 
to obey those restrictions just to be stored in an Ogg container---and 
thus across ALL media frameworks, regardless of their capabilities."

 >
 > I can't think of any audio codec that doesn't have a pcm sample rate and a
 > number of channels ?

Sure you can't. But what happens when I want to make an audio codec that 
uses a variable sample rate to account for drift in capture cards 
without requiring resampling by the encoder (since capture must be done 
in real time)?

I'm sure the designers of AVI couldn't think of any audio codec that 
didn't have a constant bitrate, either.

 > Nor any video codec that doesn't have an initial frame size ?

Right, just like the AVI designers had never heard of a video codec with 
a variable frame rate, and the VfW designers had never heard of a video 
encoder that needed to buffer multiple input frames before producing an 
encoded output frame (e.g., for B frames)... and then stored the 
resulting output frames out of order in the bitstream. That's crazy talk.

But even just for your frame size example... what happens when the 
chroma planes are subsampled? Do you force the size of the luma plane to 
be even? Theora does not. Some frameworks do. Even for those that do 
not... Theora aligns chroma samples with luma samples based on their 
position in the uncropped frame, NOT the final cropped picture. So 
sometimes the first row of chroma samples in a picture corresponds to 
one row of luma samples, and sometimes two, depending on whether or not 
the picture offset is even or odd. Do you know of a media framework that 
lets the codec specify THAT correctly?
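
The arithmetic behind the one-row-or-two case, as a Python sketch. It assumes chroma is subsampled by two vertically, so chroma row k covers luma rows 2k and 2k+1 of the uncropped frame, and it measures the picture offset in luma rows from the same edge the rows are numbered from. The function name and that numbering convention are my assumptions for illustration; check the Theora spec for the real coordinate conventions.

```python
def luma_rows_for_first_chroma_row(pic_y: int, pic_height: int) -> int:
    """Count the picture's luma rows that share the picture's first chroma row.

    pic_y: offset of the cropped picture within the frame, in luma rows.
    pic_height: height of the cropped picture, in luma rows.
    """
    if pic_height <= 0:
        return 0
    first_chroma_row = pic_y // 2
    # Frame luma rows covered by that chroma row.
    lo, hi = 2 * first_chroma_row, 2 * first_chroma_row + 1
    # Keep only the rows that fall inside the cropped picture.
    first = max(lo, pic_y)
    last = min(hi, pic_y + pic_height - 1)
    return last - first + 1
```

An even offset gives two luma rows for the first chroma row, an odd offset gives one, which is the distinction a plain "width x height" frame-size field cannot express.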

 > In specific cases... ie audio and video (which are also the most common
 > cases), why not be specific, in other cases be generic. If an audio or video
 > codec comes along that makes using the specific codec impossible, then it
 > will use the generic one.

And everyone who's written software that only supports the specific will 
be unable to use your generic!  By forcing implementors to be generic up 
front, you ensure that they can handle it when the time comes.

 > I think that's a bit of a harsh judgement ! I'm not trying to impose or
 > force anyone to do anything, i'm just putting out there the particular
 > problems that exist under the particular framework that i'm implementing
 > under, in the hope that some of these issues will at least allow future
 > decisions to be made with as much information about issues specific to this
 > platform as possible, and that hopefully issues that relate to this platform
 > get at least some weight, rather than just issues that occur under the
 > majority of developers preferred platform.

Well, automatic codec identification isn't really a problem specific to 
your platform, though previous solutions certainly have been. Pretty 
much everything I've suggested has been suggested by someone else 
working for another platform.

The seeking problems you've come up with are at least similar to those 
on other platforms. The communication restrictions DirectShow imposes do 
make your situation somewhat unique, and that should be taken into 
careful consideration when finalizing the mux spec.

 > If the decisions are made that don't suit my platform i'll just continue to
 > hack around them as i have been.

That's usually the way things work. Our goal should be that there is at 
least one reasonably "correct" way to do things. Sometimes that way 
might just be, "This media framework does not support the features used 
by this file. I give up." I'll admit I was kind of expecting that with 
relation to chained bitstreams, and am more than pleasantly surprised 
there's actually a way to hack up some kind of support for them.

 > If xiph wants to set up such a system on their server such that i can send
 > it a packet and it will tell me where to download and install a codec from,
 > it's probably possible with no intervention from microsoft. But that forces
 > the issue of remote intervention rather than making it a last resort.

And the beauty of open source is that anyone could then set up such a 
server, unlike the current situation with Microsoft. Xiph _would_ be 
unable to distribute patent-encumbered codecs (and I have no idea about 
the legality of even linking to ones hosted remotely), though. But if 
you want to set up your own server in a software-patent free region, 
you'd be able to.

 > With a GUID, you can do it locally if possible. Lets say someone is mucking
 > around with new format that's not complete. With a guid mapping they can
 > happily let it work on their local system, without changing the demuxer.
 > With remote identification the demuxer won't recognise it, and the xiph
 > server won't have heard of it.

And defining a standard API, beyond the DirectShow one, that they can 
implement and register their .dll (or COM object or whatever) with lets 
them do this as well.