[theora-dev] My issues with ogg and directshow...
Timothy B. Terriberry
tterribe at vt.edu
Mon May 10 20:07:35 PDT 2004
This got long, as new mail kept coming in as I was writing it. It
replies to all of it simultaneously. Hopefully it remains coherent.
> As i mentioned in the email reply a few minutes ago, this is all fine if you
> accept that every time you have a new codec, you need a new helper library.
And it's easy to dynamically load modules to do this, so everything
still works "automatically".
> It means that if you have an older version of the demux and it doesn't
> recognise the header, you basically have nothing. You have no way to know if
> its a damaged/invalid stream or if you just don't know how to parse it.
But if you don't know what it is, you couldn't play it anyway. All I get
out of Windows Media Player for damaged streams is, "I don't know what
format this is," so you really can't make the distinction even WITH a
fixed identifier and header format.
> However if you have a GUID (globally unique identifier), you can
> automagically download it, install it and then you can play it.
But you have the initial header, which is "a single, small 'initial
header' packet that includes sufficient information to identify the
exact CODEC type and media requirements of the logical bitstream"... in
other words, instead of sending the GUID to the server you're going to
automagically download the codec implementation from, send the whole
header packet. Because of the "media requirements" part it's a little
bit more data, but only a few more bytes.
> At least if the header was null terminated or fixed size, this is not an
> issue and i don't see how that restriction imposes any great issues. Or in
"null terminated or fixed sized" doesn't solve anything that "specifying
the length and offset per-codec" doesn't already solve.
> the reverse i don't see what advantage you get by having arbitrary
> identifiers. The whole purpose of the identifier is to identify, ideally
Because those identifiers already exist and are in use.
> uniquely. So why not enforce it rather than rely on people to *hopefully*
> create identifiers that don't cause conflicts.
Well, the spec mandates that it be sufficient to identify, "the exact
codec type." If it causes conflicts, it's not obeying the spec.
> The muxer must know all this, the demux doesn't necessarily... it just needs
> answers to a few simple questions. Which the muxer can offer in a standard
> way, meaning the demuxer doesn't have to figure it out for itself every time
> the file is played.
The questions are simple, but deriving the ANSWERS may not be. In this
case "standard" is just a synonym for "constrained". What happens when I
want to stick something that's not audio, not video, not still images,
not MIDI, not text into an Ogg container? You would have me wait to
define a new "standard" that can process whatever it is, instead of
defining a demuxer flexible enough to handle it regardless, possibly
mapping it into one of the types your framework is capable of handling,
but retaining the flexibility to use a completely different mapping in
another framework.
For example, consider GPS data embedded in an Ogg stream. In the
DirectShow framework, this could potentially be mapped to an overlay
that just displays the lat/lon on top of the video, but a digital
library could interpret it as what it actually is, and use it for indexing.
> True which is what will end up happening. But seeing as the codec identifier
> is of variable length, in the case where one id is prefixed by the other it
> depends on which one is checked first.
Such a case doesn't yet exist. My suggestion would of course be, "don't
do that." But even if you did... using longest match first as a
tie-breaker is just as good as explicit null termination for any
practical case.
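
A minimal sketch of the longest-match tie-breaker, assuming a hypothetical
identifier table inside the demuxer. The table, the names in it, and the
function `identify` are illustrative only; "\001vorbis2" is the made-up
conflicting identifier from this thread, not a real codec.

```python
# Hypothetical identifier table. b"\x01vorbis" and b"\x80theora" are the
# leading bytes of real initial headers; b"\x01vorbis2" is invented here
# purely to show the prefix conflict being discussed.
KNOWN_IDS = {
    b"\x01vorbis": "vorbis",
    b"\x01vorbis2": "vorbis2 (hypothetical)",
    b"\x80theora": "theora",
}


def identify(packet: bytes):
    """Return the name of the longest known identifier that prefixes packet."""
    # Try longer identifiers first so a short id cannot shadow a longer one.
    for ident in sorted(KNOWN_IDS, key=len, reverse=True):
        if packet.startswith(ident):
            return KNOWN_IDS[ident]
    return None  # unknown to this demuxer
```

With this ordering the result no longer depends on which entry happens to be
listed first in the table.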
> In my previous example if one id is \001vorbis and another is \001vorbis2
> ... if the first is checked first it will incorrectly identify the second,
The initial header must contain sufficient information to, "identify the
exact CODEC type." If the second example gets mis-identified as Vorbis,
then it is not obeying this restriction. What would ACTUALLY happen is
that after passing the simple identifier check, the Vorbis I header
parser would then continue on verifying version numbers, and THAT would
fail for the second example, so it would not pose any serious problems.
All you will have done is waste some small, insignificant processing time.
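
A rough sketch of that two-step check, assuming only the start of the Vorbis I
identification header is examined: the 7-byte identifier, followed by the
32-bit little-endian version field, which must be zero. The function name
`is_vorbis_i` is hypothetical, and a real parser would go on to validate the
rest of the header.

```python
import struct


def is_vorbis_i(packet: bytes) -> bool:
    """First-pass identifier check, then the version check."""
    # Need the 7-byte identifier plus the 4-byte version field.
    if len(packet) < 11 or not packet.startswith(b"\x01vorbis"):
        return False
    # The 32-bit little-endian version follows the identifier; Vorbis I
    # requires it to be 0.
    (version,) = struct.unpack_from("<I", packet, 7)
    return version == 0
```

A packet starting with "\001vorbis2" passes the identifier check, but the "2"
lands in the version field, so the version is nonzero and the packet is
rejected.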
> Also, only the muxer needs to know this. As is evidenced by the fact that i
> implemented all ffdshow video codecs with a single ogm header. If they add
> 10 new codecs tomorrow, my demux won't change and nor will i need helper
> libraries to identify the codecs. My demux has no idea whats in a divx
PROVIDED they obey the restrictions imposed by the OGM header. The point
is that Ogg tries to minimize the restrictions it imposes, while you are
trying to force the restrictions of your media framework down into the
file format, and thus preserve them for all time.
> How would that be any different than if i created theora files that produce
> upside down video but still identify themselves as theora. This is a generic
> problem.
Because we have the entire initial header packet, you can do more
complex checking than just against a simple FOURCC. Though I will admit
that because we didn't bump the version number during the alphas, you'd
actually have to look in the vendor string, which is in the comment
header, not the initial packet, to tell.
> Ideally specs shouldn't be open to that kind of interpretation. And where
> they are they should be modified to bring everything back to alignment. Bugs
> on the other hand are always going to be a fact of life.
The MPEG4 spec has been amended in places, thus the, "bugs changing from
version to version of the same encoder."
> But the fixed header is not there to replace the variable header, merely to
> supplement it.
Duplicating information in a file format in multiple ways in the hopes
that a decoder will be able to understand at least one of them is a good
way to shoot yourself in the foot. What happens when the fixed header
says one thing, and the codec-specific header says something else
entirely? The answer is your simple demuxer does the wrong thing. Some
MPEG4 encoders set aspect ratios in the MPEG headers, but not in the AVI
header... the result is they get ignored, and your video is displayed
with the wrong aspect. The reality is I often see tons of MPEG-4 or
mp3-specific code in what are otherwise supposed to be "generic"
demuxers and muxers.
> a) Bugs : Not much you can do about this except encourage people to rectify
> them
> b) Same codec different features: User can specify their preference.
Right, not that there is any "standard" way to do this. ffdshow provides
a simple to use GUI which configures which codecs IT will handle, but
not everyone else does.
With the simple registry list scheme, you'd just let the user order them
by priority (somehow), and check them in that order.
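
One way the priority-ordered list could look, as a sketch only. The registry
layout, the handler names, and `pick_handler` are all hypothetical; each entry
pairs a name with a probe callback that inspects the initial header packet.

```python
from typing import Callable, List, Optional, Tuple

# A probe looks at the initial header packet and says whether its decoder
# can handle the stream. Names and probes here are purely illustrative.
Probe = Callable[[bytes], bool]


def pick_handler(packet: bytes,
                 registry: List[Tuple[str, Probe]]) -> Optional[str]:
    """Return the first handler, in user priority order, that accepts packet."""
    for name, probe in registry:
        if probe(packet):
            return name
    return None  # nothing registered can handle this stream


# The order of this list is the user's preference.
registry = [
    ("preferred-vorbis-decoder", lambda p: p.startswith(b"\x01vorbis")),
    ("fallback-vorbis-decoder", lambda p: p.startswith(b"\x01vorbis")),
    ("theora-decoder", lambda p: p.startswith(b"\x80theora")),
]
```

Reordering the list is all it takes to change which of two competing decoders
gets the stream.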
> c) Same codec different outputs: Rectify the spec and invalidate the
> incorrect ones.
> Yes, that's a good point. But that assumes the spec is fixed for all
> eternity.
Well, the hardware manufacturers would certainly like this to be true!
Of course a newer version of Vorbis is planned, and it will bump the
version number, and older decoders will properly say they can't support
the newer files. Modulo bugs, you can assume the Vorbis I spec is fixed.
> I was not suggesting just go and change it, i was more trying to understand
> what the rationale was in the first place !
Well, as for the variable-length identifiers, most likely the rationale
was, "We need to do something; the spec doesn't mandate anything; this
seems good enough." In some ways this is good: if everyone follows a
convention that is not mandated in the spec, then people will start to
assume the convention IS mandated (or simply won't care), and everyone's
code will break the first time someone tries to do something different,
even though that something different may be "the right thing" for their
particular situation.
Take embedding non-Xiph codecs in Ogg, for example, especially
pre-existing ones. None of _them_ will have a 6-byte (or 7 if you
include the packet type marker) identifier in their native headers. If
all the Xiph codecs had mimicked Vorbis so exactly, people would surely
be relying on this, and have to do major re-design work to incorporate
the new flexibility required to support these other codecs. You're being
forced to do this design work up front, hopefully so that it only needs
to be done once.
> As i mentioned... a tie goes to the longest known to the demuxer, not
> necessarily the longest that may be in a muxed file created some time later.
Right, and once that demuxer starts checking version numbers and such,
it should be able to positively identify whether it can actually decode
the stream. The fixed identifier is a first-pass check, not the final one.
> Exactly my point, the mux only needs to know, the mux is the more complex
> component, and the mux only occurs once.
If you go open up the libavcodec example code for encoding, and then for
playing, I'll argue which one is more complex...
...and the mux may only occur once for a particular file, but that
doesn't stop people from creating lots of muxer implementations.
> I don't see how it's maximum flexibility, if all files were done this way
> and part of an ogg header format, then all frameworks can utilise it.
I'll say it again, because this is important.
"This only forces a codec to conform to the restrictions of a given
media framework when being used by that framework; it does NOT force it
to obey those restrictions just to be stored in an Ogg container---and
thus across ALL media frameworks, regardless of their capabilities."
>
> I can't think of any audio codec that doesn't have a pcm sample rate and a
> number of channels ?
Sure you can't. But what happens when I want to make an audio codec that
uses a variable sample rate to account for drift in capture cards
without requiring resampling by the encoder (since capture must be done
in real time)?
I'm sure the designers of AVI couldn't think of any audio codec that
didn't have a constant bitrate, either.
> Nor any video codec that doesn't have an initial frame size ?
Right, just like the AVI designers had never heard of a video codec with
a variable frame rate, and the VfW designers had never heard of a video
encoder that needed to buffer multiple input frames before producing an
encoded output frame (e.g., for B frames)... and then stored the
resulting output frames out of order in the bitstream. That's crazy talk.
But even just for your frame size example... what happens when the
chroma planes are subsampled? Do you force the size of the luma plane to be
even? Theora does not. Some frameworks do. Even for those that do not...
Theora aligns chroma samples with luma samples based on their position
in the uncropped frame, NOT the final cropped picture. So sometimes the
first row of chroma samples in a picture correspond to one row of luma
samples, and sometimes two, depending on whether or not the picture
offset is even or odd. Do you know of a media framework that lets the
codec specify THAT correctly?
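
To make the even/odd case concrete, here is a small sketch assuming 2:1
vertical chroma subsampling (as in 4:2:0), where chroma row c of the uncropped
frame covers luma rows 2c and 2c+1. The function name and parameters are
hypothetical, and rows are simply counted from the edge the picture offset is
measured from; Theora's actual coordinate conventions are not modeled.

```python
def luma_rows_for_first_chroma_row(pic_y: int, pic_h: int) -> int:
    """Count the visible luma rows covered by the picture's first chroma row.

    pic_y is the picture's vertical offset inside the uncropped frame and
    pic_h its height, both in luma rows.
    """
    # Chroma alignment is fixed by the uncropped frame, not the picture.
    first_chroma = pic_y // 2
    covered = range(2 * first_chroma, 2 * first_chroma + 2)
    visible = range(pic_y, pic_y + pic_h)
    return sum(1 for row in covered if row in visible)
```

An even offset gives two luma rows for the first chroma row, while an odd
offset gives only one, which is the distinction a framework would have to be
told about.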
> In specific cases... ie audio and video (which are also the most common
> cases), why not be specific, in other cases be generic. If an audio or video
> codec comes along that makes using the specific codec impossible, then it
> will use the generic one.
And everyone who's written software that only supports the specific will
be unable to use your generic! By forcing implementors to be generic up
front, you ensure that they can handle it when the time comes.
> I think that's a bit of a harsh judgement ! I'm not trying to impose or
> force anyone to do anything, i 'm just putting out there the particular
> problems that exist under the particular framework that i'm implementing
> under, in the hope that some of these issues will at least allow future
> decisions to be made with as much information about issues specific to this
> platform as possible, and that hopefully issues that relate to this platform
> get at least some weight, rather than just issues that occur under the
> majority of developers preferred platform.
Well, automatic codec identification isn't really a problem specific to
your platform, though previous solutions certainly have been. Pretty
much everything I've suggested has been suggested by someone else
working for another platform.
The seeking problems you've come up with are at least similar to those
on other platforms. The communication restrictions DirectShow imposes do
make your situation somewhat unique, and that should be taken into
careful consideration when finalizing the mux spec.
> If the decisions are made that don't suit my platform i'll just continue to
> hack around them as i have been.
That's usually the way things work. Our goal should be that there is at
least one reasonably "correct" way to do things. Sometimes that way
might just be, "This media framework does not support the features used
by this file. I give up." I'll admit I was kind of expecting that with
relation to chained bitstreams, and am more than pleasantly surprised
there's actually a way to hack up some kind of support for them.
> If xiph wants to set up such a system on their server such that i can send
> it a packet and it will tell me where to download and install a codec from,
> it's probably possible with no intervention from microsoft. But that forces
> the issue of remote intervention rather than making it a last resort.
And the beauty of open source is that anyone could then set up such a
server, unlike the current situation with Microsoft. Xiph _would_ be
unable to distribute patent-encumbered codecs (and I have no idea about
the legality of even linking to ones hosted remotely), though. But if
you want to set up your own server in a software-patent free region,
you'd be able to.
> With a GUID, you can do it locally if possible. Lets say someone is mucking
> around with new format that's not complete. With a guid mapping they can
> happily let it work on their local system, without changing the demuxer.
> With remote identification the demuxer won't recognise it, and the xiph
> server won't have heard of it.
And defining a standard API, beyond the DirectShow one, that they can
implement and register their .dll (or COM object or whatever) with lets
them do this as well.