[theora-dev] Re: Decoder accessing uninitialized variable...Re: [theora-dev]Using Theora in games?

Alen Ladavac alenl-ml at croteam.com
Sun Feb 29 06:13:36 PST 2004



> I think that's an issue with the way SDL is being used. There's probably
> a way to clear the window before displaying it. But the actual codec
> doesn't return garbage in the first frame.

Of course. I just mentioned that as an example.

> It is in some places. Like ClearDownQFragData. You can see below what
> kind of impact that has.

I see.

> Yes and no. If post processing is disabled, then pointers into these
> reference frames can be returned directly. Otherwise, a separate buffer
> needs to be used to store the post-processed video, because Theora does
> not have filtering in the loop. I'd forgotten about that. Add another
> 3*w*h/2 bytes (Theora still returns pointers into its own internal
> buffers in either case).

Eh, guess it's just not possible to make a good video decoder that's easy on
memory. :)
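As a rough check on the "another 3*w*h/2 bytes" figure quoted above: an 8-bit 4:2:0 picture holds a full-size luma plane plus two chroma planes at half width and half height, so w*h + 2*(w/2)*(h/2) = 3*w*h/2. The C sketch below just does that arithmetic. The function names and the `ref_frames` count are illustrative assumptions, not Theora's API or its actual number of internal buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one 8-bit YUV 4:2:0 frame: a full-size luma (Y) plane plus
 * two chroma (Cb, Cr) planes, each at half width and half height. */
static size_t yuv420_frame_bytes(size_t w, size_t h)
{
    assert(w % 2 == 0 && h % 2 == 0);   /* keep the halving exact */
    size_t luma   = w * h;
    size_t chroma = (w / 2) * (h / 2);
    return luma + 2 * chroma;           /* equals 3*w*h/2 */
}

/* Hypothetical tally: `ref_frames` internal frame buffers, plus one
 * more when post-processing is on, because the post-processed picture
 * is not part of the prediction loop and so cannot overwrite a
 * reference frame. The buffer count is an assumption. */
static size_t decoder_frame_memory(size_t w, size_t h,
                                   size_t ref_frames, int postproc)
{
    size_t n = ref_frames + (postproc ? 1 : 0);
    return n * yuv420_frame_bytes(w, h);
}
```

For a 640x480 stream, `yuv420_frame_bytes(640, 480)` gives 460800 bytes, so enabling post-processing costs roughly another 450 KB.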

[comments on output to video memory in another mail]

> >  6.41% ClearDownQFragData
>
> A little use of memset in here would probably speed things up a bit.

Actually, the compiler has already replaced the for loop with a rep stosd
instruction, so the bottleneck is probably memory throughput. I don't know
what memory access patterns the decoder uses, but there might be some cache
thrashing in a few places.
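The rep stosd point can be illustrated with a hypothetical coefficient store. The `frag_q` type and both functions are made up for this sketch; the thread does not show ClearDownQFragData's body. The two functions write exactly the same bytes, so once the compiler has already turned the loop into a block store, swapping in memset mostly changes readability, and what remains is the cost of the memory traffic itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-fragment store: 64 quantized coefficients for each
 * 8x8 block. The real Theora data layout may differ. */
typedef struct {
    short coeffs[64];
} frag_q;

/* Loop form: zero every coefficient one element at a time. Optimizing
 * x86 compilers can lower this to a block store such as rep stosd, or
 * to a call to memset. */
static void clear_loop(frag_q *frags, size_t n)
{
    for (size_t f = 0; f < n; f++)
        for (size_t i = 0; i < 64; i++)
            frags[f].coeffs[i] = 0;
}

/* memset form: the same result stated directly. All-zero bytes
 * represent 0 for integer types, so the two are equivalent. */
static void clear_memset(frag_q *frags, size_t n)
{
    assert(frags != NULL || n == 0);
    memset(frags, 0, n * sizeof *frags);
}
```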

> The interesting things missing from this list are the dequantization and
> iDCT functions (which were assembly optimized in the VP3 source, but are
> pure C in Theora for portability).

Well, I just listed the top 10, for simplicity. Actually, the whole profile
is very flat; there are no real spikes, which would suggest one of the
following:

a) code spread out into small, frequently called functions with no inlining,
making it hard to analyze,
b) a lot of cache thrashing, or
c) just too much work to do; the only way to speed it up would be to rewrite
the entire code in asm.
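To make b) concrete: cache thrashing usually comes from access patterns that jump around memory rather than from one expensive function, so its cost tends to be spread thinly across a profile. The sketch below is a generic illustration, not a description of Theora's code, whose access patterns the thread doesn't describe. Both functions sum the same 8-bit plane, but one walks it row by row (sequential addresses) and the other column by column (a stride-sized jump on every access).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum an 8-bit plane stored row by row with `stride` bytes per row.
 * Row order reads memory sequentially, so each cache line fetched is
 * fully used before moving on. */
static uint64_t sum_rows(const uint8_t *p, size_t w, size_t h, size_t stride)
{
    assert(stride >= w);
    uint64_t s = 0;
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            s += p[y * stride + x];
    return s;
}

/* Column order reads one byte per row and then jumps `stride` bytes.
 * On planes larger than the cache, the lines it loads can be evicted
 * before their neighbouring bytes are used, i.e. cache thrashing. */
static uint64_t sum_cols(const uint8_t *p, size_t w, size_t h, size_t stride)
{
    assert(stride >= w);
    uint64_t s = 0;
    for (size_t x = 0; x < w; x++)
        for (size_t y = 0; y < h; y++)
            s += p[y * stride + x];
    return s;
}
```

Timing both on a plane much larger than the cache would typically show the column walk running noticeably slower, even though the two do identical arithmetic.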

Now, I don't think c) is the case, as there are a lot of other video codecs
that don't come anywhere near this level of CPU usage. To analyze a) and b)
I'd need a lot more insight into the code semantics, so I will put this off
for now and wait to see whether the new decoder changes the profile.

> Those functions already have MMX versions in the VP3 source, as do the
> PP functions. This is available as the vp32 module in Xiph's CVS.
>
> They were taken out of the theora module because the main goal of a
> reference decoder is clarity, not platform-specific optimizations. Clean
> and efficient design is a good thing for a reference decoder;
> Improvements in this area will be accepted gladly. Hand-coded assembly
> is not.

I understand your point. No problem. I will concentrate on functionality
(checking bandwidth variations, etc.) for now, and wait to see your new
decoder in action. There are clearly some problems in the approach as well,
not only in peephole optimizations, and maybe the change in thinking will
bring improvements.

Thanks for your help,
Alen

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'theora-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.


