[theora-dev] [OT] Just saying hi!

Christoph Lampert chl at math.uni-bonn.de
Thu Feb 27 02:41:54 PST 2003

On Wed, 26 Feb 2003, Mike Melanson wrote:

> On Wed, 26 Feb 2003, Dan Miller wrote:
> > on the coeff layout, both patent and datarate considerations exist.  
> The original VP3 was intended as a low-resolution, low-datarate codec,
> so the performance issues were not paramount.  On modern PC's, even a
> 320x240 image can mostly be kept in cache, so it's not too bad a hit.  

One 320x240 image fits in L2 cache, yes, but I don't know of any codec
that needs just _one_ image buffer. And, of course, 320x240 is 1200 8x8
blocks, so (assuming 16-bit coefficients) that is 150 KB of AC/DC
coefficients, which may fit into L2, but surely not into L1...
On the other hand, keeping DC coefficients localized might help 
prediction, so okay, maybe it depends... 
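
To make the arithmetic above explicit, here is a small sketch in C. The
function names are made up for illustration, and it counts only a single
plane of the stated size (no chroma, no reference frames), so it is a
lower bound rather than a real codec's memory footprint.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 8x8 blocks in one plane; dimensions must be multiples of 8. */
static inline size_t blocks_8x8(size_t width, size_t height)
{
    assert(width % 8 == 0 && height % 8 == 0);
    return (width / 8) * (height / 8);
}

/* Bytes needed to store all 64 coefficients of every block in the plane,
 * with bytes_per_coeff bytes per coefficient (2 for 16-bit values). */
static inline size_t coeff_storage_bytes(size_t width, size_t height,
                                         size_t bytes_per_coeff)
{
    return blocks_8x8(width, height) * 64 * bytes_per_coeff;
}
```

For 320x240 with 2-byte coefficients this gives 1200 blocks and 153600
bytes, i.e. 150 KB.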

> It helps datarate given that the entropy coder is not so
> sophisticated.  And it definitely avoids some of the most difficult
> patents.  Downside is playback performance on higher-res material is
> probably not as great as we would like.  However machines are pretty
> fast these days, and other codecs have other deficiencies that slow
> them down, so it's probably a wash.

";-) The machines are fast enough anyway..." This may hold for users at
home, but for anything slightly professional (generation of streaming
material, broadcast clients) somehow I don't think many people will
agree. Also, other platforms might be more interesting than Joe User who
downloads movies from the internet, e.g. portable devices etc.
And of course _encoding_ is still far from realtime, so every CPU
cycle could be important for acceptance of the format.
But of course, speed may not be the #1 issue of Theora at the moment,
I was just wondering who came up with the method you described.

> 	Do you think this issue could be mitigated by using prefetch
> instructions found on certain CPUs? I just realized that this might be a
> case for arranging the fragments in one contiguous array like in the
> original source (in my new implementation, I was packaging a bunch of
> per-fragment information together into a structure, including the DCT
> coefficients).

Of course, prefetch would be essential. XVID (MPEG-4) works almost
entirely in 16x16 blocks, all data except for motion estimation fits into
L1-cache, so prefetch is not needed that often. But if the data to be
processed is distributed over an area larger than L2 cache, and especially
with large gaps, prefetch seems absolutely necessary. The problem is just
that good prefetch depends very much on the underlying CPU, the memory and
its speed...
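
As a rough sketch of what software prefetch over a strided coefficient
array could look like: the code below walks the DC coefficient of each
block and asks for a block a few steps ahead. This is not code from
XVID or from the VP3 source. The function name, the layout (64
contiguous 16-bit coefficients per block) and PREFETCH_DISTANCE are
assumptions for illustration, and __builtin_prefetch is a GCC/Clang
builtin, so the hint is compiled out elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COEFFS_PER_BLOCK 64

/* Hypothetical tuning knob: how many blocks ahead to prefetch.
 * A good value depends on the CPU and memory and must be measured. */
#define PREFETCH_DISTANCE 4

/* Sums the DC coefficient (index 0) of every block. Consecutive DC
 * values are 128 bytes apart, so each access touches a new cache line. */
static inline int32_t sum_dc(const int16_t *coeffs, size_t nblocks)
{
    int32_t sum = 0;

    assert(coeffs != NULL || nblocks == 0);
    for (size_t i = 0; i < nblocks; i++) {
#if defined(__GNUC__)
        /* Read-only hint (0) with low temporal locality (1). */
        if (i + PREFETCH_DISTANCE < nblocks)
            __builtin_prefetch(
                &coeffs[(i + PREFETCH_DISTANCE) * COEFFS_PER_BLOCK], 0, 1);
#endif
        sum += coeffs[i * COEFFS_PER_BLOCK];
    }
    return sum;
}
```

The prefetch is only a hint and does not change the result. Whether it
helps at all, and at which distance, is exactly the CPU-dependent part.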


