[Theora-dev] Re: PC-based video server

Andrey Filippov theora at elphel.com
Wed Jan 26 21:06:47 PST 2005

> Keep in mind of course that coefficients are stored in zig-zag order,
> not embedded-tree order. So to get the upper 2x2 window of coefficients,
> you actually need to decode the Huffman codes for the first 5, and for
> 4x4 you need the first 25.
Yes, exactly - I was just recently tracing that zig-zag manually with a
pencil, a notepad and some printouts :-)
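The counts quoted above can be checked with a small sketch (assuming the standard JPEG-style 8x8 zig-zag scan, which Theora shares): the first 5 zig-zag positions cover the upper 2x2 window, and the first 25 cover 4x4.

```python
# Sketch: count how many leading zig-zag coefficients must be decoded
# to cover the upper-left NxN window of an 8x8 block.

def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag scan order for an n x n block."""
    order = []
    for d in range(2 * n - 1):          # anti-diagonals where row + col = d
        rng = range(d + 1)
        # even diagonals run bottom-left -> top-right, odd ones the reverse
        for i in (rng if d % 2 else reversed(rng)):
            r, c = i, d - i
            if r < n and c < n:
                order.append((r, c))
    return order

def coeffs_needed(window):
    """Leading zig-zag coefficients needed to cover the window x window corner."""
    scan = zigzag_order()
    last = max(i for i, (r, c) in enumerate(scan) if r < window and c < window)
    return last + 1

print(coeffs_needed(2))  # 5  -> decode the first 5 Huffman codes
print(coeffs_needed(4))  # 25 -> decode the first 25
```

The 4x4 case is dominated by position (3,3), which sits on the 7th anti-diagonal and is the 25th entry in scan order.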
> You might also run into drift problems with
> INTER_NOMV blocks if you don't decode to full resolution and run the
> loop filter properly. You'll also need to decode the coefficients for
> all of the blocks, regardless of what region you want to decode (except
> perhaps in the last coefficient).

I won't have these problems - no loop filters are implemented yet. You see,
from our experience with the MJPEG model 313 camera, we never use less than
50% quality (in the JPEG standard definition), usually around 70-80%. And
as I've seen in the Theora documentation, the loop filter limits are
exactly zero for the upper quarter of the qi range. On the other hand,
memory accesses are very expensive in the FPGA implementation - I already
use more than 90% of the theoretical bandwidth of the DDR SDRAM (at the
given clock rate), and the 12-bit "pretokens" are stored in the effectively
32-bit SDRAM in compact form - 8 in 3 words. Actually, compressor_one
receives Bayer-encoded pixel data in 20x20 tiles that are converted into
six 8x8 blocks. The Bayer->YCbCr converter inherited from the 313 camera
uses only 18x18 pixels (3x3 interpolation for Y); I'm planning to improve
the quality, so I reserved 20x20 SDRAM accesses and 5x5 interpolation. It
will probably be possible to implement two modes later: high quality (high
qi) - 5x5 Bayer->YCbCr, zero loop filtering; lower quality - 3x3 Bayer,
keeping 1 pixel around the 8x8 blocks for loop filtering.

> So, I don't think reduced resolution buys you a whole lot in terms of
> CPU savings (unless you're willing to live with drift artifacts) in
> decoding (obviously it does in encoding the resulting MJPEG stream), but
> a smaller region of interest will---albeit not directly proportional to
> the ROI size, since you still have to decode all of the coefficients for
> the frame.

So, with the above said - no loop filtering - what do you think? Will it
really be faster?

> If you _are_ willing to live with drift artifacts and some additional
> small quality loss, what might be interesting is to try direct
> Theora->MJPEG transcoding. That'd be a lot more developer time, however.
> The other things I talked about are actually pretty easy to hack into
> the experimental decoder.

What I need for this project is not direct Theora -> MJPEG, but
Theora -> MJPEG with "digital PTZ" (zoom - probably just 1/2/4/8). And
that PTZ over full-resolution Theora data will be needed even without
transcoding - we already ran into a similar problem working with multiple
hi-res MJPEG streams, where there are small windows for multiple cameras
on the same screen. Not really many, as the network bandwidth is limited -
one of the camera streamers can, with high quality settings, send 70 Mbps,
so there is no room for even two cameras on a 100 Mbps LAN running in this
mode. So we need to use an abbreviated IDCT in mplayer to make it useful.
The next idea was to combine decimation and windowing in the same
optimized JPEG decoder - some things are even easier with Theora than
MJPEG, such as having a default 256x198 thumbnail preview made of just the
DC components, which are easy to extract. A successful solution to this
software task, combined with the (now - OK, soon) available hardware, will
make this Theora-based solution very competitive in this application area.
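The DC-only preview idea can be sketched as follows (function name is hypothetical; assumes an orthonormal 8x8 DCT, where a DC-only block reconstructs to the constant value DC/8, so each 8x8 block collapses to a single thumbnail pixel without any full IDCT):

```python
# Sketch (hypothetical helper): build a 1/8-scale thumbnail from DC
# coefficients only. With an orthonormal 8x8 DCT, an all-DC block
# reconstructs to the constant DC/8, so one 8x8 block -> one pixel.

def dc_thumbnail(dc_plane):
    """dc_plane: 2-D list of dequantized DC coefficients, one per 8x8 block.
    Returns the 1/8-scale thumbnail, one pixel value per block."""
    return [[dc / 8.0 for dc in row] for row in dc_plane]

# A flat 8x8 block of pixels all equal to 100 has DC = 8 * 100 = 800
# (orthonormal convention), so the thumbnail pixel recovers 100.
thumb = dc_thumbnail([[800.0, 400.0], [0.0, 1600.0]])
assert thumb == [[100.0, 50.0], [0.0, 200.0]]
```

Note this ignores the decoder's actual quantization and IDCT scaling conventions, which would change the constant factor but not the approach.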

More information about the Theora-dev mailing list