[vorbis-dev] Ogg VEO Codec, video in an Ogg stream

Sun Aug 27 12:09:59 PDT 2000

I updated the packet format and encoding algorithm:
http://uts.cc.utexas.edu/~foxx/oggveo2.tgz (~160k)

On Sun, 27 Aug 2000, Ralph Giles wrote:

> Neat. Can you explain your algorithm a bit? What kind of compression are
> you getting? Looks like your example is around 2.5 bits per pixel. Does it
> work as well when the movement isn't so well partitioned?

Thanks.  I'm now getting about 1.1 bits per pixel for that same video
clip (23 frames x 320 x 240) at approximately the same quality.
The general algorithm works by recursively examining
rectangles in the frame.  Each rectangle is more or less similar
(geometrically speaking) to the frame.

When a rectangle is examined, if the frame is a delta frame the first
criterion is: Is this rectangle similar enough to the corresponding part
of the previous frame?  If it is, then record that rectangle as being
transparent.  If not (or if this frame is static) apply the next
criterion: Is this rectangle uniformly colored enough?  If it is, record
the rectangle's color, otherwise subdivide it.
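
The two criteria above could be sketched in Python roughly as follows.
This is only an illustration, not the code in oggveo2.tgz.  Frames are
assumed to be lists of rows of (r, g, b) tuples, and the names classify,
mean_color, same_tol and flat_tol are hypothetical.  "Similar enough" and
"uniformly colored enough" are assumed here to mean a bounded per-channel
difference; the actual measure may well be different.

```python
def mean_color(frame, x, y, w, h):
    """Average (r, g, b) over the rectangle, using integer division."""
    sums = [0, 0, 0]
    for row in frame[y:y + h]:
        for px in row[x:x + w]:
            for c in range(3):
                sums[c] += px[c]
    return tuple(s // (w * h) for s in sums)


def classify(cur, prev, x, y, w, h, same_tol=8, flat_tol=8):
    """Return ("transparent", None), ("filled", color) or ("subdivide", None).

    prev is None for a static frame, which skips the first criterion.
    """
    cells = [(i, j) for j in range(y, y + h) for i in range(x, x + w)]

    # Criterion 1 (delta frames only): close enough to the previous frame?
    if prev is not None and all(
        abs(a - b) <= same_tol
        for i, j in cells
        for a, b in zip(cur[j][i], prev[j][i])
    ):
        return ("transparent", None)

    # Criterion 2: close enough to a single color (here, the mean)?
    color = mean_color(cur, x, y, w, h)
    if all(
        abs(a - b) <= flat_tol
        for i, j in cells
        for a, b in zip(cur[j][i], color)
    ):
        return ("filled", color)

    return ("subdivide", None)
```

A 1 x 1 rectangle always passes the second test, since its mean is the
pixel itself, so the recursion is guaranteed to stop.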

The first rectangle examined is the entire frame.  If it is to be
subdivided, it is cut into quarters like this:
 _________         _________
|         |  __\  |____|____|
|_________|  --/  |____|____|

Then the analysis algorithm is applied recursively to each subregion in
the order upper-left, upper-right, lower-left, lower-right.  Now that
I think of it, it'd probably be better to swap the last two, since
upper-right is adjacent to lower-right and adjacent regions will likely
look similar.  That data order may promote better compression if I use
entropy coding on one or more of the bit streams.

Eventually the entire frame will be partitioned into rectangles,
each of which is either filled or transparent.
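
The whole recursion could look something like the sketch below.  It is
a simplified stand-in, not the actual encoder: pixels are single integers
rather than RGB to keep it short, the tolerance test is an assumption, and
the way odd-sized rectangles are split (the left/top half gets the extra
pixel, and empty quadrants are skipped) is a guess.  The name partition is
hypothetical.

```python
def partition(cur, prev, x, y, w, h, depth=0, tol=8, out=None):
    """Collect leaves (kind, depth, x, y, w, h, color) in visiting order.

    cur and prev are lists of rows of ints; prev is None for a static frame.
    """
    if out is None:
        out = []
    if w == 0 or h == 0:
        return out  # empty quadrant from halving a 1-pixel side

    cells = [(i, j) for j in range(y, y + h) for i in range(x, x + w)]

    # Delta frames: unchanged rectangles become transparent leaves.
    if prev is not None and all(
        abs(cur[j][i] - prev[j][i]) <= tol for i, j in cells
    ):
        out.append(("transparent", depth, x, y, w, h, None))
        return out

    # Nearly uniform rectangles become filled leaves.
    mean = sum(cur[j][i] for i, j in cells) // len(cells)
    if all(abs(cur[j][i] - mean) <= tol for i, j in cells):
        out.append(("filled", depth, x, y, w, h, mean))
        return out

    # Otherwise quarter the rectangle and recurse:
    # upper-left, upper-right, lower-left, lower-right.
    hw, hh = (w + 1) // 2, (h + 1) // 2
    quadrants = (
        (x, y, hw, hh),
        (x + hw, y, w - hw, hh),
        (x, y + hh, hw, h - hh),
        (x + hw, y + hh, w - hw, h - hh),
    )
    for qx, qy, qw, qh in quadrants:
        partition(cur, prev, qx, qy, qw, qh, depth + 1, tol, out)
    return out
```

Calling partition(frame, None, 0, 0, width, height) treats the frame as
static; passing the previous frame as prev treats it as a delta frame.
Changing the visiting order only means reordering the quadrants tuple.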

A transparent rectangle requires storage of these bits:
  Transparency flag:  1 bit
  Recursion depth:    4 bits (enough for up to a 2^16 x 2^16 frame)
---------------------------------
  Total:              5 bits

Bits needed to store a filled rectangle:
  Transparency flag:  1 bit (if it's in a delta frame)
  Recursion depth:    4 bits
  Color:             16 bits (5-6-5 RGB encoding)
---------------------------------
  Total:             20 or 21 bits
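
For reference, a 5-6-5 color can be packed into 16 bits as in the
sketch below.  The bit order (red in the top five bits, blue in the bottom
five) is the common convention and is assumed here; the function names are
hypothetical.

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit r, g, b into 16 bits: 5 bits red, 6 green, 5 blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def unpack_rgb565(value):
    """Expand a 16-bit 5-6-5 value back to 8-bit channels.

    The high bits are replicated into the low bits so that full
    intensity maps back to 255 rather than 248 or 252.
    """
    r5 = (value >> 11) & 0x1F
    g6 = (value >> 5) & 0x3F
    b5 = value & 0x1F
    return (
        (r5 << 3) | (r5 >> 2),
        (g6 << 2) | (g6 >> 4),
        (b5 << 3) | (b5 >> 2),
    )
```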

The recursion depth needs only four bits because it is stored as a
number from 0 to 15.  Each subdivision halves both dimensions, so 15
subdivisions take a 2^15 x 2^15 frame down to single pixels; a
2^16 x 2^16 frame would stop at 2 x 2 rectangles.
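
The bit counts above can be turned into a small calculator, sketched
below.  The field widths come from the tables; the helper names and the
assumption that every split halves the longer side (rounding up) are
illustrative only.

```python
FLAG_BITS = 1    # transparency flag, present only in delta frames
DEPTH_BITS = 4   # recursion depth, 0-15
COLOR_BITS = 16  # 5-6-5 RGB


def leaf_bits(filled, delta_frame):
    """Bits needed to store one rectangle."""
    bits = DEPTH_BITS
    if delta_frame:
        bits += FLAG_BITS
    if filled:
        bits += COLOR_BITS
    return bits


def max_depth(width, height):
    """Halvings needed to shrink the longer side to one pixel."""
    return (max(width, height) - 1).bit_length()


def bits_per_pixel(total_bits, frames, width, height):
    """Average cost per pixel over a whole clip."""
    return total_bits / (frames * width * height)
```

For a 320 x 240 frame, max_depth(320, 240) is 9, well inside the 4-bit
field.  At 1.1 bits per pixel, the 23-frame sample clip (1,766,400
pixels) comes to roughly 1.94 million bits, or about 243 kB.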

Multiple frames are packed into each Ogg packet.  The first frame will be
static and the rest will be delta.

Ogg Veo stream (nominally 4 frames per packet):
  [VeoHeader]
  [VeoComment]
  [VeoData: Static Delta Delta Delta]
  [VeoData: Static Delta Delta Delta]
  ...
  [VeoData (end): Static Delta]
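
The grouping of frames into data packets shown above can be sketched as
follows.  Only the frame types per VeoData packet are modeled; the header
and comment packets and the actual byte layout are left out, and
group_frames is a hypothetical name.

```python
def group_frames(num_frames, frames_per_packet=4):
    """Return one list of frame types per VeoData packet.

    Each packet starts with a static frame; the rest are deltas.
    The last packet may hold fewer frames.
    """
    packets = []
    for start in range(0, num_frames, frames_per_packet):
        count = min(frames_per_packet, num_frames - start)
        packets.append(["Static"] + ["Delta"] * (count - 1))
    return packets
```

With 10 frames this yields two full packets followed by a final packet of
Static and Delta, matching the layout above.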

Right now, the resulting video file shrinks to about 60% of its size
after being gzipped.  This means there is a lot to be gained
by adding some sort of lossless compression.
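
One way to repeat that kind of measurement on an encoded stream is the
sketch below.  It uses Python's zlib module, which applies the same
DEFLATE compression as gzip but with a slightly smaller wrapper, so the
numbers will differ a little from the gzip tool.  The function name is
hypothetical.

```python
import zlib


def deflate_ratio(data, level=9):
    """Compressed size divided by original size (smaller is better)."""
    if not data:
        raise ValueError("no data to compress")
    return len(zlib.compress(data, level)) / len(data)
```

A result near 0.6 would correspond to the figure quoted above.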

I have not yet tried the codec on anything other than the sample video
included in the tar file.  I have tested bitrate scalability.  The older version
didn't scale well but the newer one (oggveo2.tgz) does better.  It seems
that, like Vorbis, the magic is in the encoding process.  What I'd
like to do is have each delta frame add detail successively (as opposed to
being used for pure updating purposes).  I have a few ideas on how to do
that.

> I'm working on adding mng as an image/high-quality-video option, BTW.

Cool, how do you plan to use the mng layer?  Is it targeted at
providing an animated album cover or can it be synced to sound as
video?  Or both?

- David

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/