[vorbis] Video codec

Thomas Marshall Eubanks tme at 21rst-century.com
Mon Sep 11 05:38:11 PDT 2000



Dear Jelle;

   Very nice summary. I have two comments (inline).

                                   Marshall Eubanks

Jelle Foks wrote:

> Just for clarity, so that we have the correct terminology and numbers,
> and I'll raise some issues that I think should be considered when
> designing Ogg Video.
>
> Digital Broadcast Quality Video is described in CCIR601/656, which is
> basically the following:
>
> Active Frame Size    | Frame Rate   | Subsampling | Active pixels per second
> ---------------------+--------------+-------------+-------------------------
> NTSC: 720x480        | 1000/1001*30 |  4:2:2      | ~10.3M
> PAL:  720x576        | 25           |  4:2:2      | ~10.3M
>
> A 'frame' is a full image of video. In interlaced video, a frame
> consists of two fields, the even field and the odd field.
>
> The video signals are encoded in the YCbCr color space (Luminance +
> Chrominance-Blue + Chrominance-Red). Each of the color components Y, Cb,
> or Cr is called a 'subpixel'. A subpixel in CCIR601/656 has a precision
> of 8 bits.
> The subsampling of CCIR601/656 is called '4:2:2' subsampling in 'mpeg
> terms', and means that the chrominance pixels are decimated by a factor of
> two in the horizontal direction. The result is that color has only half
> resolution in the horizontal direction (360x480 NTSC/360x576 PAL). To be
> honest, this subsampling is the first step of lossy compression of a
> factor ((1+1+1)*8)/((1+0.5+0.5)*8)=1.5, because a 24bpp image is
> described with an average of 16 bits per pixel after reduction of the
> chrominance resolution.
>
> The number of 216Mbit mentioned here is CCIR601/656 video data including
> the blanking and retrace interval overhead (a CCIR601/656 video stream
> also contains non-active pixels, because it also contains the timing so
> that the video data can easily be transformed to and from the analog
> domain).
>
> My opinion is that, when discussing video compression, it is confusing
> to speak of 'compression ratios', because it is never clear whether
> compression ratio before or after subsampling is meant, and whether or
> not non-active pixels were counted in the non-compressed stream.
>
> A factor of 100 compression of the 216Mbit stream would result in a
> 2.16Mbit stream. However, a factor of 100 compression of the active
> CCIR601/656 video pixels would result in a 10.3*16/100=1.65Mbit stream.
> There is a 24% difference between the two numbers.
>
> I suggest using the term 'bits per pixel' to quantify the compression
> ratio. With that number there is no ambiguity and it's easy to
> calculate the resulting video bit-rate given the video image resolution.
> 'D1 at 1.5Mbps' is approx 0.15 bits per pixel, 'D1 at 3Mbps' is approx
> 0.3 bits per pixel.
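
A minimal Python sketch of the bits-per-pixel arithmetic quoted above. The function and constant names are illustrative only. The frame sizes, frame rates, and the 16 bits per pixel for 4:2:2 are the figures from the quoted text, which rounds both pixel rates to ~10.3M.

```python
from fractions import Fraction


def active_pixel_rate(width, height, frame_rate):
    """Active pixels per second for a frame size and frame rate."""
    return width * height * frame_rate


def bits_per_pixel(bit_rate, pixel_rate):
    """Compressed bits spent per active pixel."""
    return bit_rate / pixel_rate


# Frame sizes and rates from the table in the quoted text.
NTSC_RATE = active_pixel_rate(720, 480, Fraction(30000, 1001))
PAL_RATE = active_pixel_rate(720, 576, 25)

# 4:2:2 with 8-bit samples: (1 + 0.5 + 0.5) * 8 = 16 bits per pixel.
BITS_422 = (1 + 0.5 + 0.5) * 8

if __name__ == "__main__":
    for name, rate in (("NTSC", NTSC_RATE), ("PAL", PAL_RATE)):
        mpix = float(rate) / 1e6
        print(f"{name}: {mpix:.2f} M active pixels/s, "
              f"{mpix * BITS_422:.1f} Mbit/s of active 4:2:2 video")
    for mbps in (1.5, 3.0):
        bpp = bits_per_pixel(mbps * 1e6, float(PAL_RATE))
        print(f"D1 at {mbps} Mbps: {bpp:.3f} bits per pixel")
```

Quoting a codec in bits per active pixel sidesteps the question of whether the uncompressed reference was the 216 Mbit stream or only the active pixels.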
>
> Rough numbers: With JPEG compression, you get between 1-5 bits per
> pixel; JPEG is mostly used in the range of 1-2 bits per pixel. JPEG2000
> claims to get 4-8 times better compression than JPEG; if that is true it's
> about the range of 0.15-1.25 bits per pixel. With MPEG compression, you
> can get between 0.15-1.5 bits per pixel, depending on the encoder and
> image quality of course (and the MPEG version, MPEG1, MPEG2, or MPEG4).
> When counting uncompressed video as 24 bits per pixel, this explains the
> claimed 100x compression of MPEG video at 0.24 bits per pixel. Below
> 0.15 bits per pixel is often very aggressive coding for applications such
> as video conferencing, in which case large parts of the image are left
> completely unchanged (H.263/H.26L).
>

I think that the end of this should read

"in which case large parts of the image are left
completely unchanged FROM FRAME TO FRAME"

The thing missing from this discussion is that aggressive compression of video
always depends in some fashion on encoding only the difference between frames,
not coding each frame from scratch. The efficiency of this depends on what
signal is being encoded: good for talking heads on NPR, not so good for The
Talking Heads in concert, much less NBA basketball or World Cup soccer. The
difference can easily be a factor of 3 to 6, even with MPEG-type codecs that
allow for blocks or objects to move from frame to frame. This means that some
sort of "bit bucket" would be very useful in a full-motion video codec, where
more time is spent sending active scenes than passive ones. Doing this means
that you will run behind real time.
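
To make the frame-difference idea concrete, here is a rough Python sketch of the simplest variant: re-send only the blocks that changed since the previous frame. It is an illustration under stated assumptions, not how MPEG or any particular codec does it. There is no motion compensation, and the function name, the 8x8 block size, and the threshold are all made up for the example.

```python
def changed_blocks(prev, curr, block=8, threshold=4):
    """Return (block_row, block_col) for every block whose mean absolute
    difference between the two frames exceeds the threshold.

    Frames are lists of rows of gray levels (0-255) with equal dimensions.
    """
    height, width = len(curr), len(curr[0])
    changed = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            total = 0
            count = 0
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
            if total / count > threshold:
                changed.append((by // block, bx // block))
    return changed


if __name__ == "__main__":
    # Synthetic 32x32 frames: static gray background, one 8x8 region changes.
    prev = [[128] * 32 for _ in range(32)]
    curr = [row[:] for row in prev]
    for y in range(8, 16):
        for x in range(16, 24):
            curr[y][x] = 200
    blocks = changed_blocks(prev, curr)
    print(f"{len(blocks)} of {(32 // 8) ** 2} blocks need re-sending")
```

A talking head changes few blocks per frame; a basketball game with a panning camera changes nearly all of them, which is where the variation in bit demand comes from.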

Question : Is the Vorbis Video Codec to be used for video conferences or
NBA basketball ?

(I would argue for NBA).

If for video conferencing, the time delay MUST be kept below about 200
milliseconds, but the need for motion detection is reduced.

If for the NBA, you should decide how far behind real time you are willing to
run (I would argue for at least 1 second) and provide at least the hooks for
a bit bucket.

If you say "both", then IMHO you need to think about what you are trying
to accomplish.

>
> I think if we want to compete based on compression ratio, then we should
> somehow get at 0.1 bits per pixel or below. A CDROM is approx
> 650x8=5.2Gbits, so for an hour of video you have 5200/3600=1.44Mbits/s,
> which would dictate a compression to below approx (1.44/10.3)=0.14 bits
> per pixel if there is to be any room left for audio etc.
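
The CD-ROM budget above can be sketched in a few lines of Python. The function name is illustrative, and the 128 kbit/s audio allowance in the usage example is an assumption added here to show how quickly the video budget shrinks. The 650 MB capacity, one-hour duration, and ~10.3M pixels per second come from the quoted text.

```python
def video_bits_per_pixel(capacity_mbytes, duration_s, pixel_rate, audio_bps=0.0):
    """Bits per active pixel left for video when a program of the given
    duration must fit on a medium of the given capacity."""
    total_bps = capacity_mbytes * 8e6 / duration_s
    return (total_bps - audio_bps) / pixel_rate


if __name__ == "__main__":
    pixel_rate = 10.3e6  # active pixels per second, as quoted above
    print(f"video only: {video_bits_per_pixel(650, 3600, pixel_rate):.3f} bits per pixel")
    # Assumed audio allowance of 128 kbit/s (not a figure from the thread).
    with_audio = video_bits_per_pixel(650, 3600, pixel_rate, audio_bps=128e3)
    print(f"with audio: {with_audio:.3f} bits per pixel")
```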
>
> Of course it's easy to get 0.14 bits per pixel if there is no quality
> requirement... When comparing compression methods, image quality is
> often measured in PSNR (dB) or MSE (mean squared error). A compression
> method can be considered better if it achieves better PSNR/MSE at
> similar bit rates, or lower bit rates at similar PSNR/MSE. So, when
> introducing a video compression method with amazing bit-rates, it can be
> proven to have better quality than the alternatives by comparing the
> PSNR/MSE at various bit-rates. Of course, the effectiveness of PSNR or
> MSE as an image quality measure is a point of discussion, so there is
> always still room for interpretation of the numbers (note that there are
> other measurement methods that attempt to give better numbers, there's
> even an expert group (www.crc.ca/vqeg)).

When you start doing compression methods based on our vision system,
the eye then becomes the best tool to measure performance. MSE or RMS type
error metrics, although routine for measuring the performance of typical
thermal-noise-limited transmission channels, can give ridiculous results in
this case.

Here is a simple example :

Suppose you have a black and white TV system with 3.5 million pixels and
256 gray levels per pixel, and an average gray level value of 128. Now,
suppose you compress by one method, which causes every pixel value to be
off by one, randomly. Your eye would hardly notice this, and the MSE is 1.
If a different compression sets a block of 10 by 10 pixels to all black or
all white, right in the center of the screen, but perfectly renders all the
other pixels, then the MSE is

(10^2 x 128^2) / (3.5 x 10^6) = 0.468 (RMS pixel error is 0.684)

The MSE prefers the second compression method, the eye would strongly
prefer the first.
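
The arithmetic can be checked with a short Python sketch. The helper names are illustrative. The PSNR helper uses the usual definition with a peak value of 255 for 8-bit samples, which is an addition here rather than something spelled out in the example.

```python
import math


def mse(error_counts, total_pixels):
    """Mean squared error from a map {absolute error: number of pixels}.

    Pixels not listed are assumed to be reproduced exactly.
    """
    return sum(err * err * n for err, n in error_counts.items()) / total_pixels


def psnr(mse_value, peak=255):
    """Peak signal-to-noise ratio in dB for 8-bit samples."""
    return 10 * math.log10(peak * peak / mse_value)


TOTAL = 3_500_000

# Method 1: every pixel is off by one gray level.
uniform_noise = mse({1: TOTAL}, TOTAL)

# Method 2: one 10x10 block is off by about 128 levels, the rest is perfect.
block_dropout = mse({128: 10 * 10}, TOTAL)

if __name__ == "__main__":
    for name, value in (("off-by-one noise", uniform_noise),
                        ("10x10 dead block", block_dropout)):
        print(f"{name}: MSE={value:.3f} RMS={math.sqrt(value):.3f} "
              f"PSNR={psnr(value):.2f} dB")
```

By either MSE or PSNR the dead block scores better, which is exactly the mismatch with what a viewer would report.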

My deeper point here is that you cannot escape listening / viewing trials when
you are talking about compression methods tuned to our physical sensors
(ears/eyes and brains). Done the conventional way, these are expensive. If the
open source movement could extend to open evaluation of codecs by large numbers
of people, then it could develop an incredible advantage here - evaluating new
codec improvements in days, not months.

>
>
> Ok, then there is the issue of variable or fixed bit-rate and variable
> or fixed quality and encoder and buffering latency. If you have a
> variable bit-rate encoder for a fixed quality stream, or a fixed
> bit-rate encoder for a variable quality stream, then you can keep the
> buffers small to reduce the latency. However, if you put a maximum on
> the bit-rate, and don't want to accept occasionally reduced image
> quality of the video, then you will need buffering to even out the
> bit-rate on the hard-to-encode pieces of video, which of course
> introduces latency. When buffering is needed, the decoder must know how
> much to fill the buffer before starting to display to ensure that later
> on, during display it never has to wait for compressed data to be
> received during the hard-to-compress video scenes. Additionally, there
> may be a limitation on the buffer size that is economical in the decoder
> (especially in hardware, RAM=money). The MPEG standards include a scheme
> to control this, centered around the 'video buffer verifier (VBV)'.  I
> think Ogg video should address this issue as well.
>

This requires some idea of how far behind real time you are running (see above).
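
As a rough illustration of how the delay and the buffer size are tied together, here is a Python sketch of a decoder fed by a constant-rate channel. It is a simplified model, not the MPEG VBV itself: it assumes each frame is removed from the buffer instantaneously at its display time, the channel delivers a fixed number of bits per frame interval, and the function names are invented for the example.

```python
def min_startup_delay(frame_bits, bits_per_interval):
    """Smallest pre-roll, in frame intervals, such that frame i is fully
    received by the time it is displayed at (delay + i)."""
    delay = 0.0
    cumulative = 0
    for i, bits in enumerate(frame_bits):
        cumulative += bits
        delay = max(delay, cumulative / bits_per_interval - i)
    return delay


def peak_buffer_bits(frame_bits, bits_per_interval, delay):
    """Largest decoder buffer occupancy, sampled just before each frame
    is removed for display."""
    total = sum(frame_bits)
    peak = 0.0
    consumed = 0
    for i, bits in enumerate(frame_bits):
        # The channel cannot deliver more than the whole stream.
        received = min(bits_per_interval * (delay + i), total)
        peak = max(peak, received - consumed)
        consumed += bits
    return peak


if __name__ == "__main__":
    # Easy scene, hard scene, easy scene; the average is 70,000 bits per frame.
    frames = [50_000] * 10 + [150_000] * 5 + [50_000] * 10
    rate = 70_000  # bits delivered per frame interval
    delay = min_startup_delay(frames, rate)
    print(f"pre-roll: {delay:.2f} frame intervals "
          f"({delay / 30:.3f} s at 30 frames/s)")
    print(f"peak buffer: {peak_buffer_bits(frames, rate, delay):.0f} bits")
```

A longer or harder burst pushes both the pre-roll and the peak buffer up, so a cap on either one is effectively a cap on how far the encoder may exceed the channel rate.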

>
> Cheers,
>
> Jelle.
>
> Chrissy and Raul wrote:
> >
> > Hi,
> >
> > I guess this is a good time to start putting together a wish list for a
> > video codec.
> >
> > I see that for audio the compression is around 10X for reasonable quality.
> > I am sure this will start its own thread of conversation.
> >
> > For video you can do 40X fairly easily and the big task is to go to 80X or
> > 100X with reasonable picture quality, say, a peak luma SNR of more than 30
> > dB.  Uncompressed Professional Quality video (called "D1" see below), like
> > the one at TV stations before broadcast, is 216 Mbps.  Smaller resolutions
> > have less bps of course.
> >
> > One of the many tasks is going to be to work around the existing patents but
> > if the audio guys can do it, the video guys should be able to as well.  Not
> > everything has been discovered or patented for video compression.
> >
> > I suggest focusing on "SIF", quarter-screen video (352x240 for NTSC
> > rectangular, TV, pixels and 320x240 for square, computer, pixels).  For
> > nomenclature purposes, full-screen "VGA" is 640x480p progressive and "D1"
> > is 720x480i interlaced (some people also use "Half D1" 360x480i for some
> > products).  Computers are progressive, TV is interlaced.  30 Frames/second,
> > or 60 fields/second yields a natural moving image that does not suffer too
> > much from "jumpiness" during pans.  Movie film is 24 fps, sometimes
> > presenting each frame three times for a net 72 fps.
> >
> > Experiments show good quality for SIF, 30 progressive frames/second at 512
> > Kbps system bitrate (Audio is 96 Kbps).  There are some experiments on D1
> > (full) resolution at 1.5 Mbps video-only but the quality is not good.  D1
> > resolution at 3 Mbps can look good today.
> >
> > Any input on desired resolutions, bitrates, color resolution (color
> > subsampling), frame rates, etc?
> >
> > I will be unavailable until Monday, Sep 11th so if you send e-mail or post
> > questions I will not be able to get back to you until Sep 11th.
> >
> > Excuse the apparent lack of order, I just want to start throwing
> > ideas/concepts to the list.  All can be clarified and classified in due
> > time.
> >
> > RAUL LOPEZ
> >
> > --- >8 ----
> > List archives:  http://www.xiph.org/archives/
> > Ogg project homepage: http://www.xiph.org/ogg/
> > To unsubscribe from this list, send a message to 'vorbis-request at xiph.org'
> > containing only the word 'unsubscribe' in the body.  No subject is needed.
> > Unsubscribe messages sent to the list will be ignored/filtered.

                                   Regards
                                   Marshall Eubanks

   T.M. Eubanks
   Multicast Technologies, Inc.
   10301 Democracy Lane, Suite 410
   Fairfax, Virginia 22030
   Phone : 703-293-9624
   Fax     : 703-293-9609

   e-mail : tme at on-the-i.com

 http://www.on-the-i.com         http://www.buzzwaves.com
