[vorbis] tarkin (was Re: FREE at last)

Ralph Giles giles at snow.ashlu.bc.ca
Mon Oct 9 14:46:57 PDT 2000



On Mon, 9 Oct 2000, Kenneth Arnold wrote:

> Hmmm... good idea. Any data suggesting how good a compression this achieves
> and how it compares to MPEG-whatever?

Chad said the test code was in the same ballpark as MPEG-1. Wavelets
generally do better than a block DCT.
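
For anyone who hasn't played with them: a wavelet transform recursively
splits the signal into low- and high-frequency halves over the whole
frame, instead of transforming fixed 8x8 blocks the way a block DCT does.
A minimal single-level 1D Haar pass, just for illustration (not Tarkin
code):

    /* One level of a 1D Haar transform: averages (lowpass) land in the
     * first half of out[], differences (highpass) in the second half.
     * n must be even; recurse on the lowpass half for more levels. */
    void haar_1d(const float *in, float *out, int n)
    {
        int half = n / 2;
        for (int i = 0; i < half; i++) {
            out[i]        = (in[2*i] + in[2*i + 1]) * 0.5f;
            out[half + i] = (in[2*i] - in[2*i + 1]) * 0.5f;
        }
    }

The 2D and 3D versions just apply the same step along each axis.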

> I saw a link earlier on vorbis-dev for a chirplet-based system -- I'll have
> to dig that up and look at it some more. Can't remember who it was, but
> it was probably Ralph because he has done a lot of (NS)MLP. :)

What's MLP?
 
> Finally, did it include motion comp. or not? Could this improve things?

The algorithm Jack described is a 3D wavelet transform, which fills the
same role as motion compensation. I'd expect that to be even more
effective with chirplets. OTOH, someone pointed out that a problem with
3D transforms is that framerates are *too low* to get good continuous
'tones', so it's possible motion-compensated 2D wavelets would work
better below, say, 15 fps. Still, I'd try subpictures or the like as
well; they might be just as good while staying in the same framework.
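
To make the 3D version concrete: after the usual 2D transform of each
frame, you run the same filter along the time axis, pixel by pixel.
A rough sketch, reusing the haar_1d() above (the names and structure
here are my own invention, not Tarkin's):

    /* Temporal wavelet pass over a group of frames. frames[t] points
     * to frame t's width*height floats; nframes must be even. */
    void haar_temporal(float **frames, int nframes, int width, int height)
    {
        float tmp[nframes], tout[nframes];   /* C99 VLAs for scratch */
        for (int p = 0; p < width * height; p++) {
            for (int t = 0; t < nframes; t++)
                tmp[t] = frames[t][p];
            haar_1d(tmp, tout, nframes);
            for (int t = 0; t < nframes; t++)
                frames[t][p] = tout[t];
        }
    }

Static regions come out as near-zero temporal highpass coefficients,
which is the sense in which the 3D transform stands in for motion comp;
at low framerates the temporal 'tone' breaks up and the highpass band
fills with energy, hence the objection above.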

 [...]
> Maybe rtlinux could do sync issues better. But anyway, there needs to be some
> sort of marker, either in the metadata or built into the stream itself, to
> synchronize all the elements of the stream in time. So far we have been able
> to ignore all that, because it's just audio, but there can be video, lyrics,
> closed-captioning, links, markers, etc. that all will need to be synced to
> something, probably the audio. From what you describe, it looks like the
> facilities to do that are pretty minimal.

rtlinux would help more with bounding latency for realtime encoding, I
think. Given that you want at least 1/10-sample accuracy, the audio
clocking is best left in hardware. Contrariwise, because of the vast
difference in sample rates, video is much less of a problem (again,
unless you're writing a camera controller :)

Re minimal facilities, all the compressed stream can really do is provide
timestamps (or ranges). It's easy enough to keep track of exactly where
you are in the decoder (libvorbis provides sample-accurate seeking, for
example). In the trivial implementation, it's just a matter of lining up
the numbers.
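
To show how direct the trivial version is with the real API, something
like this should do it (the frame-to-sample arithmetic is mine, and
error handling is elided):

    #include <vorbis/vorbisfile.h>

    /* Position an already-opened Vorbis stream at the sample matching
     * video frame 'frame', given the video and audio rates. */
    int sync_audio_to_frame(OggVorbis_File *vf, long frame,
                            double fps, long audio_rate)
    {
        ogg_int64_t sample = (ogg_int64_t)(frame / fps * audio_rate);
        return ov_pcm_seek(vf, sample);   /* sample-accurate seek */
    }

ov_pcm_tell() will then confirm exactly where the decoder sits.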

A more sophisticated approach is to write some sort of scheduler that
works to (1) keep the audio buffer full, (2) display video and text
frames at the appropriate time, and (3) issue packets to the various
decoders so that the above can happen as smoothly as possible given
available system resources.
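
In outline, such a scheduler is just a priority loop around the
decoders. A sketch of the shape I have in mind (all the types and
helper calls are hypothetical, not an existing API):

    typedef struct {
        double next_video_pts;   /* presentation time of next frame */
        double next_text_pts;
        /* ... decoder handles, buffers, etc. ... */
    } stream_state;

    /* hypothetical helpers, one per subsystem */
    int  audio_buffer_low(stream_state *s);
    void decode_more_audio(stream_state *s);
    void display_video_frame(stream_state *s);
    void display_text(stream_state *s);
    int  have_spare_time(stream_state *s, double now);
    void demux_and_dispatch_packet(stream_state *s);

    /* One pass of the playback scheduler, priorities as above. */
    void scheduler_tick(stream_state *s, double now)
    {
        if (audio_buffer_low(s))
            decode_more_audio(s);          /* (1) never starve the DAC */

        if (s->next_video_pts <= now)
            display_video_frame(s);        /* (2) show what's due */
        if (s->next_text_pts <= now)
            display_text(s);

        while (have_spare_time(s, now))
            demux_and_dispatch_packet(s);  /* (3) work ahead */
    }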

Some suggested heuristics:

Maintaining audio is the most important thing. Drop video frames as
necessary, down to one per five seconds or so; only then drop audio.

Text (closed captions/subtitles) is better run 'fast-forward' or displayed
in aggregate than dropped.

The black magic is in the thresholds that decide when not to
decode/display a frame.
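
One way those thresholds might fall out in code (every number and name
here is made up for illustration):

    typedef enum { SHOW_FRAME, DROP_FRAME, DROP_AUDIO } sync_action;

    /* Decide what to do with the next video frame, given how far we
     * are behind the audio clock and how long since we last showed
     * anything (both in seconds). */
    sync_action frame_policy(double lateness, double since_last_shown)
    {
        if (lateness < 0.040)        /* within ~one frame: display it */
            return SHOW_FRAME;
        if (since_last_shown < 5.0)  /* behind, floor not yet reached */
            return DROP_FRAME;
        /* still behind at one frame per five seconds: last resort */
        return DROP_AUDIO;
    }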

BTW, there's been lots of discussion of this on the livid-dev list
(linuxvideo.org), but they haven't worked out a solution yet.

 -r



