[vorbis-dev] end-user mode for a moment (side-by-side tests)

Kenneth Arnold kcarnold at yahoo.com
Wed Sep 13 20:04:33 PDT 2000



Smack my curiosity, but I encoded some songs in Vorbis mode 2 and tried
to tell them apart from the uncompressed WAVs. (*smacks self*) But here's
what I noted:

It's actually kind of hard to tell the difference :) (and I consider myself
to have a decent set of ears, though not anywhere near the best)

I got my accuracy to about 90%, but at first I couldn't pin down what gave
it away. Finally I figured out that it was some of the high range in a few
spots that hit with less -- brilliance is the word that comes to mind --
than the uncompressed version. Yeah, this is 128k, so what should I really
expect, and it's compressed against uncompressed, so there's almost no fault
to find here, but could the psychoacoustic model be tuned any? Maybe
somebody could assemble a "test kit" that a lot of people could use to tune
the model to what they think sounds best, and then the results could be
averaged? Or do we have it on higher authority that the psychoacoustics are
the best they could be? (I am reminded of Linus Torvalds' announcement for
2.4.0-test2 on l-k back when I was subscribed.)

Wow. Not bad.
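
Just to put numbers on that "about 90%": here is a minimal sketch of how a
listening "test kit" might tally blind ABX-style trials and check a
listener's score against plain guessing. The trial counts below are made up
for illustration; nothing here comes from an actual test.

/* Sketch of scoring blind listening trials.  The probability of getting
 * at least k of n trials right by pure guessing (p = 1/2 per trial) comes
 * straight from the binomial distribution. */
#include <stdio.h>

static double chance_probability(int n, int k)
{
    double total = 0.0;
    for (int i = k; i <= n; i++) {
        double c = 1.0;                 /* binomial coefficient C(n, i) */
        for (int j = 0; j < i; j++)
            c = c * (n - j) / (j + 1);
        for (int j = 0; j < n; j++)
            c *= 0.5;                   /* times (1/2)^n */
        total += c;
    }
    return total;
}

int main(void)
{
    int trials  = 20;   /* assumed: 20 blind comparisons */
    int correct = 18;   /* roughly the "about 90%" above */

    printf("accuracy: %.0f%%\n", 100.0 * correct / trials);
    printf("chance of doing this well by guessing: %.4f\n",
           chance_probability(trials, correct));
    return 0;
}

With 18 of 20, the chance probability comes out around 0.0002, so a score
like that is well clear of guessing.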

Now I've gotta try it against MP3. Dang, does that mean I have to grab
notlame or bladeenc? Darn... I didn't even install them when I reinstalled
last, because it seemed my audio compression needs were already taken care of.

Back to developer mode:

Thanks, Ralph, for the Ogg todo sent a while ago. I've only now gotten
around to really studying it and looking at what to do. Looks like video is
it. So a couple of questions for the list:

1. Where's the Tarkin source anyway?

2. I am aware that Tarkin uses wavelets. MPEG uses object detection and
motion estimation. What other methods are out there? Does anybody know
of any new, cool methods for compressing video? Or, failing that, does
anybody know of anyone who does?

3. I have looked over the MPEG document Marshall suggested a while ago
(about varying levels of detail). I think that's a good idea (in fact it
was a goal even before I read the document). See what you all think about
my personal codec wishlist (from a starting-from-scratch viewpoint, even
though it probably won't work out that easily):

* Three levels: packet, frame, and field. A packet holds all the stuff that
  should naturally go together and is otherwise worthless when split up
  (I'm thinking streaming here). A field is a collection of packets that
  describes part of a frame. It may pull information from a lot of sources,
  e.g., raw image data, data from earlier / later frames (with an arbitrarily
  adjustable window), a "scratch" area, whatever. It should have the
  capability to embody vector graphics, arbitrary transforms, effects, etc.,
  even if the encoder can't pick them out from a source video (if it could,
  that'd be great, but that gets very complex). Maybe field == packet; I need
  to think some more about that. But by "part of a frame", I mean a level of
  detail as opposed to a region (although region might be useful also).
  Object descriptions are hierarchical in importance by nature; the codec
  should take advantage of this. Coding should be done residually, i.e., take
  as much information about the frame as can be embodied relatively simply,
  and repeat with what's left over. The amount of complexity per independent
  block should be adjustable over a wide range. Each block iteration
  (hierarchical level) could be assigned a priority, and when streaming, the
  transport could choose to send only the blocks above priority x. Different
  methods could be used to formulate these blocks, possibly even different
  methods for different blocks describing the same area. This would allow
  motion estimation to be used for entire objects, and e.g. wavelets for
  details about the object. (There's a rough sketch of this after the list.)
  The definitions and implementations of the residue and coding areas are
  left for later, to allow for more than enough flexibility (I hope).
* Every frame should be able to reference back to frames before it, i.e.,
  none of MPEG's I-frames (except maybe at the beginning of the stream).
  Okay, so maybe there should be I-frames, but use them more carefully.
  Possibly a lossless compression could be made from them... but back to the
  main issue here: a typical viewer will be watching the video for at least
  100 megabits before [s]he even starts to worry about quality as opposed to
  content. So I-frames can be very sparse. The tradeoff is more redundancy
  in the diff frames. Each diff frame should transmit the diff, plus some
  data that the viewer would already know if it had been watching since the
  last I-frame. This would allow streaming to take advantage of scene
  similarity without worrying too much about the consequences of lost data.
  Possibly the redundant data could have a temporal component attached as
  well, so when the video is saved to disk after streaming, it could be
  moved back to the place where it should first have been introduced and
  then stripped as much as possible to keep redundancy to a minimum on a
  fixed medium (key point: the stream is not the compressed video; they work
  together, but both can be modified to hold the same or similar data in a
  more optimal manner). Another key point: there's a lot you can tune here
  (amount of redundant data transmitted, frequency of I-frames, etc.). More
  flexibility. (This one is sketched after the list too.)
* VBR of course. But since streaming often works best when bitrate is constant
  (TCP windows, if streaming over TCP), allow the redundant data to be filled
  in whenever the data size is otherwise small.
* Scratch pad to save previous data, e.g., if a scene is switching between
  two talking heads, save the data associated with one when switching to the
  other. The key point is that maybe the viewer didn't catch that old data;
  maybe send it before the stream starts playing, or put it in the redundant
  frames. The first sounds nice if you're not multicasting; the second is
  better suited to broadcasting.
* Assume the viewer knows everything about the stream you've sent; then,
  for anything that turns out to be missing, either the viewer could ask for
  it (unicast is better again) or the streamer could just resend it anyway
  (multicast).
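
Since that first item is pretty abstract, here is a rough sketch in C of
what I mean by priority-tagged residual blocks. All of the names and the
exact layout are hypothetical -- this isn't from Tarkin or any other real
code, just an illustration of the idea:

/* Hypothetical sketch: a field is a set of residual blocks, each tagged
 * with a coding method and a priority, and a streaming transport can drop
 * everything below some priority cutoff. */
#include <stddef.h>

typedef enum {
    BLOCK_MOTION,    /* e.g. motion estimation for a whole object */
    BLOCK_WAVELET,   /* e.g. wavelet residue for detail on that object */
    BLOCK_VECTOR,    /* vector graphics / arbitrary transform description */
    BLOCK_RAW        /* raw image data fallback */
} block_method;

typedef struct {
    block_method   method;     /* how this block was coded */
    int            priority;   /* hierarchical level: 0 = most important */
    int            ref_frame;  /* earlier/later frame this block pulls
                                  from, or -1 if self-contained */
    size_t         length;
    unsigned char *data;       /* the residue itself */
} residual_block;

typedef struct {
    int             frame_no;   /* which frame this field belongs to */
    int             num_blocks;
    residual_block *blocks;     /* sorted by priority at encode time */
} field;

/* Transport-side filter: count the blocks important enough to send given
 * a priority cutoff (numerically <= cutoff, since 0 is most important);
 * the decoder reconstructs a coarser frame from whatever arrives. */
static int blocks_to_send(const field *f, int cutoff)
{
    int n = 0;
    for (int i = 0; i < f->num_blocks; i++)
        if (f->blocks[i].priority <= cutoff)
            n++;
    return n;
}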
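
And a similarly hypothetical sketch of the sparse-I-frame idea from the
second item: a diff frame carries its diff plus some redundant catch-up
data, tagged with the frame where that data first appeared, so a player
saving the stream to disk can move it back to that point and drop the
duplicates:

/* Hypothetical sketch; none of these names exist in any real code. */
#include <stddef.h>

typedef struct {
    int            origin_frame;  /* frame where this data first appeared */
    size_t         length;
    unsigned char *data;          /* repeated state for late joiners */
} redundant_chunk;

typedef struct {
    int              frame_no;
    size_t           diff_length;
    unsigned char   *diff;        /* change against earlier frame(s) */
    int              num_redundant;
    redundant_chunk *redundant;   /* filled in whenever the bitrate would
                                     otherwise dip (see the VBR item) */
} diff_frame;

When saving to disk, a player would walk the diff frames, copy each
redundant_chunk back to its origin_frame if it isn't already there, and
drop the rest.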

I was spewing a lot to myself above, and I really didn't mean to spew that
much, but chew on it and tell me what you think. It's the product of
probably about 15 minutes of mostly continuous thought that is very likely
disjointed and missing some key information still locked somewhere in my
head, so don't take it as written in anything but sand sprinkled in tide
pools. It's also 11:00 PM local time, so I may have gone insane without
knowing it.

The bit of judgement in me that hasn't gone to sleep yet is telling me that
this is a good place to stop.

Kenneth

PS - I'm going to really like reading that when I'm more awake. It'll be fun.
