[ogg-dev] Fwd: New Ogg Dirac mapping draft

Fri Aug 15 16:48:44 PDT 2008

We've been discussing this on IRC. Short summary, followed by some responses.

I think we've verified now that my old proposal works fine for MPEG-2
style reordered streams. I believe it can be made to work with 'open
gop' streams by making the granulepos assignment more sophisticated
than I described. However, Dirac allows essentially random reference
structures, so it's possible to construct streams with overlapping
keyframe dependencies my proposal can't handle without breaking the
numerically non-decreasing granulepos rule.

That's an argument for David's granulepos mapping, especially since
the open gop stuff in my mapping is hacky. My thinking now is that the
non-decreasing numeric encoding (the stop-the-presses version) is
better. GPH+GPL=frame works for Theora, but doesn't do any better with
naive seeking than 'find this numerical granulepos' and doesn't
simplify frame-accurate seeking if you relax the one-page-per-packet
rule, which I think we must.
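
For concreteness, here is a rough Python sketch of the Theora-style split
I mean by GPH+GPL=frame. The function names and the shift value of 6 are
purely illustrative assumptions (the real shift comes from the stream
headers), and the sketch ignores the off-by-one details that differ
between bitstream versions.

```python
def split_granulepos(granulepos, shift):
    """Split a Theora-style granulepos into (GPH, GPL).

    GPH (high bits): frame number of the most recent keyframe.
    GPL (low bits):  number of frames since that keyframe.
    """
    gph = granulepos >> shift
    gpl = granulepos & ((1 << shift) - 1)
    return gph, gpl


def frame_number(granulepos, shift):
    """GPH + GPL gives the frame number the granulepos refers to."""
    gph, gpl = split_granulepos(granulepos, shift)
    return gph + gpl


def make_granulepos(keyframe, offset, shift):
    """Pack a keyframe number and an offset from it into one integer."""
    if not 0 <= offset < (1 << shift):
        raise ValueError("offset does not fit in the low bits")
    return (keyframe << shift) | offset


SHIFT = 6  # assumed value for illustration only

gp = make_granulepos(128, 5, SHIFT)
print(split_granulepos(gp, SHIFT))  # (128, 5)
print(frame_number(gp, SHIFT))      # 133
```

Note that the packed value still increases numerically from frame to
frame here, which is why a plain numeric search does about as well.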

On Wed, Aug 13, 2008 at 1:08 PM, David Flynn <davidf+nntp at woaf.net> wrote:

> Defacto rules of ogg (I've not found these actually written down anywhere):

No, we've not really worked out these parts of the spec. Thanks for helping!

>  - Seeking is difficult:
>     -  Want to seek to frame N
>     -  GPH+L is non-unique (don't know if the right one has been found)
>     => Some values of GPH+L do not exist (searches may fail)
>     -  and GPH+L != N (ie, may find the wrong frame)

You really want the granulepos field to be a frame timestamp.
Ogg just isn't designed to provide this information. A granulepos
isn't present in the stream for every packet. Granulepos values are just
supposed to provide "seeking signposts" during the bisection search and,
mostly as a side effect, let an encoder give some hints to the muxer
about interleave order to reduce buffering.

Your proposal stuffs sequence headers and other aux data units in with
the following frame in a single Ogg packet, and then insists on
special one-packet-per-page encapsulation, so you can get this frame
timestamp behaviour. I think that's why it feels like such a hack to
me. New constraints, breaking abstraction layers, to do something that
the format doesn't intend.

I agree seeking is hard. To recap, Monty's original vision was that
granulepos would be monotonically increasing, and you could map your
seek time onto a granulepos and bisection search for that number. That
worked great for Vorbis-only streams, but as soon as you have
multiplexed data, you have multiple granulepos schemes (or just
timebases) so it's easier to map any granulepos you find to time and
then compare in that space. With Theora, we took advantage of this to
squeeze in a reference to the closest restart point (keyframe) without
revising the container code. So you can't calculate f:time->granulepos
at all in general now, only its inverse. And it turns out, because of
packed and continued packets, that you can't even find a single
restart point, you have to find "the last page with a timestamp that
maps to a time prior to the seek point, for each substream you care
about, and start decoding each substream there." And then you have to
search again for keyframe streams, back up by the preroll in lapped
streams, etc.
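
To make the "compare in time space" step concrete, here is a rough
sketch. Everything in it is hypothetical: pages are modelled as
(byte_offset, serial, granulepos) tuples in file order, a granulepos of
-1 marks a page on which no packet finishes, and each substream supplies
its own granulepos-to-time function. A real implementation would bisect
on file offsets rather than scan a list, and would still need the
keyframe and preroll steps afterwards.

```python
from fractions import Fraction


def audio_time(rate):
    """granulepos -> seconds for a sample-count granulepos."""
    return lambda gp: Fraction(gp, rate)


def video_time(fps, shift):
    """granulepos -> seconds for a keyframe/offset split granulepos."""
    def to_time(gp):
        frame = (gp >> shift) + (gp & ((1 << shift) - 1))
        return Fraction(frame) / fps
    return to_time


def find_start_pages(pages, to_time, target):
    """For each substream, return the offset of the last page whose
    granulepos maps to a time prior to the seek target."""
    start = {}
    for offset, serial, gp in pages:
        if gp < 0 or serial not in to_time:
            continue  # no timestamp on this page, or a stream we ignore
        if to_time[serial](gp) < target:
            start[serial] = offset  # later matches overwrite earlier ones
    return start


# Assumed example: serial 1 is 48 kHz audio, serial 2 is 25 fps video.
to_time = {1: audio_time(48000), 2: video_time(25, 6)}
pages = [
    (0,     1, 48000),            # audio, 1.0 s
    (4096,  2, (0 << 6) | 24),    # video frame 24, 0.96 s
    (8192,  1, 96000),            # audio, 2.0 s
    (12288, 2, -1),               # continued packet, no granulepos
    (16384, 2, (50 << 6) | 10),   # video frame 60, 2.4 s
    (20480, 1, 144000),           # audio, 3.0 s
]

print(find_start_pages(pages, to_time, Fraction(5, 2)))
# {1: 8192, 2: 16384}
```

Each substream ends up with a different starting page, which is the
point: there is no single restart point for the multiplexed file.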

This is all about being able to do frame-accurate seeking. Maybe
applications don't actually care about that, and just getting in the
neighborhood is good enough. There are things a muxer can do (and a
mapping spec can recommend) as "best practices" to improve the
performance of such naive implementations, like strategic page flushes.
I think we're all for that, but an application (like an editor) that
needs frame-accurate access without fail can't assume those practices
were always followed; that would violate 'liberal in what you accept'.

 -r