[theora] Ogg index and Skeleton 4.0

Thu Apr 29 01:59:55 PDT 2010

Hey folks,

Sorry to come late to the party.  My interest here is mostly keeping
things in view as I play with transOgg (which will also have an index,
one would hope by using a preexisting index spec).

> Each content track has a separate index, which is stored in its own
> packet in the Skeleton 4.0 track.

I recall there had been some discussion of the exact way to go about
this, but I didn't catch all the rationale.  I didn't see whether anyone
looked at Matroska and gave reasoning for not doing it in a similar way.

(I can think of a few, but hopefully someone has a pointer to discussion)

> The index for streams without the
> concept of a keyframe, such as Vorbis streams, can instead record the
> time position at periodic intervals, which achieves the same result.
> When this document refers to keyframes, it also implicitly refers to these
> independent periodic samples from keyframe-less streams.

I assume that the streams with and without 'keyframes' are clearly
distinguished, correct?

> For every content stream in an Ogg segment, the Ogg index bitstream
> provides seek algorithms with an ordered table of "key points". A key
> point is intrinsically associated with exactly one stream, and stores the
> offset of the page on which it starts, o, as well as the presentation time
> of the keyframe t, as a fraction of seconds.

It might be worth mentioning or explicitly addressing PTS and DTS
here; having worked through multiple examples for transOgg I came to
the conclusion that the only generalizable system was one in which
keyframes/syncpoint frames have identical actual PTS and DTS (and this
implies the syncpoints also appear in strict chronological order).

It's not relevant to Vorbis or Theora, but it is highly relevant to Dirac.

I haven't yet worked through the implicit assumptions of preroll
(rolling intra) in an out-of-order codec, but I assume the naive way
is the only way :-)

> This specifies that in order
> to render the stream at presentation time t, the last page which lies before
> all information required to render the keyframe at presentation time t begins
> exactly at byte offset o, as offset from the beginning of the Ogg segment.
> The offset is exactly the first byte of the page, so if you seek to a
> keypoint's offset and don't find the beginning of a page there, you can
> assume that the Ogg segment has been modified since the index was constructed,
> and that the index is now invalid and should not be used. The time t is the
> keyframe's presentation time corresponding to the granulepos, and is
> represented as a fraction in seconds. Note that if a stream requires any
> preroll, this will be accounted for in the time stored in the keypoint.

Is the rational timebase required to have any relation to the timebase
of the granulepos itself?
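
For concreteness, the staleness check described in the quoted text could
be sketched roughly like this in Python.  The function name and the
segment_start parameter are made up for illustration; the only format
fact relied on is that every Ogg page begins with the capture pattern
"OggS".

```python
import io

OGG_CAPTURE_PATTERN = b"OggS"  # every Ogg page starts with these four bytes


def keypoint_offset_is_valid(stream, segment_start, offset):
    """Return True if an Ogg page begins at segment_start + offset.

    `stream` is any seekable binary file object.  `segment_start` is an
    assumption of this sketch: the byte position at which the Ogg
    segment begins within `stream`.
    """
    stream.seek(segment_start + offset)
    return stream.read(len(OGG_CAPTURE_PATTERN)) == OGG_CAPTURE_PATTERN


# In-memory stand-in for a file: one page start at byte 0, another at byte 14.
demo = io.BytesIO(b"OggS" + bytes(10) + b"OggS" + bytes(5))
```

If the check fails for a keypoint, the quoted text says to treat the
whole index as invalid rather than just that one entry.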

> The Skeleton 4.0 track contains one index for each content stream in the
> file. To seek in an Ogg file which contains keyframe indexes, first
> construct the set which contains every active streams' last keypoint which
> has time less than or equal to the seek target time. Then from that set
> of key points, select the key point with the smallest byte offset.

...wouldn't you select the earliest from the set of streams currently
being used?  I also assume that this applies only to continuous
streams, and discontinuous streams simply follow a smart placement
strategy?  Or would one possibly go back to inspect the last packet of
a discontinuous stream as well?
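
Here is how I read the quoted seek procedure, as a Python sketch.  The
data layout (a dict of per-stream keypoint lists) and all names are
assumptions for illustration, and it restricts itself to whatever set of
streams the caller considers active; it does nothing special for
discontinuous streams.

```python
from bisect import bisect_right
from fractions import Fraction


def seek_offset(indexes, active, target):
    """Pick the byte offset to seek to for presentation time `target`.

    indexes: {serialno: [(time, offset), ...]}, each list sorted by time.
    active:  iterable of serial numbers of the streams in use.
    target:  seek target in seconds (a Fraction).

    For each active stream, take its last keypoint with time <= target,
    then return the smallest byte offset among those keypoints.  Returns
    None if some active stream has no keypoint at or before the target.
    """
    candidates = []
    for serial in active:
        keypoints = indexes[serial]
        times = [t for t, _ in keypoints]
        i = bisect_right(times, target)  # count of keypoints with time <= target
        if i == 0:
            return None  # caller must fall back to another seek strategy
        candidates.append(keypoints[i - 1][1])
    return min(candidates) if candidates else None


# Hypothetical two-stream index (times in seconds, offsets in bytes).
demo_indexes = {
    1: [(Fraction(0), 0), (Fraction(2), 5000), (Fraction(4), 9000)],
    2: [(Fraction(0), 100), (Fraction(1), 3000), (Fraction(3), 7000)],
}
```

With both streams active and a target of 3.5 seconds, stream 1
contributes its keypoint at 2 s and stream 2 its keypoint at 3 s; the
smaller of the two offsets wins, so decoding starts early enough for
both.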

> 4. First-sample-time numerator: 8 byte signed integer representing
>    the numerator for the presentation time of the first sample in the track.
> 5. First-sample-time denominator: 8 byte signed integer, with value 0
>    if the timestamp is unknown. Decoders should always ensure that the
>    denominator is not 0 before using it as a divisor!

...this is not based on the stream's timebase?

> 6. Last-sample-time numerator: 8 byte signed integer representing the end
>    time of the last sample in the track.
> 7. Last-sample-time denominator: 8 byte signed integer, with value 0
>    if the timestamp is unknown. Decoders should always ensure that the
>    denominator is not 0 before using it as a divisor!

Same question
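
As a small illustration of the zero-denominator rule in the quoted
fields, a reader might wrap the numerator/denominator pairs like this.
The helper name is hypothetical; using exact fractions also means times
stored against different denominators can be compared directly, whatever
the answer to the timebase question turns out to be.

```python
from fractions import Fraction


def sample_time(numerator, denominator):
    """Return a time in seconds as an exact Fraction.

    A denominator of 0 means the timestamp is unknown, so return None
    instead of dividing.
    """
    if denominator == 0:
        return None
    return Fraction(numerator, denominator)
```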

> 8. The keypoint presentation time denominator, as an 8 byte signed integer.
> 9. 'n' key points, each of which contain, in the following order:
>     - the keyframe's page's byte offset delta, as a variable byte encoded
>       integer. This is the number of bytes that this keypoint is after the
>       preceding keypoint's offset, or from the start of the segment if this
>       is the first keypoint. The keypoint's page start is therefore the sum
>       of the byte-offset-deltas of all the keypoints which come before it.
>     - the presentation time numerator delta, of the first key frame which
>       starts on the page at the keypoint's offset, as a variable byte encoded
>       integer. This is the difference from the previous keypoint's timestamp
>       numerator. The keypoint's timestamp numerator is therefore the sum of
>       all the timestamp numerator deltas up to and including the keypoint's.
>       Divide the timestamp numerator sum by the timestamp denominator stored
>       earlier in the index packet to determine the presentation time of the
>       keyframe in seconds.
>
> The key points are stored in increasing order by offset (and thus by
> presentation time as well).
>
> The byte offsets stored in keypoints are relative to the start of the Ogg
> bitstream segment.

Clarify: is this the beginning of the bitstream (e.g., 0 for the first
segment) or the beginning of the non-Skeleton data, e.g., the first codec
identification header?
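
To make the delta scheme concrete, here is a rough Python sketch of
decoding the key point table.  Two things are assumptions on my part:
the variable byte encoding (7 payload bits per byte, least significant
group first, high bit set on the final byte), which the quoted text does
not spell out, and reading a keypoint's offset as the running sum up to
and including its own delta, the same way the text describes the
timestamps.  Function names are invented for illustration.

```python
from fractions import Fraction


def read_varint(buf, pos):
    """Decode one variable byte encoded integer from buf starting at pos.

    ASSUMPTION: 7 payload bits per byte, least significant group first,
    with the high bit set on the final byte.  Returns (value, next_pos).
    """
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:  # high bit marks the last byte of this integer
            return value, pos


def decode_keypoints(buf, count, time_denominator):
    """Return [(byte_offset, time_in_seconds), ...] for `count` key points."""
    if time_denominator == 0:
        raise ValueError("keypoint time denominator must not be 0")
    keypoints = []
    offset = 0     # running sum of byte offset deltas
    numerator = 0  # running sum of timestamp numerator deltas
    pos = 0
    for _ in range(count):
        offset_delta, pos = read_varint(buf, pos)
        time_delta, pos = read_varint(buf, pos)
        offset += offset_delta
        numerator += time_delta
        keypoints.append((offset, Fraction(numerator, time_denominator)))
    return keypoints
```

Whether the resulting offsets are then measured from byte 0 of the
segment or from somewhere later is exactly the question above; the
sketch just reports the accumulated sums.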

Monty