[theora] Ogg index and Skeleton 4.0

Thu Apr 29 03:12:08 PDT 2010

On 29/04/2010 8:59 p.m., xiphmont at xiph.org wrote:
>> Each content track has a separate index, which is stored in its own
>> packet in the Skeleton 4.0 track.
>>      
> I recall there had been some discussion of the exact way to go about
> this, I didn't catch all the rationale.  I didn't see if anyone looked
> at Matroska and gave reasoning for not doing it in a similar way.
>
> (I can think of a few, but hopefully someone has a pointer to discussion)
>
>    

I think that discussion mostly happened on IRC around November last year.

Initially I had a single index which merged all streams' keypoints. But 
in cases where you have sparse streams, like Kate streams, this can 
result in seeking back much more than is desirable; a web client may 
prefer to open a second connection to request that data instead. I 
switched to multiple separate indexes so that a client could potentially 
decide its own behaviour in such a situation.

> I assume that the streams with and without 'keyframes' are clearly
> distinguished, correct?
>    

A client which mapped serialnos to codecs could make this distinction. 
Why would you need to?

>> For every content stream in an Ogg segment, the Ogg index bitstream
>> provides seek algorithms with an ordered table of "key points". A key
>> point is intrinsically associated with exactly one stream, and stores the
>> offset of the page on which it starts, o, as well as the presentation time
>> of the keyframe t, as a fraction of seconds.
>>      
> It might be worth mentioning or explicitly addressing PTS and DTS
>    

Sorry, I'm not sure what you mean by this.

> Is the rational timebase required to have any relation to the timebase
> of the granulepos itself?
>    

Not explicitly. I imagine an indexer would do so in order to preserve 
accuracy though.
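
To illustrate the accuracy point, here is a minimal Python sketch. It assumes a hypothetical 30000/1001 fps video track; the function name and the choice of denominators are made up for illustration and are not taken from the index spec.

```python
from fractions import Fraction

# Assumed example track: 30000/1001 frames per second (NTSC-style rate).
FRAME_DURATION = Fraction(1001, 30000)

def keypoint_time(frame_index, denominator):
    """Return (stored, exact) presentation times for a frame.

    'stored' is the time rounded to the nearest multiple of 1/denominator,
    which is what an indexer using that denominator could write out.
    """
    exact = frame_index * FRAME_DURATION
    numerator = round(exact * denominator)
    return Fraction(numerator, denominator), exact

# Using the stream's own timebase keeps the time exact.
stored, exact = keypoint_time(1, 30000)
print(stored == exact)   # True

# Using milliseconds rounds the time and loses accuracy.
stored_ms, exact = keypoint_time(1, 1000)
print(stored_ms == exact)  # False
```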

>> The Skeleton 4.0 track contains one index for each content stream in the
>> file. To seek in an Ogg file which contains keyframe indexes, first
>> construct the set which contains every active stream's last keypoint which
>> has time less than or equal to the seek target time. Then from that set
>> of key points, select the key point with the smallest byte offset.
>>      
> ...wouldn't you select the earliest from the set of streams currently
> being used?  I also assume that this applies only to continuous
> streams, and discontinuous streams simply follow a smart placement
> strategy?  Or would one possibly go back to inspect the last packet of
> a discontinuous stream as well?
>
>    

For exact seeking you need to decode forwards from the lowest offset, 
not the earliest timestamp. If you don't decode from the lowest offset, 
you can't guarantee that you've started decoding before the start of all 
the data required to render the seek target on all streams.
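
A rough Python sketch of that selection follows. The data layout (a dict mapping serialno to a time-sorted list of (offset, time) tuples) and all names are assumptions for illustration, not part of the index format.

```python
from bisect import bisect_right

def last_keypoint_at_or_before(keypoints, target):
    """keypoints: list of (offset, time) tuples sorted by time.

    Returns the last keypoint whose time is <= target, or None.
    """
    times = [t for _, t in keypoints]
    i = bisect_right(times, target)
    return keypoints[i - 1] if i > 0 else None

def seek_offset(indexes, target):
    """indexes: dict of serialno -> keypoint list, active streams only.

    Returns the smallest byte offset among each stream's last keypoint
    at or before target, or None if some stream has no such keypoint
    (the caller would then fall back to another seek strategy).
    """
    candidates = []
    for keypoints in indexes.values():
        kp = last_keypoint_at_or_before(keypoints, target)
        if kp is None:
            return None
        candidates.append(kp)
    if not candidates:
        return None
    return min(offset for offset, _ in candidates)

# Hypothetical indexes: the text stream's keypoint has the later
# timestamp (2.5 s) but the lower byte offset (5000).
indexes = {
    1: [(0, 0.0), (6000, 2.0)],    # e.g. a video stream
    2: [(200, 0.0), (5000, 2.5)],  # e.g. a sparse text stream
}
print(seek_offset(indexes, 3.0))  # 5000
```

Picking by earliest timestamp would choose offset 6000 here and start decoding after the text stream's keypoint at offset 5000.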

>> 4. First-sample-time numerator: 8 byte signed integer representing
>>     the numerator for the presentation time of the first sample in the track.
>> 5. First-sample-time denominator: 8 byte signed integer, with value 0
>>     if the timestamp is unknown. Decoders should always ensure that the
>>     denominator is not 0 before using it as a divisor!
>>      
> ...this is not based on the stream's timebase?
>    

Hmm, no reason why it shouldn't be. Good catch. :)
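
For the zero-denominator check quoted above, a minimal Python sketch might look like this. The function name is hypothetical, and returning None for an unknown timestamp is just one possible convention.

```python
from fractions import Fraction

def first_sample_time(numerator, denominator):
    """Return the first-sample presentation time in seconds as a Fraction,
    or None when the denominator is 0 (timestamp unknown)."""
    if denominator == 0:
        return None
    return Fraction(numerator, denominator)

print(first_sample_time(1001, 30000))  # 1001/30000
print(first_sample_time(5, 0))         # None
```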

>
>> The byte offsets stored in keypoints are relative to the start of the Ogg
>> bitstream segment.
>>      
> Clarify-- the beginning of the bitstream (eg, 0 for the first segment)
>    

Beginning of the bitstream, so 0 for the first segment, and from the start 
of each fisbone packet in each subsequent segment/link in a chained Ogg.

The fishead packet also contains the length of the segment, so you can 
easily hop to the start of the next link in the chain if you want to.
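
As a small Python sketch of the arithmetic: the names are made up for illustration, and it assumes the caller already knows the byte position that a segment's keypoint offsets are measured from, and that the stored segment length is measured from the start of the segment.

```python
def absolute_offset(base_offset, keypoint_offset):
    """Convert a keypoint's relative offset to a position in the physical file.

    base_offset is the file position the segment's keypoint offsets are
    relative to (0 for the first segment).
    """
    return base_offset + keypoint_offset

def next_link_start(segment_start, segment_length):
    """File position of the next link in the chain, given this segment's
    start and the segment length read from its fishead packet."""
    return segment_start + segment_length

# First segment starts at 0 and is assumed to be 1,000,000 bytes long.
second_start = next_link_start(0, 1_000_000)
print(absolute_offset(0, 4096))             # 4096
print(absolute_offset(second_start, 4096))  # 1004096
```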

Chris P.