[theora] Indexing Ogg files for faster seeking

Thu Sep 24 17:35:33 PDT 2009

On 25/09/09 00:32, Silvia Pfeiffer wrote:
>
> The skeleton information and the index are basically the same type of
> information: meta data about the other tracks. So, it makes logical
> sense to throw them together.
>    

No, the skeleton and index hold different types of information which are 
interesting to different classes of applications. The skeleton 
duplicates the information from the first few pages of the ogg file, and 
is only useful to external applications which want metadata about the 
file without having to know how to parse the non-skeleton packets, for 
example the Unix `file` command or the Windows file properties dialog. 
Decoding software, however, cannot risk trusting the skeleton track, as 
the authoritative source of track metadata is the tracks themselves, so 
most players just ignore the skeleton track. The index is the reverse: 
it is useful to decoding applications, but not to `file`-type programs. 
Why would a file-properties dialog care about a keyframe index?

By including the index in the skeleton, you force both `file`-type and 
player-type apps to decode stuff they don't care about.

> Further, I don't think putting the index into skeleton will make
> skeleton parsing and index extraction more difficult. But that's just
> a hunch for now, so I want to see if it all makes sense.
>    
>> It makes it relatively easy for a server
>> to generate the index track on the fly, add it to an existing ogg
>> file, and send that file at a client's request for example.
>>      
> As new tracks are being added, the skeleton track needs to be updated
> with additional fisbone packets per new track, too. So, if we throw in
> an index track per existing track, we duplicate our number of tracks
> and double the number of skeleton fisbone packets. Instead, if we
> throw the index in with each skeleton packet and make it such that it
> is compatible with current skeleton, we make backwards compatibility
> easier.
>    

The obvious approach is to have one index track per segment, with one 
index packet per indexed track. Wouldn't that require only one extra 
fisbone packet?

There's also no benefit from having a fisbone packet for the index 
track, as it's not a "playable content" track, so most of the fields in 
the fisbone header (from memory) are not relevant. The only thing an 
index's fisbone packet would tell you is that there is an index present, 
but even that's not interesting to an external `file`-type app, as the 
presence of an index shouldn't change the file's seekability, just its 
speed at seeking under certain conditions.

>> That said, I'd be interested in seeing how you'd modify the skeleton
>> specification such that it can have indexes inside it. Can it be done
>> without breaking existing applications that parse skeleton streams?
>>      
> Being curious about this, too, I gave it a try.
>    

Thanks for that. I have a few comments below. I still don't like the 
idea of including the index in the skeleton track, but if it causes less 
bustage it may be worth considering.

> If I understand the index track specification correctly, this
> information concerns the complete video file:
> * The playback start time, in milliseconds, as an 8 byte unsigned
> integer, this is the presentation time of the first frame.
> * The playback end time, in milliseconds, as an 8 byte unsigned
> integer, this is the end time of the last frame.
>    

The length in bytes of the indexed segment should go in the header 
packet, there's no need to duplicate it for every stream. This is the 
length of the file/segment, not the length of the track. This field 
exists so that you can immediately jump to the next segment when seeking 
if the seek target is outside of the start/end time range which is 
indexed in this segment.
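
To make the use of that field concrete, here is a rough Python sketch of 
the seek decision. The `SegmentIndex` type, its field names, and 
`next_segment_offset` are made up for illustration; none of them come 
from the index proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentIndex:
    # Hypothetical in-memory view of one segment's index header.
    segment_offset: int  # byte offset of the segment within the file
    segment_length: int  # length of the indexed segment, in bytes
    start_time_ms: int   # presentation time of the first frame
    end_time_ms: int     # end time of the last frame


def next_segment_offset(index: SegmentIndex, target_ms: int) -> Optional[int]:
    """Return the byte offset of the next segment if target_ms falls
    outside this segment's indexed range, or None if it is in range."""
    if index.start_time_ms <= target_ms <= index.end_time_ms:
        return None  # target is indexed here; use this segment's key points
    # Outside the indexed range: skip the whole segment in one jump.
    return index.segment_offset + index.segment_length
```

Because the length describes the segment rather than a track, one copy 
in the header is enough for this check.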

A side question: if you have a chain consisting of two concatenated ogg 
files, the first with timestamps 5-10 minutes, and the second with 
timestamps from 20-25 minutes, would you expect to see a timeline in the 
player UI of 0-10 minutes? Or should it start at 5, and end at 25 with 
some kind of marker on the progress bar to show the time-break? There 
was some talk in #theora of making the timestamps in the index offsets 
relative to the start of the segment, rather than the time which the 
granulepos corresponds to...

> We could throw these in at the end of the fishead packet and pump up
> the minor version number to indicate it is a new version of Skeleton,
> but it is compatible.
>    

It would be interesting to test how existing apps cope with this. Robust 
players probably ignore the skeleton, so they'll probably just ignore an 
index embedded in it, which would be ok. Robust 
`file`-type apps will probably refuse to parse the skeleton track with a 
bumped-up revision, rendering the track itself useless until they're 
updated. This would break the skeleton track in the short term, though 
I'm not sure what apps actually use the skeleton. I also note that 
neither liboggz nor liboggplay check the skeleton track's revision 
number before reading stuff out of the skeleton track, so they may 
misbehave if the skeleton track changes.
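
A minimal sketch of the kind of revision check a strict reader might do 
before touching the rest of the track. It assumes the fishead packet 
begins with the 8-byte identifier `fishead\0` followed by a 2-byte major 
and a 2-byte minor version, both little-endian; check that layout against 
the Skeleton specification before relying on it. `KNOWN_VERSIONS` and the 
function names are hypothetical.

```python
import struct

FISHEAD_MAGIC = b"fishead\x00"
# Assumption: this reader was written against Skeleton 3.0 only.
KNOWN_VERSIONS = {(3, 0)}


def skeleton_version(packet: bytes):
    """Return (major, minor) from a fishead packet, or None if the
    packet is too short or does not start with the fishead identifier."""
    if len(packet) < 12 or not packet.startswith(FISHEAD_MAGIC):
        return None
    return struct.unpack_from("<HH", packet, 8)


def can_parse(packet: bytes) -> bool:
    """Strict policy: refuse any revision this reader does not know."""
    return skeleton_version(packet) in KNOWN_VERSIONS
```

With this policy a bumped minor version is rejected outright, which is 
the short-term breakage described above.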

> Further, if I understand correctly, we want to apply one index per
> track, so this information is track-related:
> * The length of the indexed segment, in bytes, as an 8 byte unsigned integer.
>    

As stated above, this is the length of the whole segment and should be 
in the header.

> * The number of key points in the index, 'n', as a 4 byte unsigned integer.
> * 'n' key points, each of which contain, in the following order:
>      * the page offset as an 8 byte unsigned integer, followed by
>      * the checksum of the page found at the offset, as a 4 byte
> field, followed by
>      * the presentation time in milliseconds of the key point, as an 8
> byte unsigned integer.
>
> These would go into the fisbone packet for each track just before the
> message header fields. The "offset to message header fields" become
> dynamic, but is still compatible.
>

The "offset to message header fields" in the fisbone was a nice idea. I 
hope all the decoders respect it. ;)
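
For reference, the key point layout quoted above (a 4-byte count, then 
for each key point an 8-byte offset, a 4-byte checksum and an 8-byte 
time) could be read roughly as follows. The byte order is not stated in 
this thread; little-endian with no padding is an assumption here, as is 
the requirement that key points are sorted by time. All names are 
hypothetical.

```python
import bisect
import struct
from typing import List, NamedTuple, Optional


class KeyPoint(NamedTuple):
    offset: int    # byte offset of the page
    checksum: int  # checksum of the page found at that offset
    time_ms: int   # presentation time of the key point, in milliseconds


# Assumption: little-endian, tightly packed fields.
_COUNT = struct.Struct("<I")
_KEY_POINT = struct.Struct("<QIQ")


def parse_key_points(data: bytes, pos: int = 0) -> List[KeyPoint]:
    """Read the count 'n' at pos, then n key points that follow it."""
    (n,) = _COUNT.unpack_from(data, pos)
    pos += _COUNT.size
    if len(data) - pos < n * _KEY_POINT.size:
        raise ValueError("truncated key point index")
    return [
        KeyPoint(*_KEY_POINT.unpack_from(data, pos + i * _KEY_POINT.size))
        for i in range(n)
    ]


def key_point_for(points: List[KeyPoint], target_ms: int) -> Optional[KeyPoint]:
    """Return the last key point at or before target_ms, or None.
    Assumes points are sorted by time_ms."""
    times = [p.time_ms for p in points]
    i = bisect.bisect_right(times, target_ms)
    return points[i - 1] if i else None
```

A player would seek to the returned offset and could compare the stored 
checksum with the page it finds there to detect a stale index.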

> I do wonder whether the simplicity of the index track goes away when
> we start having an index per track. If we decided to just always have
> a single header for the index track as per your current proposal we
> would need to take the information out of the fisbone and chuck it
> into the fishead. It would be possible, but not as neatly.
>

Poorly written skeleton decoders may not handle the skeleton packets 
changing in size. A well-written decoder should just ignore streams with 
unknown content, such as a new index track.
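
A sketch of what "ignore streams with unknown content" can look like in 
a demuxer: classify each stream by the magic bytes at the start of its 
first packet and drop anything unrecognised. The function names are 
hypothetical, and the `index\0` identifier used in the usage note is 
invented for the example, since no identifier for the index track is 
given in this thread.

```python
from typing import Dict, Optional

# Magic bytes at the start of each stream's first (BOS) packet.
KNOWN_STREAMS = {
    b"\x80theora": "theora",
    b"\x01vorbis": "vorbis",
    b"fishead\x00": "skeleton",
}


def classify_stream(bos_packet: bytes) -> Optional[str]:
    """Return the stream type for a BOS packet, or None if unknown."""
    for magic, name in KNOWN_STREAMS.items():
        if bos_packet.startswith(magic):
            return name
    return None


def streams_to_decode(bos_packets: Dict[int, bytes]) -> Dict[int, str]:
    """Map serial number to stream type, silently skipping streams
    whose content this decoder does not recognise."""
    result = {}
    for serial, packet in bos_packets.items():
        kind = classify_stream(packet)
        if kind is not None:
            result[serial] = kind
    return result
```

A stream whose first packet starts with, say, `index\0` is simply left 
out of the result, and the remaining tracks decode as before.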

The index data is logically different to the skeleton data, and it would 
be a shame to break existing robust skeleton decoders by bumping the 
version number.

All the best,
Chris P.