[theora] Indexing Ogg files for faster seeking

Thu Sep 24 05:32:55 PDT 2009

2009/9/24 Chris Double <chris.double at double.co.nz>:
> 2009/9/24 Ivo Emanuel Gonçalves <justivo at gmail.com>:
>> However, were this index idea an expansion on Skeleton, you would
>> remove the unnecessary complexity of adding yet another metadata stream
>> to the mix and build on what exists.  Win-win, I say.
>
> Having it as a separate stream does allow adding of additional index
> tracks using existing tools.

You mean using things like oggz-merge? People still have to use a tool
to create the index. And they have to use a tool to create skeleton.
The skeleton information and the index are basically the same type of
information: metadata about the other tracks. So, it makes logical
sense to throw them together.

Further, I don't think putting the index into skeleton will make
skeleton parsing and index extraction more difficult. But that's just
a hunch for now, so I want to see if it all makes sense.

> It makes it relatively easy for a server
> to generate the index track on the fly, add it to an existing ogg
> file, and send that file at a client's request, for example.

As new tracks are added, the skeleton track needs to be updated with
an additional fisbone packet per new track, too. So, if we throw in an
index track per existing track, we double the number of tracks and
double the number of skeleton fisbone packets. Instead, if we throw
the index in with each skeleton packet and make it compatible with
current skeleton, we make backwards compatibility easier.

> That said, I'd be interested in seeing how you'd modify the skeleton
> specification such that it can have indexes inside it. Can it be done
> without breaking existing applications that parse skeleton streams?

Being curious about this, too, I gave it a try.

If I understand the index track specification correctly, this
information concerns the complete video file:
* The playback start time, in milliseconds, as an 8 byte unsigned
integer. This is the presentation time of the first frame.
* The playback end time, in milliseconds, as an 8 byte unsigned
integer. This is the end time of the last frame.

We could throw these in at the end of the fishead packet and bump up
the minor version number to indicate that it is a new, but compatible,
version of Skeleton.
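
As a rough sketch of what that extension could look like, the Python
below appends the two times to an existing fishead payload and reads
them back. The little-endian byte order, the function names, and the
64-byte placeholder payload used in the usage note are assumptions for
illustration, not something taken from the index proposal.

```python
import struct
from typing import Tuple

# Assumed encoding: two unsigned 64-bit integers, little-endian.
# The byte order is an assumption; the proposal text only gives sizes.
PLAYBACK_TIMES = struct.Struct("<QQ")


def append_playback_times(fishead: bytes, start_ms: int, end_ms: int) -> bytes:
    """Return the fishead payload with start/end times appended at the end."""
    return fishead + PLAYBACK_TIMES.pack(start_ms, end_ms)


def read_playback_times(extended_fishead: bytes) -> Tuple[int, int]:
    """Read (start_ms, end_ms) from the last 16 bytes of an extended fishead."""
    return PLAYBACK_TIMES.unpack(extended_fishead[-PLAYBACK_TIMES.size:])
```

For example, append_playback_times(b"fishead\x00" + bytes(56), 0, 5000)
yields a payload 16 bytes longer than the original. A parser that only
reads the fields it already knows about would never look at the extra
bytes, which is the compatibility argument made above.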

Further, if I understand correctly, we want to apply one index per
track, so this information is track-related:
* The length of the indexed segment, in bytes, as an 8 byte unsigned integer.
* The number of key points in the index, 'n', as a 4 byte unsigned integer.
* 'n' key points, each of which contains, in the following order:
    * the page offset as an 8 byte unsigned integer, followed by
    * the checksum of the page found at the offset, as a 4 byte
field, followed by
    * the presentation time in milliseconds of the key point, as an 8
byte unsigned integer.

These would go into the fisbone packet for each track just before the
message header fields. The "offset to message header fields" value
becomes dynamic, but the packet is still compatible.
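
To make the per-track layout concrete, here is a minimal sketch in
Python that serialises and parses the index block described in the
list above. It assumes little-endian byte order and no padding between
fields, and the names (pack_index, unpack_index, KeyPoint) are
hypothetical; only the field sizes and their order come from the
description.

```python
import struct
from typing import List, Tuple

# Assumed encoding: little-endian, no padding (byte order is an assumption).
INDEX_HEADER = struct.Struct("<QI")  # segment length in bytes, key point count n
KEY_POINT = struct.Struct("<QIQ")    # page offset, page checksum, time in ms

# (page_offset, page_checksum, presentation_time_ms)
KeyPoint = Tuple[int, int, int]


def pack_index(segment_length: int, key_points: List[KeyPoint]) -> bytes:
    """Serialise the segment length, the count n, and n key points."""
    parts = [INDEX_HEADER.pack(segment_length, len(key_points))]
    parts.extend(KEY_POINT.pack(*kp) for kp in key_points)
    return b"".join(parts)


def unpack_index(data: bytes, offset: int = 0) -> Tuple[int, List[KeyPoint], int]:
    """Parse an index block starting at `offset`.

    Returns (segment_length, key_points, end_offset), where end_offset is
    the position of the first byte after the index block.
    """
    segment_length, count = INDEX_HEADER.unpack_from(data, offset)
    offset += INDEX_HEADER.size
    key_points = []
    for _ in range(count):
        key_points.append(KEY_POINT.unpack_from(data, offset))
        offset += KEY_POINT.size
    return segment_length, key_points, offset
```

Under these assumptions the block occupies 12 + 20 * n bytes, so the
"offset to message header fields" would grow by that amount for a
track with n key points.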

I do wonder whether the simplicity of the index track goes away when
we start having an index per track. If we decided to just always have
a single header for the index track, as per your current proposal, we
would need to take the information out of the fisbone and chuck it
into the fishead. It would be possible, but not as neat.

Cheers,
Silvia.