[theora] Indexing Ogg files for faster seeking
Chris Pearce
chris at pearce.org.nz
Thu Sep 24 17:35:33 PDT 2009
On 25/09/09 00:32, Silvia Pfeiffer wrote:
>
> The skeleton information and the index are basically the same type of
> information: meta data about the other tracks. So, it makes logical
> sense to throw them together.
>
No, the skeleton and index hold different types of information which are
interesting to different classes of applications. The skeleton
duplicates the information from the first few pages of the ogg file, and
is only useful to external applications which want metadata about the
file without having to know how to parse the non-skeleton packets, for
example the Unix `file` command or the Windows file properties dialog.
However, decoding software cannot risk trusting the
skeleton track, as the authoritative source of track metadata is the
tracks themselves, so most players just ignore the skeleton track.
However, the index is useful to decoding applications, but it is not
useful to `file`-type programs. Why would a file-properties dialog care
about a keyframe index?
By including the index in the skeleton, you force both `file`-type and
player-type apps to decode stuff they don't care about.
> Further, I don't think putting the index into skeleton will make
> skeleton parsing and index extraction more difficult. But that's just
> a hunch for now, so I want to see if it all makes sense.
>
>> It makes it relatively easy for a server
>> to generate the index track on the fly, add it to an existing ogg
>> file, and send that file at a client's request for example.
>>
> As new tracks are being added, the skeleton track needs to be updated
> with additional fisbone packets per new track, too. So, if we throw in
> an index track per existing track, we duplicate our number of tracks
> and double the number of skeleton fisbone packets. Instead, if we
> throw the index in with each skeleton packet and make it such that it
> is compatible with current skeleton, we make backwards compatibility
> easier.
>
The obvious approach is to have one index track per segment, with one
index packet per indexed track. Wouldn't that require only one extra
fisbone packet?
There's also no benefit from having a fisbone packet for the index
track, as it's not a "playable content" track, so most of the fields in
the fisbone header (from memory) are not relevant. The only thing an
index's fisbone packet would tell you is that there is an index present,
but even that's not interesting to an external `file`-type app, as the
presence of an index shouldn't change its seekability, just its speed at
seeking under certain conditions.
>> That said, I'd be interested in seeing how you'd modify the skeleton
>> specification such that it can have indexes inside it. Can it be done
>> without breaking existing applications that parse skeleton streams?
>>
> Being curious about this, too, I gave it a try.
>
Thanks for that. I have a few comments below. I still don't like the
idea of including the index in the skeleton track, but if it causes less
bustage it may be worth considering.
> If I understand the index track specification correctly, this
> information concerns the complete video file:
> * The playback start time, in milliseconds, as an 8 byte unsigned
> integer, this is the presentation time of the first frame.
> * The playback end time, in milliseconds, as an 8 byte unsigned
> integer, this is the end time of the last frame.
>
The length in bytes of the indexed segment should go in the header
packet; there's no need to duplicate it for every stream. This is the
length of the file/segment, not the length of the track. This field
exists so that you can immediately jump to the next segment when seeking
if the seek target is outside of the start/end time range which is
indexed in this segment.
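To make the use of the segment length concrete, here is a rough sketch. The `SegmentIndexHeader` class and its field names are hypothetical, invented for illustration, and treating the end time as inclusive is my assumption:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentIndexHeader:
    # Hypothetical in-memory form of the per-segment header fields.
    segment_offset: int  # byte offset of the segment's first page in the file
    segment_length: int  # length of the whole segment in bytes, not of one track
    start_time_ms: int   # presentation time of the first frame
    end_time_ms: int     # end time of the last frame


def next_segment_offset(header: SegmentIndexHeader, target_ms: int) -> Optional[int]:
    """Return the byte offset of the next segment if the seek target lies
    outside this segment's indexed time range, otherwise None."""
    if header.start_time_ms <= target_ms <= header.end_time_ms:
        return None  # target is covered by this segment's index
    # Outside the indexed range: skip the whole segment in one jump.
    return header.segment_offset + header.segment_length
```

A seek would call this once per segment and only consult the key points when it returns None.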
A side question: if you have a chain consisting of two concatenated ogg
files, the first with timestamps 5-10 minutes, and the second with
timestamps from 20-25 minutes, would you expect to see a timeline in the
player UI of 0-10 minutes? Or should it start at 5, and end at 25 with
some kind of marker on the progress bar to show the time-break? There
was some talk in #theora of making the timestamps in the index offsets
relative to the start of the segment, rather than the time which the
granulepos corresponds to...
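To spell out the two options for that example, here is a small sketch. The function names are made up, and segments are plain (start, end) pairs in minutes:

```python
def collapsed_timeline(segments):
    """Option 1: the UI shows one gapless timeline starting at zero."""
    total = sum(end - start for start, end in segments)
    return (0, total)


def absolute_timeline(segments):
    """Option 2: the UI shows real timestamps, plus the gaps to mark."""
    ordered = sorted(segments)
    gaps = [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]
    return (ordered[0][0], ordered[-1][1]), gaps


chain = [(5, 10), (20, 25)]
print(collapsed_timeline(chain))
print(absolute_timeline(chain))
```

The first option gives a 0-10 minute timeline; the second runs from 5 to 25 with a time-break between 10 and 20.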
> We could throw these in at the end of the fishead packet and pump up
> the minor version number to indicate it is a new version of Skeleton,
> but it is compatible.
>
It would be interesting to test how existing apps cope with this. Robust
players probably ignore the skeleton, and so they'll probably just
ignore an index embedded in the skeleton, which would be ok. Robust
`file`-type apps will probably refuse to parse the skeleton track with a
bumped-up revision, rendering the track itself useless until they're
updated. This would break the skeleton track in the short term, though
I'm not sure what apps actually use the skeleton. I also note that
neither liboggz nor liboggplay check the skeleton track's revision
number before reading stuff out of the skeleton track, so they may
misbehave if the skeleton track changes.
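For reference, the kind of check I mean looks roughly like this. It is a sketch, assuming the fishead layout as I recall it (8-byte "fishead\0" identifier followed by 2-byte major and 2-byte minor version, little-endian); the function names and the accept/reject policy are mine, not taken from any existing library:

```python
import struct

FISHEAD_MAGIC = b"fishead\x00"
SUPPORTED_MAJOR = 3  # assumption: the major version this reader was written for
SUPPORTED_MINOR = 0


def skeleton_version(packet: bytes):
    """Return (major, minor) from a fishead packet, or None if it isn't one."""
    if len(packet) < 12 or packet[:8] != FISHEAD_MAGIC:
        return None
    return struct.unpack_from("<HH", packet, 8)


def can_parse_skeleton(packet: bytes, accept_newer_minor: bool = False) -> bool:
    """Decide whether to read the skeleton track at all.

    A strict reader refuses any revision it doesn't know; a lenient one
    trusts that a minor bump only appends fields it can skip.
    """
    version = skeleton_version(packet)
    if version is None:
        return False
    major, minor = version
    if major != SUPPORTED_MAJOR:
        return False
    return minor <= SUPPORTED_MINOR or accept_newer_minor
```

A strict reader is the one that renders the track useless after a version bump; a reader with no check at all is the one that may misbehave.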
> Further, if I understand correctly, we want to apply one index per
> track, so this information is track-related:
> * The length of the indexed segment, in bytes, as an 8 byte unsigned integer.
>
As stated above, this is per-segment, not per-track, and should be in the header.
> * The number of key points in the index, 'n', as a 4 byte unsigned integer.
> * 'n' key points, each of which contain, in the following order:
> * the page offset as an 8 byte unsigned integer, followed by
> * the checksum of the page found at the offset, as a 4 byte
> field, followed by
> * the presentation times in milliseconds of the key point, as an 8
> byte unsigned integer.
>
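The key point layout quoted above maps onto a fixed 20-byte record. Here is a minimal sketch of reading it and picking the key point to seek to. It assumes little-endian fields and key points sorted by time, neither of which is stated in the list above, and the function names are made up for illustration:

```python
import struct
from bisect import bisect_right

# page offset (8 bytes), page checksum (4 bytes), presentation time in ms (8 bytes).
# "<" means little-endian with no padding; the byte order is an assumption.
KEY_POINT = struct.Struct("<QIQ")


def parse_key_points(data: bytes):
    """Parse a 4-byte count 'n' followed by 'n' key points."""
    if len(data) < 4:
        raise ValueError("truncated index: missing key point count")
    (count,) = struct.unpack_from("<I", data, 0)
    if len(data) < 4 + count * KEY_POINT.size:
        raise ValueError("truncated index: fewer key points than declared")
    return [
        KEY_POINT.unpack_from(data, 4 + i * KEY_POINT.size)
        for i in range(count)
    ]


def find_key_point(key_points, target_ms: int):
    """Return the last key point at or before target_ms, or None."""
    times = [time_ms for _, _, time_ms in key_points]
    i = bisect_right(times, target_ms)
    return key_points[i - 1] if i else None
```

The checksum lets the reader verify that the page at the offset is still the one that was indexed before trusting the entry.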
> These would go into the fisbone packet for each track just before the
> message header fields. The "offset to message header fields" become
> dynamic, but is still compatible.
>
The "offset to message header fields" in the fisbone was a nice idea. I
hope all the decoders respect it. ;)
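By "respect it" I mean something like the sketch below, rather than hard-coding where the message header fields start. From memory, the 4-byte offset field sits right after the 8-byte "fisbone\0" identifier and is measured from the start of the offset field itself; treat both as assumptions. The function name is hypothetical:

```python
import struct

FISBONE_MAGIC = b"fisbone\x00"
OFFSET_FIELD_POS = 8  # the offset field follows the 8-byte identifier


def message_header_fields(packet: bytes) -> bytes:
    """Return the message header fields of a fisbone packet, located via
    the packet's own offset field instead of a fixed position."""
    if len(packet) < OFFSET_FIELD_POS + 4 or packet[:8] != FISBONE_MAGIC:
        raise ValueError("not a fisbone packet")
    (offset,) = struct.unpack_from("<I", packet, OFFSET_FIELD_POS)
    # Assumption: the offset is relative to the offset field itself.
    start = OFFSET_FIELD_POS + offset
    if start > len(packet):
        raise ValueError("offset points past the end of the packet")
    return packet[start:]
```

A decoder written this way keeps working if extra data is inserted before the message header fields, as long as the offset is updated to match.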
> I do wonder whether the simplicity of the index track goes away when
> we start having an index per track. If we decided to just always have
> a single header for the index track as per your current proposal we
> would need to take the information out of the fisbone and chuck it
> into the fishead. It would be possible, but not as neatly.
>
Poorly written skeleton decoders may not handle the skeleton packets
changing in size. A well written decoder should just ignore streams with
unknown content - like a new index track.
The index data is logically different to the skeleton data, and it would
be a shame to break existing robust skeleton decoders by bumping the
version number.
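As a sketch of what ignoring unknown streams looks like, a decoder can identify each stream from the start of its first packet and drop anything it doesn't recognise. The Theora, Vorbis and Skeleton signatures below are the real ones; the b"index\x00" signature in the usage lines is purely hypothetical, and the function names are mine:

```python
KNOWN_BOS_MAGICS = {
    b"\x80theora": "theora",
    b"\x01vorbis": "vorbis",
    b"fishead\x00": "skeleton",
}


def identify_stream(bos_packet: bytes):
    """Return the codec name for a stream's first packet, or None if unknown."""
    for magic, name in KNOWN_BOS_MAGICS.items():
        if bos_packet.startswith(magic):
            return name
    return None


def select_streams(bos_packets: dict) -> dict:
    """Map serial number -> codec name, silently skipping unknown streams."""
    selected = {}
    for serial, packet in bos_packets.items():
        name = identify_stream(packet)
        if name is not None:
            selected[serial] = name
    return selected


streams = select_streams({
    1: b"\x80theora" + bytes(35),
    2: b"\x01vorbis" + bytes(23),
    3: b"index\x00" + bytes(16),  # hypothetical index track signature
})
print(streams)
```

The index stream with serial 3 is never selected, so a decoder built like this plays the file exactly as before.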
All the best,
Chris P.