[theora] Indexing Ogg files for faster seeking

Sun Oct 11 02:31:19 PDT 2009

On Fri, Sep 25, 2009 at 11:35 AM, Chris Pearce <chris at pearce.org.nz> wrote:
> On 25/09/09 00:32, Silvia Pfeiffer wrote:
>>
>> The skeleton information and the index are basically the same type of
>> information: meta data about the other tracks. So, it makes logical
>> sense to throw them together.
>>
>
> No, the skeleton and index hold different types of information which are
> interesting to different classes of applications. The skeleton
> duplicates the information from the first few pages of the ogg file, and
> is only useful to external applications which want metadata about the
> file without having to know how to parse the non-skeleton packets.

Presentation time and basetime are not available from the content
pages. Skeleton was also built so that segments can be cut out of Ogg
streams without re-encoding any pages, while still recording the time
offset at which the cut was made. This is very important to players.

> For
> example applications like the Unix `file` command, or the Windows file
> properties dialog. However decoding software cannot risk trusting the
> skeleton track, as the authoritative source of track metadata is the
> tracks themselves, so most players just ignore the skeleton track.
> However the index is useful to decoding applications, but it is not
> useful to `file`-type programs. Why would a file-properties dialog care
> about a keyframe index?
>
> By including the index in the skeleton, you force both `file`-type and
> player-type apps to decode stuff they don't care about.

That's already the case, and it mostly just requires skipping the bytes
that are not interesting to the particular application. Not a problem
IMO.

>> Further, I don't think putting the index into skeleton will make
>> skeleton parsing and index extraction more difficult. But that's just
>> a hunch for now, so I want to see if it all makes sense.
>>
>>> It makes it relatively easy for a server
>>> to generate the index track on the fly, add it to an existing ogg
>>> file, and send that file at a client's request for example.
>>>
>> As new tracks are being added, the skeleton track needs to be updated
>> with additional fisbone packets per new track, too. So, if we throw in
>> an index track per existing track, we double our number of tracks
>> and double the number of skeleton fisbone packets. Instead, if we
>> throw the index in with each skeleton packet and make it such that it
>> is compatible with current skeleton, we make backwards compatibility
>> easier.
>>
>
> The obvious approach is to have one index track per segment, with one
> index packet per indexed-track. That would only require one extra
> fisbone packet in that case?

Yes, it would.

> There's also no benefit from having a fisbone packet for the index
> track, as it's not a "playable content" track, so most of the fields in
> the fisbone header (from memory) are not relevant. The only thing an
> index's fisbone packet would tell you is that there is an index present,
> but even that's not interesting to an external `file`-type app, as the
> presence of an index shouldn't change its seekability, just its speed at
> seeking under certain conditions.

Yes, that's actually an argument for merging the two tracks.

>>> That said, I'd be interested in seeing how you'd modify the skeleton
>>> specification such that it can have indexes inside it. Can it be done
>>> without breaking existing applications that parse skeleton streams?
>>>
>> Being curious about this, too, I gave it a try.
>>
>
> Thanks for that. I have a few comments below. I still don't like the
> idea of including the index in the skeleton track, but if it causes less
> bustage it may be worth considering.
>
>> If I understand the index track specification correctly, this
>> information concerns the complete video file:
>> * The playback start time, in milliseconds, as an 8 byte unsigned
>> integer, this is the presentation time of the first frame.
>> * The playback end time, in milliseconds, as an 8 byte unsigned
>> integer, this is the end time of the last frame.
>>
>
> The length in bytes of the indexed segment should go in the header
> packet, there's no need to duplicate it for every stream. This is the
> length of the file/segment, not the length of the track. This field
> exists so that you can immediately jump to the next segment when seeking
> if the seek target is outside of the start/end time range which is
> indexed in this segment.

What segments are you talking about? Wouldn't the complete file be indexed?
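
For illustration, the behaviour described in the quoted paragraph could
look roughly like the sketch below. Everything here is an assumption made
for the example: the SegmentIndex class, its field names, and the idea
that keyframe byte offsets are stored relative to the start of their
segment are not taken from the index specification.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SegmentIndex:
    # Hypothetical in-memory form of one segment's index, not the on-disk layout.
    start_ms: int       # presentation time of the first frame
    end_ms: int         # end time of the last frame
    length_bytes: int   # length of the whole indexed segment
    keyframes: List[Tuple[int, int]]  # (time_ms, byte_offset), sorted by time


def seek(segments: List[SegmentIndex], target_ms: int) -> Optional[int]:
    """Return an absolute byte offset to start decoding from, or None."""
    base = 0  # byte offset of the current segment within the file
    for seg in segments:
        if seg.start_ms <= target_ms < seg.end_ms:
            times = [t for t, _ in seg.keyframes]
            i = bisect_right(times, target_ms) - 1
            # Last keyframe at or before the target; fall back to segment start.
            offset = seg.keyframes[i][1] if i >= 0 else 0
            return base + offset
        # Target is outside this segment's range: skip it without reading pages.
        base += seg.length_bytes
    return None
```

With a single segment covering the complete file, the loop runs once and
the length field is never needed; it only pays off for chained files.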

> A side question: if you have a chain consisting of two concatenated ogg
> files, the first with timestamps 5-10 minutes, and the second with
> timestamps from 20-25 minutes, would you expect to see a timeline in the
> player UI of 0-10 minutes? Or should it start at 5, and end at 25 with
> some kind of marker on the progress bar to show the time-break? There
> was some talk in #theora of making the timestamps in the index offsets
> relative to the start of the segment, rather than the time which the
> granulepos corresponds to...

I personally think it should represent the original timestamps, which
is what we created Skeleton's presentation time and basetime
parameters for. Details are here:
http://svn.annodex.net/standards/draft-pfeiffer-oggskeleton-current.txt.
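
As a small sketch of what keeping the original timestamps means for the
example in the question (two chained files at 5-10 and 20-25 minutes),
the hypothetical helper below derives the timeline bounds and the
time-breaks a player UI could mark. The function name and the
(start, end) tuple format are assumptions made for this example.

```python
from typing import List, Tuple


def timeline(segments: List[Tuple[float, float]]):
    """segments: (start, end) in minutes, in chain order, original timestamps."""
    start = segments[0][0]
    end = segments[-1][1]
    # A gap exists wherever the next segment starts after the previous one ends.
    gaps = [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
        if next_start > prev_end
    ]
    # Total media actually present, which is what a 0-based timeline would show.
    playable = sum(e - s for s, e in segments)
    return start, end, gaps, playable


start, end, gaps, playable = timeline([(5, 10), (20, 25)])
print(start, end, gaps, playable)
```

For the two files above this gives a timeline from 5 to 25 with one
break between 10 and 20, and 10 minutes of playable media.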

>> We could throw these in at the end of the fishead packet and pump up
>> the minor version number to indicate it is a new version of Skeleton,
>> but it is compatible.
>>
>
> It would be interesting to test how existing apps cope with this. Robust
> players probably ignore the skeleton, and so they'll probably just
> ignore the skeleton-enriched index, which would be ok. Robust
> `file`-type apps will probably refuse to parse the skeleton track with a
> bumped-up revision, rendering the track itself useless until they're
> updated.

I'm not sure any 'file'-type app is actually using skeleton for
information. It wasn't really built for that. The only 'file'-type apps
that we targeted were Web proxies, and that hasn't happened yet. It will
probably come with the new media fragment standards of the W3C.

> This would break the skeleton track in the short term, though
> I'm not sure what apps actually use the skeleton. I also note that
> neither liboggz nor liboggplay check the skeleton track's revision
> number before reading stuff out of the skeleton track, so they may
> misbehave if the skeleton track changes.

Ah! That's a bug I would say. :-)
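
A minimal sketch of the kind of check a reader could do is below. It
assumes the 64-byte Skeleton 3.0 fishead layout (identifier, major and
minor version, presentation time and basetime as numerator/denominator
pairs, 20-byte UTC field, little-endian, read here as signed 64-bit
values), and it treats the two 8-byte millisecond fields appended at the
end as the hypothetical extension discussed in this thread. The function
name and the returned dictionary keys are made up for the example.

```python
import struct

FISHEAD_MAGIC = b"fishead\x00"
SUPPORTED_MAJOR = 3  # assumption: this reader was written against Skeleton 3.x
# magic, major, minor, presentation time num/den, basetime num/den, UTC
FISHEAD_V3 = struct.Struct("<8sHHqqqq20s")


def parse_fishead(packet: bytes) -> dict:
    if len(packet) < FISHEAD_V3.size:
        raise ValueError("fishead packet too short")
    magic, major, minor, p_num, p_den, b_num, b_den, _utc = \
        FISHEAD_V3.unpack_from(packet)
    if magic != FISHEAD_MAGIC:
        raise ValueError("not a fishead packet")
    # A different major version may change the layout, so refuse to guess.
    if major != SUPPORTED_MAJOR:
        raise ValueError("unsupported skeleton major version %d" % major)
    fields = {
        "version": (major, minor),
        "presentation_time": (p_num, p_den),
        "basetime": (b_num, b_den),
    }
    # A higher minor version may append fields; older readers just skip them.
    extra = packet[FISHEAD_V3.size:]
    if len(extra) >= 16:
        # Hypothetical extension: playback start and end time in milliseconds.
        fields["start_ms"], fields["end_ms"] = struct.unpack_from("<QQ", extra)
    return fields
```

A reader that stops after the first 64 bytes keeps working when the
minor version is bumped, which is the compatibility being argued for.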

> The index data is logically different to the skeleton data, and it would
> be a shame to break existing robust skeleton decoders by bumping the
> version number.

I wouldn't be afraid of that. In fact, it would expose bugs, which is
a good thing. ;-)

Cheers,
Silvia.