[theora] Indexing Ogg files for faster seeking
chris at pearce.org.nz
Tue Oct 13 14:39:06 PDT 2009
On 10/11/2009 10:31 PM, Silvia Pfeiffer wrote:
>> No, the skeleton and index hold different types of information which are
>> interesting to different classes of applications. The skeleton
>> duplicates the information from the first few pages of the ogg file, and
>> is only useful to external applications which want metadata about the
>> file without having to know how to parse the non-skeleton packets.
> Presentation time and basetime are not available from the content
Right, you have to decode the pages, and use the pages' granulepos to
calculate each packet's granulepos. Players already know how to do this,
that's how they calculate the presentation time of frames.
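
As a rough illustration of that calculation, here is a minimal sketch for a Theora track. It assumes the usual Theora granulepos layout (keyframe number in the upper bits, frames since that keyframe in the lower `kf_shift` bits) and takes the shift and frame rate from the stream's header. The function name and parameters are hypothetical, and the sketch ignores the one-frame offset that differs between bitstream versions.

```python
from fractions import Fraction

def theora_granule_to_time(granulepos, kf_shift, fps_num, fps_den):
    """Approximate presentation time, in seconds, for a Theora granulepos.

    Assumption: upper bits hold the keyframe's frame number, the lower
    kf_shift bits hold the count of frames since that keyframe.
    """
    if granulepos < 0:
        return None  # -1 means no packet finishes on this page
    keyframe_no = granulepos >> kf_shift
    frames_since_kf = granulepos & ((1 << kf_shift) - 1)
    frame_no = keyframe_no + frames_since_kf
    # One frame lasts fps_den / fps_num seconds.
    return Fraction(frame_no * fps_den, fps_num)
```

For a 25 fps stream with a shift of 6, a granulepos of `(100 << 6) | 5` maps to frame 105.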
> Skeleton has been built also for the purpose of cutting out
> segments from Ogg streams without having to re-encode any pages, but
> with knowing at which time offset the cut-out happened. This is very
> important to players.
Why can't the player just get the start time of the media by decoding
the cut media? How is this case different from resuming decoding after
>> By including the index in the skeleton, you force both `file`-type and
>> player-type apps to decode stuff they don't care about.
> That's already the case. And mostly requires skipping certain bytes
> that are not interesting to the particular application. Not a problem
So skipping an index track won't be a problem either, right? ;)
>> The length in bytes of the indexed segment should go in the header
>> packet, there's no need to duplicate it for every stream. This is the
>> length of the file/segment, not the length of the track. This field
>> exists so that you can immediately jump to the next segment when seeking
>> if the seek target is outside of the start/end time range which is
>> indexed in this segment.
> What segments are you talking about? Wouldn't the complete file be indexed?
When I refer to a segment, I mean either a stand alone file which is not
part of an ogg chain, or a part of a chained file which maps to one of
the original files concatenated together to make the chain. Just as
every segment in the chain would have its own skeleton track, every
segment in the chain requires its own index. I think you refer to these
as sequentially multiplexed bitstreams in your skeleton RFC?
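
To make the use of the segment length concrete, here is a minimal sketch of how a player might skip whole segments in a chain. The `Segment` fields are hypothetical stand-ins for values read from each segment's index header, not a proposed layout, and the sketch assumes the headers have already been read into a list.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    # Hypothetical fields, as they might be read from an index header.
    start_time_ms: int
    end_time_ms: int
    length_bytes: int  # length of the whole segment, not of one track

def find_segment(segments, target_ms):
    """Return (byte offset of segment start, segment) containing target_ms."""
    offset = 0
    for seg in segments:
        if seg.start_time_ms <= target_ms <= seg.end_time_ms:
            return offset, seg
        # Target is outside this segment's indexed range: jump past it.
        offset += seg.length_bytes
    return None
```

Without the length field the player would have to bisect or scan through a segment just to find where the next one begins.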
>> was some talk in #theora of making the timestamps in the index offsets
>> relative to the start of the segment, rather than the time which the
>> granulepos corresponds to...
> I personally think it should represent the original timestamps,
Agreed. I want to store the time value of the packets, as the decoder
would see them in the packets themselves. So the time value stored for
the key frames is the presentation time of the frame, as the decoder
would see it.
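
A minimal sketch of the lookup this enables, assuming the index for a track is a list of (presentation time in ms, byte offset) pairs sorted by time. The pair layout and the function name are assumptions for illustration, not the on-disk format.

```python
import bisect

def find_keyframe(index, target_ms):
    """Return the last (time_ms, offset) entry with time_ms <= target_ms.

    index is assumed to be sorted by time_ms.
    """
    times = [t for t, _ in index]
    i = bisect.bisect_right(times, target_ms) - 1
    if i < 0:
        return None  # target is before the first indexed keyframe
    return index[i]
```

Because the stored times match what the decoder sees in the packets, the player can seek to the returned offset and compare the decoded frame times directly against the target.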
>>> We could throw these in at the end of the fishead packet and pump up
>>> the minor version number to indicate it is a new version of Skeleton,
>>> but it is compatible.
How do you decide what requires a major versus a minor version revision
increment? I'd say including the index in the skeleton would be a major
>> It would be interesting to test how existing apps cope with this. Robust
>> players probably ignore the skeleton, and so they'll probably just
>> ignore the skeleton-enriched index, which would be ok. Robust
>> `file`-type apps will probably refuse to parse the skeleton track with a
>> bumped-up revision, rendering the track itself useless until they're
> I'm not sure any 'file'-type app is actually using skeleton for
Why would a player want to read the skeleton, when they can just read
the tracks they're decoding themselves? What data is in the skeleton
which a player would want, which they can't get from decoding the
content tracks themselves?
>> The index data is logically different to the skeleton data, and it would
>> be a shame to break existing robust skeleton decoders by bumping the
>> version number.
> I wouldn't be afraid of that. In fact, it would expose bugs, which is
> a good thing. ;-)
Ah, so if including the index in its own track exposes bugs that's ok
All the best,