[theora] Indexing Ogg files for faster seeking

Tue Oct 13 16:04:59 PDT 2009

Hi Chris,

On Wed, Oct 14, 2009 at 8:39 AM, Chris Pearce <chris at pearce.org.nz> wrote:
> On 10/11/2009 10:31 PM, Silvia Pfeiffer wrote:
>>>
>>> No, the skeleton and index hold different types of information which are
>>> interesting to different classes of applications. The skeleton
>>> duplicates the information from the first few pages of the ogg file, and
>>> is only useful to external applications which want metadata about the
>>> file without having to know how to parse the non-skeleton packets.
>>
>> Presentation time and basetime are not available from the content
>> pages.
>
> Right, you have to decode the pages, and use the pages' granulepos to
> calculate each packet's granulepos. Players already know how to do this,
> that's how they calculate the presentation time of frames.

Even then you don't always have them. The basetime is the basetime of
the first packet. In a file that has been cut out of a larger file and
is a subpart, you don't know what the starting time is - it could be 2
hours 21 min and not 0. This is what basetime was created for.
Presentation time, otoh, tells you how far past basetime you should
decode before starting to present. In the normal file
case, basetime is 0 and the granulepos tells you the presentation
time. But granulepos doesn't let you reliably map to a time offset.
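
As a rough illustration of that mapping, here is a minimal Python
sketch. It is not taken from any existing library: all function and
parameter names are hypothetical, the keyframe/offset split assumes a
Theora-style granulepos, the granule shift and granule rate stand for
the per-track values a skeleton track would carry, and codec-specific
off-by-one details are ignored. The rule "displayed time = basetime +
granule-derived time" is the assumption being illustrated.

```python
from fractions import Fraction

def granule_time(granulepos, granule_shift, rate_num, rate_den):
    """Seconds encoded by a granulepos, measured from the stream's own zero."""
    # Theora-style layout (assumed): high bits = keyframe number,
    # low `granule_shift` bits = frames since that keyframe.
    keyframe = granulepos >> granule_shift
    offset = granulepos & ((1 << granule_shift) - 1)
    return Fraction((keyframe + offset) * rate_den, rate_num)

def display_time(granulepos, granule_shift, rate_num, rate_den, basetime):
    """Assumed mapping: time to show = basetime + granule-derived time."""
    return basetime + granule_time(granulepos, granule_shift, rate_num, rate_den)

def should_present(time, presentation_time):
    """Assumed rule: decode everything, present only from presentation_time on."""
    return time >= presentation_time

# A clip cut from 2 h 21 min into a larger file, 25 fps, granule shift 6.
basetime = Fraction(2 * 3600 + 21 * 60)
gp = (10 << 6) | 5  # keyframe 10, 5 frames after it
t = display_time(gp, 6, 25, 1, basetime)
```

Without the basetime argument, the same granulepos would map to a time
counted from 0, which is exactly the information a cut-out file loses.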

>> Skeleton has been built also for the purpose of cutting out
>> segments from Ogg streams without having to re-encode any pages, but
>> with knowing at which time offset the cut-out happened. This is very
>> important to players.
>
> Why can't the player just get the start time of the media by decoding the
> cut media? How is this case different from resuming decoding after seeking?

Because your decoded time may not be the time that you want to have displayed.

>>> By including the index in the skeleton, you force both `file`-type and
>>> player-type apps to decode stuff they don't care about.
>>
>> That's already the case. And mostly requires skipping certain bytes
>> that are not interesting to the particular application. Not a problem
>> IMO.
>
> So skipping an index track won't be a problem either, right? ;)

No, but it creates all the overhead as stated earlier and I don't see
why that should be necessary.

>>> The length in bytes of the indexed segment should go in the header
>>> packet, there's no need to duplicate it for every stream. This is the
>>> length of the file/segment, not the length of the track. This field
>>> exists so that you can immediately jump to the next segment when seeking
>>> if the seek target is outside of the start/end time range which is
>>> indexed in this segment.
>>
>> What segments are you talking about? Wouldn't the complete file be
>> indexed?
>
> When I refer to a segment, I mean either a stand alone file which is not
> part of an ogg chain, or a part of a chained file which maps to one of the
> original files concatenated together to make the chain. Just as every
> segment in the chain would have its own skeleton track, every segment in the
> chain requires its own index. I think you refer to these as sequentially
> multiplexed bitstreams in your skeleton RFC?

So, can the "length" be calculated as length=playbackend -
playbackstart ? Just wondering if we are duplicating information here.

>>>> We could throw these in at the end of the fishead packet and pump up
>>>> the minor version number to indicate it is a new version of Skeleton,
>>>> but it is compatible.
>
> How do you decide what requires a major and minor version revision
> increment? I'd say including the index in the skeleton would be a major
> change...

A minor change means it's backwards compatible. A major change is not
backwards compatible. So, a player that supported a previous version
of Skeleton will still get all its information from the new version if
only the minor number changes - even if it does not support the new
functionality. Skeleton has been designed that way.
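
A minimal Python sketch of that compatibility rule, under stated
assumptions: the names and the version numbers below are hypothetical
and only illustrate the check a parser could make, they are not taken
from an existing skeleton implementation.

```python
# Hypothetical: the newest Skeleton version this parser was written for.
SUPPORTED_MAJOR = 3
SUPPORTED_MINOR = 0

def can_parse(major, minor):
    """Same major version: every field this parser knows is still in place."""
    return major == SUPPORTED_MAJOR

def has_unknown_additions(major, minor):
    """Newer minor version: extra data may follow the known fields; skip it."""
    return major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR
```

A parser following this rule keeps reading files whose minor number
was bumped and only refuses files with a different major number.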

>>> It would be interesting to test how existing apps cope with this. Robust
>>> players probably ignore the skeleton, and so they'll probably just
>>> ignore the skeleton-enriched index, which would be ok. Robust
>>> `file`-type apps will probably refuse to parse the skeleton track with a
>>> bumped-up revision, rendering the track itself useless until they're
>>> updated.
>>
>> I'm not sure any 'file'-type app is actually using skeleton for
>> information.
>
> Why would a player want to read the skeleton, when they can just read the
> tracks they're decoding themselves? What data is in the skeleton which a
> player would want, which they can't get from decoding the content tracks
> themselves?

Basetime and presentation time as described above. Also, further
metadata. For dealing with mash-ups of media fragments, as being
standardised in the W3C Media Fragments Working Group, these are
actually really important to be able to display time mappings
correctly.

>>> The index data is logically different to the skeleton data, and it would
>>> be a shame to break existing robust skeleton decoders by bumping the
>>> version number.
>>
>> I wouldn't be afraid of that. In fact, it would expose bugs, which is
>> a good thing. ;-)
>
> Ah, so if including the index in its own track exposes bugs that's ok too?
> ;)

So, a skeleton decoder that does not respect a minor version number
change has a bug in its skeleton parsing, which should be fixed in view
of making further additions to skeleton in future. A decoder that
breaks because it does not tolerate an unknown track is a more
fundamental problem, which may also happen to decoders that do not
support skeleton. These are orthogonal bugs. Any decoder that supports
skeleton should not break on an additional Index track.

I'd much prefer having more motivation to implement support for a
single meta track that includes more useful information for everyone
than having several decoders, which support one or the other or none.
I think it creates a larger mess if we add yet another meta track
without a real need for making it an extra track.

Regards,
Silvia.