[theora] Indexing Ogg files for faster seeking
Chris Pearce
chris at pearce.org.nz
Wed Oct 7 17:48:30 PDT 2009
Below is another version of the index track spec with one index packet
per stream.
The index format is still quite simple, though not as compact as the
previous "one merged index per file" approach. I estimate that indexing
a two-track video, with one key point every two seconds on each track,
will in practice take approximately 70KB per hour of video (11.6KB per
10 minutes). That's about 20 bytes of index per second of video.
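As a rough sanity check on those numbers, here is a minimal Python sketch
of the arithmetic. It counts only the index packet payload (the fixed
fields plus the key points as laid out in the format below) and ignores
Ogg page overhead and the identifier header, so treat it as an estimate.
The function and constant names are made up for illustration.

```python
# One key point in the proposed format:
# 8-byte page offset + 4-byte checksum + 8-byte presentation time.
KEY_POINT_BYTES = 8 + 4 + 8
# Fixed fields at the start of each index packet:
# 4-byte serialno + 4-byte key point count.
INDEX_PACKET_FIXED_BYTES = 4 + 4


def estimate_index_payload_bytes(duration_s, tracks=2, key_point_interval_s=2):
    """Estimate total index packet payload, ignoring Ogg page overhead."""
    key_points_per_track = duration_s // key_point_interval_s
    per_track = INDEX_PACKET_FIXED_BYTES + key_points_per_track * KEY_POINT_BYTES
    return tracks * per_track


one_hour = estimate_index_payload_bytes(3600)     # two tracks, one hour
ten_minutes = estimate_index_payload_bytes(600)   # two tracks, ten minutes
bytes_per_second = one_hour / 3600
```

With two tracks and a two-second interval this comes out at one 20-byte
key point per second of video, which is where the "about 20 bytes per
second" figure comes from.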
With the original "one merged index per file" approach it's about half
that, but I think the added size is an acceptable trade-off. I imagine
the majority of video out there on the internet is under 10 minutes long
anyway (requiring a 12KB index...), and when playing files over a
network, most reasonable-quality videos will require about 100KB/s of
bandwidth to play back smoothly. If you've got a connection fast
enough for streaming video, you won't notice downloading an index.
You can tweak the index-keyframe interval to reduce the index size as
well, though that erodes the benefit of the index for network playback.
I've implemented this in my indexer on a new branch on my GitHub account:
http://github.com/cpearce/OggIndex/tree/index-per-stream
New spec here:
http://github.com/cpearce/OggIndex/blob/index-per-stream/IndexSpecificationVersion1.txt
Firefox builds which can handle the new index format here:
https://build.mozilla.org/tryserver-builds/cpearce@mozilla.com-try-4768e6238638/
Demo here:
http://pearce.org.nz/video/indexed-seek-demo.html
New Proposed Index Track Format:
<quote>
An Ogg index track starts with an identifier header packet which
contains the following data, in the following order:
* The identifier "index\0".
* The index version format number, as a 1 byte unsigned integer. This
specification describes version 1, so this field should have the
value 0x01.
* The playback start time, in milliseconds, as an 8 byte unsigned
integer, this is the presentation time of the first frame.
* The playback end time, in milliseconds, as an 8 byte unsigned
integer, this is the end time of the last frame.
* The length of the indexed segment, in bytes, as an 8 byte unsigned
integer.
The track then contains secondary header packets, which contain the
actual indexes. These are the "index packets", and each must begin on a
new page, but they may span multiple pages. There is one index packet
for each content stream in the Ogg segment, and they appear in
increasing order of the streams' serialno. Each index packet contains
the following:
* The serialno of the stream as a 4 byte field.
* The number of key points in the index packet, 'n', as a 4 byte
unsigned integer.
* 'n' key points, each of which contain, in the following order:
- the page's byte offset as an 8 byte unsigned integer, followed by
- the checksum of the page found at the offset, as a 4 byte field,
followed by
- the presentation time in milliseconds of the key point, as an 8
byte unsigned integer.
The key points are stored in increasing order by offset. The
presentation time of the key point is calculated from the granulepos.
[...]
The last packet in the track is an empty EOS packet, which must start on
a new page.
</quote>
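To make the layout above concrete, here is a minimal Python sketch of
reading the identifier header and one index packet, plus a lookup for
the key point to seek to. It is not the code from my indexer or from the
Firefox builds. It assumes the packets have already been extracted from
their Ogg pages, that all integers are little-endian (the excerpt above
doesn't state the byte order), and that the serialno and checksum fields
are unsigned 32-bit values. All names are made up for illustration.

```python
import struct

# Assumption: little-endian integers. "<" also disables padding, so the
# struct sizes match the field widths listed in the format description.
HEADER = struct.Struct("<6sBQQQ")  # "index\0", version, start ms, end ms, segment length
PREFIX = struct.Struct("<II")      # serialno, key point count 'n'
KEY_POINT = struct.Struct("<QIQ")  # page offset, page checksum, presentation time ms
MAGIC = b"index\x00"


def parse_header(packet):
    """Parse the identifier header packet of the index track."""
    magic, version, start_ms, end_ms, length = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("not an index identifier header")
    if version != 1:
        raise ValueError("unsupported index version %d" % version)
    return {"start_ms": start_ms, "end_ms": end_ms, "segment_length": length}


def parse_index_packet(packet):
    """Return (serialno, [(offset, checksum, time_ms), ...]) for one stream."""
    serialno, n = PREFIX.unpack_from(packet, 0)
    if len(packet) < PREFIX.size + n * KEY_POINT.size:
        raise ValueError("index packet is truncated")
    key_points = [
        KEY_POINT.unpack_from(packet, PREFIX.size + i * KEY_POINT.size)
        for i in range(n)
    ]
    return serialno, key_points


def find_seek_point(key_points, target_ms):
    """Return the key point with the latest presentation time that is not
    after target_ms, or None if every key point is later than the target."""
    best = None
    for point in key_points:
        if point[2] <= target_ms and (best is None or point[2] > best[2]):
            best = point
    return best
```

The lookup is a plain linear scan because the format only promises that
key points are ordered by offset. If the times are also known to be
increasing, a binary search would do the same job faster.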
Note that this format can be encoded in one pass. If you know the
duration of the media, you can decide on the keyframe interval (say one
every 2 seconds, which is roughly ffmpeg2theora's default for Theora
anyway), allocate the required space in the index packets, and then
come back and fill it in once you've encoded the media.
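A minimal Python sketch of that reserve-then-fill idea for a single
stream follows. It assumes the encoder produces exactly one key point
per interval, so the number of slots can be computed up front, and it
assumes little-endian integers (the excerpt above doesn't state the byte
order). It only builds the packet payload in memory and leaves out the
Ogg page framing. The function names are hypothetical.

```python
import struct

# Assumption: little-endian, unpadded fields as listed in the format.
PREFIX = struct.Struct("<II")      # serialno, key point count 'n'
KEY_POINT = struct.Struct("<QIQ")  # page offset, page checksum, presentation time ms


def reserve_index_packet(serialno, duration_ms, interval_ms=2000):
    """Allocate a zero-filled index packet with room for every key point."""
    n = -(-duration_ms // interval_ms)  # ceiling division
    packet = bytearray(PREFIX.size + n * KEY_POINT.size)
    PREFIX.pack_into(packet, 0, serialno, n)
    return packet


def fill_key_point(packet, i, offset, checksum, time_ms):
    """Write key point number i into a previously reserved packet."""
    _, n = PREFIX.unpack_from(packet, 0)
    if not 0 <= i < n:
        raise IndexError("key point %d is outside the reserved space" % i)
    KEY_POINT.pack_into(packet, PREFIX.size + i * KEY_POINT.size,
                        offset, checksum, time_ms)


# Reserve space for a 10 second stream before encoding, then fill in the
# first key point once its page offset and checksum are known.
packet = reserve_index_packet(serialno=1234, duration_ms=10000)
fill_key_point(packet, 0, offset=4096, checksum=0xDEADBEEF, time_ms=0)
```

Because the packet length is fixed at reservation time, filling it in
afterwards doesn't move any of the content pages, so the byte offsets
stored in the key points stay valid.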
Comments? Questions etc?
Chris P.