[theora] Indexing Ogg files for faster seeking

Silvia Pfeiffer silviapfeiffer1 at gmail.com
Thu Jan 21 18:14:00 PST 2010


On Fri, Jan 22, 2010 at 12:46 PM, Chris Pearce <chris at pearce.org.nz> wrote:
> I have been experimenting with compressing the keyframe indexes. If I
> delta-encode the keypoint offset and timestamp fields, and then
> variable-byte encode the resulting index, the keyframe indexes compress
> to 44% of the uncompressed size.
>
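
To make the scheme quoted above concrete, here is a minimal Python sketch of
delta-then-variable-byte encoding. The function names and the byte layout
(7 data bits per byte, low-order group first, high bit set when more bytes
follow) are assumptions for illustration; the layout in the Skeleton spec
may differ.

```python
def varint_encode(n):
    """Encode a non-negative int, 7 bits per byte, low-order group first.

    Assumed layout: the high bit is set on every byte except the last.
    """
    if n < 0:
        raise ValueError("variable-byte encoding needs non-negative integers")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more bytes follow
        else:
            out.append(group)         # final byte, high bit clear
            return bytes(out)


def varint_decode(data, pos):
    """Decode one integer starting at data[pos]; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def encode_index(keypoints):
    """keypoints: (offset, timestamp) pairs, both fields non-decreasing."""
    out = bytearray()
    prev_offset = prev_time = 0
    for offset, time in keypoints:
        # Store the difference from the previous keypoint, not the value.
        out += varint_encode(offset - prev_offset)
        out += varint_encode(time - prev_time)
        prev_offset, prev_time = offset, time
    return bytes(out)


def decode_index(data):
    """Inverse of encode_index: rebuild absolute offsets and timestamps."""
    keypoints = []
    offset = time = pos = 0
    while pos < len(data):
        d_offset, pos = varint_decode(data, pos)
        d_time, pos = varint_decode(data, pos)
        offset += d_offset
        time += d_time
        keypoints.append((offset, time))
    return keypoints
```

The deltas between neighbouring keypoints are small compared with the
absolute offsets and timestamps, so most of them fit in one to three bytes
instead of a fixed-width field.
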
> I ran some tests measuring the compression I got on 61 media files.
> These were a range of short, medium, and long theora and/or vorbis
> files, the longest being 100 hours of Bach encoded in vorbis. The
> delta-then-variable-byte-encoded indexes average 44% of the size of their
> uncompressed indexes, with a standard deviation of 1.7(%). The least
> compression I got in that sample was 48.6% of uncompressed size, and the
> best compression was 35% of uncompressed size.
>
> From this data I conclude that if we delta-then-variable-byte-encode
> the keyframe indexes, we can pretty safely reduce the amount of space we
> pre-allocate for the indexes during encode by 50%.
>
> Using only delta-variable compression, the entire Skeleton track with
> compressed keyframe indexes for a 3-hour theora/vorbis video comes to
> only 68690 bytes, which is pretty reasonable. Most videos out there will
> be smaller than that.
>
> I also tested using zlib 1.2.3 to deflate
> delta-then-variable-byte-encoded indexes. The results are thrown off by
> small indexes. For the 61 media files in my previous sample, the average
> zlib-deflated size of delta-then-variable-byte-encoded indexes was
> 100.29% of the delta-then-variable-byte-encoded indexes' size, with
> stdev of 17.44(%); i.e. it was bigger on average. If we filter that to
> not deflate the small indexes we mitigate the cost of the zlib deflate
> overhead:
>
> Compressing only delta-variable encoded indexes > 1,000 bytes, average
> 93.82%, stdev 3.98%
> Compressing only delta-variable encoded indexes > 2,000 bytes, average
> 93.33%, stdev 3.89%
> Compressing only delta-variable encoded indexes > 5,000 bytes, average
> 91.38%, stdev 2.55%
> Compressing only delta-variable encoded indexes > 10,000 bytes, average
> 90.55%, stdev 2.57%
> Compressing only delta-variable encoded indexes > 80,000 bytes, average
> 89.07%, stdev 0.67%
>
> The delta-variable compressed indexes which were > 80,000 bytes were
> vorbis Bach files of duration longer than 10 hours.
>
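
The size filter quoted above can be sketched in Python with the standard
zlib module. The helper name maybe_deflate and the 1,000-byte default are
hypothetical, and falling back to the raw bytes when deflate does not help
is an added safeguard, not part of the quoted tests.

```python
import zlib


def maybe_deflate(encoded_index, min_size=1000):
    """Deflate an already delta/variable-byte encoded index if it is large.

    Returns (payload, deflated_flag). Indexes of min_size bytes or fewer
    are returned unchanged, since zlib's fixed overhead can outweigh any
    saving on short inputs.
    """
    if len(encoded_index) <= min_size:
        return encoded_index, False
    deflated = zlib.compress(encoded_index, 9)
    if len(deflated) >= len(encoded_index):
        # Assumed safeguard: keep the raw bytes if deflate made it bigger.
        return encoded_index, False
    return deflated, True
```

Short inputs tend to grow because every zlib stream carries a 2-byte header
and a 4-byte checksum, which fits the observation that small indexes skew
the average above 100%.
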
> I previously tried compressing the indexes only with zlib (i.e. not
> delta-then-variable-byte-encoding them before zlib deflating them), and
> that got us about 50% compression.
>
> Given that zlib-deflating the delta-variable compressed indexes doesn't
> give much benefit for what I assume will be the common case (indexes
> less than 80,000 bytes in size), and using zlib deflate makes it harder
> to predict the amount of space to reserve for the index, and adds a new
> dependency to software which wants to read the index (zlib), I'm not
> going to use zlib compression for indexes at this time. Maybe we can add
> it into Skeleton 4...
>
> Besides, Skeleton 3.2 now includes the offset of the first
> non-header page, so if you have a large index which you want to postpone
> loading, you can always skip ahead to after the index.
>
> I'll add the delta-then-variable-byte encoding to the skeleton spec,
> and increase the version number to 3.3. I have patches for OggIndex, and
> will produce some for ffmpeg2theora soon.

I think this is awesome! Makes total sense to me. Stick to the KISS
principle and avoid adding library dependencies unless they really
make sense.

Great stuff indeed!!

Cheers,
Silvia.

