[theora] Indexing Ogg files for faster seeking

Sun Jan 24 16:58:23 PST 2010

On 22/01/2010 1:50 p.m., Bernard Jungen wrote:
> Have you tried delta+deflate, i.e. without variable encoding? And also
> with each kind of data grouped together?
>    
I tried grouping the data in different orders, and it only seems to make 
a difference once you get up to huge (>5 hour) media.

On 24/01/2010 4:16 p.m., Bernard Jungen wrote:
> On Sat, Jan 23, 2010 at 11:23:54AM +1300, Chris Pearce wrote:
>    
>> I am happy to accept suggestions for ways to compress the index better.
>>      
> Easy. Reduce the checksum size, possibly to zero :-)
>    
Ok, I've measured it, in my simple variable byte encoded index packets, 
the checksums (which can't be variable-byte encoded) usually take up 
about 55% of the encoded index packet. That's significant.

> Seriously, I'm starting to wonder whether the full 32-bit checksum adds much
> value to keypoints.
>
> First, the need to rebuild/reload the whole index after modifications plus
> the need for data stability while downloading speak against frequent data
> modifications.
>    
Any time the file changes the index offsets are invalidated. Once the 
index is invalidated, there's no safe way to use it. The index must be 
rebuilt if the file changes.

> Second, the test to see if there's a valid page at the seeked-to offset is
> already a good rough verification IMO.
The average page length is about 4300 bytes, so there's about a 1 in 
4300 chance that any randomly selected byte is a page start. That chance 
is large enough that I wouldn't want to solely rely on this to detect 
invalidation.

>   What could then happen if we let the
> decoder restart at a wrong page?
The basic approach to seeking is to seek to the previous key frame and 
then decode forwards to the seek target frame. So if we land way a long 
way before the seek target frame, we could spend a lot of time decoding 
forwards to the seek target. If we land after the seek target frame (or 
after the keyframe we were looking for), we have to fall back to a 
bisection search.

>   Obviously, if the decoder is minimally
> robust, bad decoded data will result, the decoder will stop, or skip to the
> next keyframe, and in the first two cases the user will probably seek again
> until the test for page start eventually fails.
>
> Third, can we imagine a useful program that could modify data on-the-fly
> without moving indexed pages nor changing the presentation times? Yes, for
> instance a program that could overwrite specific segments with e.g. *cough*
> ads. But the checksum test would reject them when seeking, although the
> other related keypoint info are still valid.
>    
I think overwriting frames seemlessly would be hard. Anything that 
changes the file size would invalidate the index. However, given that, 
if we assume that altering a file will likely change its length, then we 
can use the length-in-bytes field (which we added to the fishead packet) 
in order to detect if the file has likely changed. This is a bit 
annoying if we have a chain and we're to doing this over a network. We 
can also check that the keypoint's page offset listed in the index maps 
exactly to a page boundary as Bernard suggests above. We can also check 
that the first complete keyframe that you get when you start decoding 
after a seek matches the timestamp listed in the index. That complicates 
seeking logic just enough to be annoying, but isn't a deal breaker.

If we do that, we can drop the checksums from the index. On my set of 
test media, this reduces the index overhead to about 6.2KB per track per 
hour, about 25% of the space required by the original uncompressed index 
format (which includes the checksums).
> When looking at compressed keypoints, checksums take a huge chunk of them
> and it would be very useful to give them more thought IMO.
>
> What do you all think?
>    
I think we should remove the checksums, and simply delta and 
variable-byte encode the offsets and timestamps. We can consider an 
index invalid if:

   1. The segment/link length doesn't match the length stored in the
      fishead packet, or
   2. After a seek to a keypoint's offset, we don't land on a page
      boundary, or
   3. The first keyframe we decode after seeking to a keypoint doesn't
      have the same presentation time as stored in the index.

Does that sound reasonable?

Chris P.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.xiph.org/pipermail/theora/attachments/20100125/c8fecd27/attachment.htm