[theora] Indexing Ogg files for faster seeking

Fri Jan 22 14:23:54 PST 2010

On 23/01/2010 7:07 a.m., Gregory Maxwell wrote:
> On Fri, Jan 22, 2010 at 12:47 PM, Jason Self<jason.self at gmail.com>  wrote:
>    
>> OggIndex does this, so what's the harm? If it's okay for the OggIndex
>> to do it, why not ffmpeg2theora?
>>      
> Because it doubles the storage requirement temporarily and/or creates
> a potential for corruption.
>    

To a certain degree, OggIndex does this as a safety feature, so that I 
don't hose my input files during development. Because OggIndex is 
indexing existing Ogg files, the file has to be rewritten anyway to make 
room for the index. One day I'll get around to making OggIndex work on 
the file in place, but it would still need to rewrite it to accommodate 
the index.

> The correct thing to do is to reserve some space at the
> start, then go and fill it in at the end. [...]
> I thought ffmpeg2theora was doing this already.
>    
Yes, ffmpeg2theora does this.
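
For anyone following along, here is a rough Python sketch of the
reserve-then-fill idea. It is an illustration only, not ffmpeg2theora's
actual code: the function name, the 64-byte budget and the toy index
layout (a 32-bit count followed by 64-bit offsets) are all made up for
the example and are not the Skeleton index format.

```python
import io
import struct

INDEX_BUDGET = 64  # bytes reserved for the index; hypothetical size


def write_with_reserved_index(out, packets, budget=INDEX_BUDGET):
    """Write packets to a seekable stream, then back-fill a toy index.

    packets is an iterable of (is_keyframe, payload_bytes) pairs.
    Returns the list of keyframe offsets that fitted in the index.
    """
    index_start = out.tell()
    out.write(b"\x00" * budget)          # placeholder for the index
    offsets = []
    for is_keyframe, payload in packets:
        if is_keyframe:
            offsets.append(out.tell())   # remember where each keyframe starts
        out.write(payload)
    end = out.tell()

    # Toy layout (an assumption): 4-byte count, then 8 bytes per offset.
    # Naive policy: entries that do not fit are simply cut off the end.
    max_entries = max(0, (budget - 4) // 8)
    kept = offsets[:max_entries]
    blob = struct.pack("<I", len(kept))
    blob += b"".join(struct.pack("<Q", off) for off in kept)

    out.seek(index_start)
    out.write(blob)                      # unused reserved bytes stay zero
    out.seek(end)                        # leave the stream positioned at the end
    return kept
```

The data is only written once; the only extra cost is the padding that
goes unused when the index turns out smaller than the reservation.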

> Reliably, no. E.g. what happens when the input is a non-seekable
> stream (as is the case when encoding live from a camera)?
>    
ffmpeg2theora detects whether a stream is seekable, and refuses to index 
non-seekable streams. It will just write a Skeleton 3.0 stream in that case.
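
The decision boils down to something like the following Python sketch.
This is not the real ffmpeg2theora logic (which is C); the function name
and the "indexed"/"plain" labels are hypothetical, and it only relies on
the standard `seekable()` method of Python file objects.

```python
import io


def choose_skeleton_mode(stream):
    """Return "indexed" if we can seek back to fill in an index, else "plain".

    "plain" stands for writing a Skeleton track without an index.
    """
    try:
        seekable = stream.seekable()
    except AttributeError:
        # Not a file-like object we understand: assume we cannot seek.
        seekable = False
    return "indexed" if seekable else "plain"
```

A pipe or a live capture would end up in the "plain" branch, a regular
file on disk in the "indexed" one.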

> Even knowing the duration you can only roughly size the index because
> you don't know exactly how many seekpoints there will be.
>
> But that's okay, just reserve some space and make use of what you got.
>    
I still need to implement graceful dropping of keypoints...
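
One way this could work, purely as a sketch and not something that is
implemented: when there are more keypoints than the reserved space can
hold, keep an evenly spread subset so seeking degrades uniformly over
the file instead of losing the tail. The helper name `thin_keypoints`
is made up for this example.

```python
def thin_keypoints(keypoints, max_entries):
    """Pick at most max_entries keypoints, spread evenly over the list.

    Keeps the first and last keypoint whenever max_entries >= 2.
    """
    n = len(keypoints)
    if max_entries <= 0:
        return []
    if n <= max_entries:
        return list(keypoints)           # everything fits, drop nothing
    if max_entries == 1:
        return [keypoints[0]]
    # Step is > 1 here, so the rounded positions never repeat.
    step = (n - 1) / (max_entries - 1)
    return [keypoints[round(i * step)] for i in range(max_entries)]
```

For example, `thin_keypoints(list(range(10)), 4)` keeps the entries at
positions 0, 3, 6 and 9.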

> If you do have a more reliable source of information, you can make use
> of it, but there are lots of situations where it won't be available.
>
> I do wonder if the theora two-pass API should make mode decision
> information available, because in two pass that would give you a more
> reliable indication of the number/location of seekpoints.
>    
This is also on my list. ;)

I am happy to accept suggestions for ways to compress the index better.
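
To give an idea of the kind of thing I mean, one generic direction is to
store the differences between successive offsets as variable-length
integers rather than fixed-width values. The Python sketch below is only
an illustration of that idea and not what OggIndex currently writes; the
function names are made up, and it assumes the offsets are non-negative
and sorted in increasing order.

```python
def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, low-order bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)      # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def compress_offsets(offsets):
    """Delta-encode sorted offsets, then varint-encode each delta."""
    out = bytearray()
    prev = 0
    for off in offsets:
        out += encode_varint(off - prev)
        prev = off
    return bytes(out)


def decompress_offsets(data):
    """Inverse of compress_offsets."""
    offsets = []
    value = shift = prev = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7                   # continuation byte
        else:
            prev += value                # delta complete: rebuild the offset
            offsets.append(prev)
            value = shift = 0
    return offsets
```

Since keypoints tend to be fairly close together, most deltas fit in a
few bytes instead of a full 64-bit field.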

All the best,
Chris P.