[theora] Indexing Ogg files for faster seeking

Fri Jan 22 09:47:18 PST 2010

On Thu, Jan 21, 2010 at 3:46 PM, Chris Pearce <chris at pearce.org.nz> wrote:
> I have been experimenting with compressing the keyframe indexes.

I think you are awesome and this is GREAT!

I've been playing around with the daily builds of ffmpeg2theora and it
seems to work well overall. There's still the issue that it can't
index files where the input container doesn't report a duration.

I've included a previous discussion with Chris below. I've been meaning
to ask this question but never got around to it. This seems like as good
a time as any.

Chris says that the guys on #theora were vehement that the output must
not be rewritten in order to index it. My question is: what's the harm?
It would only need to happen in cases where the duration is not known.
OggIndex already does this, so if it's okay for OggIndex, why not
ffmpeg2theora?

On Mon, Nov 30, 2009 at 11:54 AM, Chris Pearce <chris at pearce.org.nz> wrote:
> Hi Jason,
>
> On 11/30/2009 5:53 PM, Jason Self wrote:
>>
>> If, at the end of the encode, it really does go back to the beginning to
>> write the index, why does it need to know the size/duration/whatever of the
>> input file beforehand?
>
> Because the indexes need to be inserted near the start of the output file.
> In the one pass case, if we don't reserve space for the indexes at the start
> of the file, we need to instead move all the encoded video data in the
> output file after the indexes' insertion point up by the size of the
> indexes. This would require moving the entire encoded video data on disk.
> The guys on #theora were vehement that the entire output video must not be
> rewritten in order to index it. So instead, we guess how many keyframes
> there will be based on the reported duration, and from that figure out how
> much space to reserve for the indexes at their insertion point in the first
> write pass. That way we don't need to move the output video data in order to
> insert the index. But when this guess is wrong, you see the error about
> there not being enough space.
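
To make the guess described above concrete, here is a minimal Python
sketch of sizing the reserved region from a reported duration. It is only
an illustration under stated assumptions, not ffmpeg2theora's actual
code: the function names, the fixed bytes-per-entry figure, and the
safety factor are all hypothetical. It also assumes the keyframe interval
is a maximum spacing, so the estimate can undershoot when the encoder
emits extra keyframes or the reported duration is too short.

```python
import math


def estimate_index_bytes(duration_s, fps, keyframe_interval,
                         bytes_per_entry, header_bytes=0, safety=1.0):
    """Guess how many bytes to reserve for a keyframe index.

    All parameters are assumptions supplied by the caller; nothing here
    is read from a real Ogg stream.
    """
    frames = math.ceil(duration_s * fps)
    # One keyframe per interval is the minimum the encoder will produce.
    keyframes = math.ceil(frames / keyframe_interval)
    # Pad the guess, since extra keyframes may be emitted during encoding.
    padded = math.ceil(keyframes * safety)
    return header_bytes + padded * bytes_per_entry


def index_fits(reserved_bytes, actual_keyframes, bytes_per_entry,
               header_bytes=0):
    """Return True if the real index fits in the space reserved up front."""
    needed = header_bytes + actual_keyframes * bytes_per_entry
    return needed <= reserved_bytes


if __name__ == "__main__":
    reserved = estimate_index_bytes(60, 25, 50, 10)
    print("reserved bytes:", reserved)
    # One keyframe more than guessed is enough to overflow the reservation.
    print("fits with 31 keyframes:", index_fits(reserved, 31, 10))
```

When index_fits() comes back False, that corresponds to the "not enough
space" error: the only remaining options are to skip the index or to
rewrite the file.
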
>
> We don't want the index at the end of the file (which is how WMV/ASF's
> indexing works), as that makes it less useful in the web video case.
>
>> It seems that the output file could be used for that, especially if the
>> index isn't written until the end of the encode.
>
> I assume you're talking about the output file in the one pass case? The
> requirement is to not rewrite the output file in order to fit the index in
> at the start of the file. I'm not familiar with how two pass works, but if
> the encode runs twice, you could get the duration from the first encode, and
> use that to determine the index size. So this should be possible in the two
> pass case, but not the default one pass case.
>
>> Gregory Maxwell presented an idea on the Theora mailing list.
>>
>> [1] http://lists.xiph.org/pipermail/theora/2009-November/003025.html
>>
>
> I asked Greg on the list exactly what information you can use to reliably
> guesstimate the duration, and he never responded, which I took to mean
> "d'oh, there is no way to reliably guesstimate the duration". ;)
>
> I don't have much experience with writing encoders, so if you can think of a
> way to guess the duration/index size in the one pass case, let me know. I
> think in the two pass case we could use the first pass' info.
>
>> ffmpeg2theora could then go back and insert whatever size index is
>> needed. The output file is essentially "complete" at that point and should
>> have all the data that is needed to determine a duration or whatever because
>> it seems that ffmpeg2theora always inserts the proper duration into the
>> file, and it seems that doing it this way would also address the issue where
>> the input container doesn't provide duration or bitrate. (It seems that you
>> can't really rely upon the input container.)
>>
>
> This is incidentally how my OggIndex program works: it reads the entire
> input file, determines the index, and then copies the entire media file
> while inserting the index at the start. So if you're indexing a 4GB file, it
> must create a 4GB copy which includes the indexes. So if ffmpeg2theora can't
> index your file during encode, you can just run OggIndex over it after
> encode to create an indexed version...
>
> I admit this is a problem, and I don't know a good solution for the one pass
> case. If you know one, I'm all ears.
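
For anyone wondering what "rewriting the output" costs in practice, the
copy-and-insert approach Chris describes for OggIndex can be sketched
roughly as below. This is a generic byte-level illustration in Python,
not OggIndex's real code: copy_with_insert() is a hypothetical helper, and
the index is treated as opaque bytes. A real Ogg tool would also have to
emit valid Ogg pages, which this sketch does not attempt.

```python
import shutil


def copy_with_insert(src_path, dst_path, insert_offset, payload,
                     chunk_size=1 << 20):
    """Copy src to dst, splicing `payload` in at `insert_offset`.

    Returns the number of bytes written, which is always the full size
    of the source plus the payload: every byte after the insertion point
    has to be copied again.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = insert_offset
        # Copy everything before the insertion point unchanged.
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError("insert_offset is past the end of the file")
            dst.write(chunk)
            remaining -= len(chunk)
        # Write the index, then stream the rest of the media data after it.
        dst.write(payload)
        shutil.copyfileobj(src, dst, chunk_size)
        return dst.tell()
```

The return value makes the cost visible: for a 4GB input, roughly 4GB gets
written again no matter how small the index is. That is the trade-off
against reserving space up front and overwriting it in place.
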