[theora] Liboggplay seeking artifacts

Fri Jul 24 16:56:42 PDT 2009

Yeah, that's the recommended approach for implementing seek in Ogg 
files. There's a few things you need to be aware of.

You bisect based on the granulepos of the pages in the file. A page's 
granulepos corresponds to the timestamp of the last packet that *ends* 
on that page. The frame with that timestamp doesn't necessarily *start* 
on that page. You need to start decoding on the page on which the 
keyframe *starts*. So when you stop bisecting, you need to scan 
backwards through pages until you're sure you're before the start of the 
keyframe (while previous page *on that stream* has -1 gp).

You can have other streams' pages interleaved between a keyframes' 
pages, with a page from the other stream with a gp > target_gp before 
the page which terminates the keyframe. You need to not stop in this gap 
in this case. A naive bisection would stop here (liboggz, and thus 
liboggplay, is still currently vulnerable to this).

You need to do this backwards-scan for every stream to ensure you can 
actually decode all data required to render the keyframe. But you don't 
need to do it for streams which you're not playing, and you can't for 
streams which have finished at that point in the file - but you can't 
tell which streams have finished without reading the entire stream, oh dear!

All those edge cases make it a headache to implement - though a lot 
easier now that I've told you some of the gotchas. ;)

I think my main beef against the canonical approach is that it requires 
reading backwards, which is bad for apps that read over networks, like 
Firefox, because they require new networks requests. Also doing two 
bisections is twice as slow as one. On local files you don't notice, but 
over the network you do, each bisection is a new HTTP request. That's 
why we (Mozilla) are going to change to my proposed method.

The bisection method *is* faster for things stored on local disk, or 
with large keyframe intervals, but it's harder to implement robustly 
than you think.

All the best,
Chris Pearce.

On 24/07/2009 9:13 p.m., ogg.k.ogg.k at googlemail.com wrote:
>> 1. Determine the maximum time between keyframes. This is the maximum of
>> all streams':
>> 	(max number of frames between keyframes) * (frame duration)
>> Note:
>> (number of frames between keyframes) = ((1<<  granuleshift) - 1)
>> You only really need to calculate this for streams with have a
>> granuleshift (e.g. not vorbis streams).
>>
>> 2. Seek to t - (maximum time between keyframes) (using oggplay_seek()).
>>
>> 3. Decode forward (using oggplay_step_decode()) until the frame you get
>> back has time t.
>
> The canonical method is very similar:
> 1. seek to the frame just before time you want.
> 2. From the granpos of this frame, deduct the granpos of the preceding keyframe.
> 3. Seek to there.
> 4. Decode until you reach your original target.
>
> This would work for all streams using the granshift method, whether or not
> the max time between keyframes exists or is practical.
>