[theora-dev] question about ogg mapping

Thu Jun 12 09:45:04 PDT 2003

On Thursday, June 12, 2003, at 04:34 pm, Dan Miller wrote:

> right, my issue is that we're using granulepos somewhat strangely right 
> now with certain bits indicating keyframes, etc.  That scheme seems to 
> break under the scenario of >1 frame/page.

Broken in the sense that it's more complicated than you thought? This 
was always the plan.

If your compressed data is fixed bitrate, seeking is easy. You just 
multiply the desired time by the rate, seek to that offset in the file, 
and start decoding again. Unfortunately, fixed bitrate codecs are 
inefficient, which is why codecs like vorbis and theora are variable 
bitrate.
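
As a rough illustration of the fixed bitrate case, here is a minimal
Python sketch. The function name and the header_bytes parameter are
made up for the example and don't come from any real codec API.

```python
def cbr_byte_offset(seconds, bytes_per_second, header_bytes=0):
    """Byte offset of a playback time in a constant-bitrate stream.

    header_bytes is a hypothetical allowance for any fixed-size header
    that precedes the compressed data.
    """
    return header_bytes + int(seconds * bytes_per_second)


# 90 seconds into a 16000 byte/s (128 kbit/s) stream
offset = cbr_byte_offset(90, 16000)
```

With variable bitrate there is no such formula, which is what the rest
of this message is about.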

With variable bitrate data you don't know a priori where in the stream 
a given playback time will be, so you basically have to jump a bit, 
start decoding, see where you are, and then jump some more. In the 
naive sense, that's fairly expensive, so people do things to try and 
speed it up.

I believe QuickTime adds a seek table after the file is encoded, giving 
offsets for various times; you pick the closest one, and then hunt 
around if you need more accuracy. Another approach is to add timestamps 
as the stream is generated. Ogg uses this method because it simplifies 
working with live streams.

So the stream is chopped up into pages, each of which has a header with 
a 'granulepos' field that acts as a timestamp. To seek you guess, jump 
to a likely point in the stream, search for the beginning of an Ogg 
page, read out the granulepos, and then you know where you are. 
Applying this information to the next guess lets you do a binary 
search, which is about as efficient as it gets. For rough seeking, just 
take a granulepos near the requested time. For more accurate seeking, 
start decoding the page after the one with the greatest granulepos less 
than the seek time, and only start playback when you reach the 
requested time. This scheme works for sample-accurate seeking in 
vorbis; see vorbisfile.c for an example implementation.
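
To make the bisection concrete, here is a small Python sketch. It is
not how vorbisfile.c does it; the PAGES table, STREAM_LENGTH and the
next_page_at() helper are all invented stand-ins for "scan forward
from this byte offset until you find a page header and read its
granulepos".

```python
import bisect

# Hypothetical stream: (byte_offset, granulepos) of each page header.
PAGES = [(0, 0), (4100, 1024), (9000, 4096),
         (12500, 5120), (20000, 9216), (26000, 12288)]
STREAM_LENGTH = 30000


def next_page_at(offset):
    """Stand-in for scanning forward from offset to the next page header.

    Returns (byte_offset, granulepos), or None past the last page.
    """
    starts = [start for start, _ in PAGES]
    i = bisect.bisect_left(starts, offset)
    return PAGES[i] if i < len(PAGES) else None


def seek_page(target):
    """Bisect on byte offsets for the first page with granulepos >= target.

    That is the page after the one with the greatest granulepos less
    than the target, i.e. where accurate seeking starts decoding.
    """
    lo, hi = 0, STREAM_LENGTH
    while lo < hi:
        mid = (lo + hi) // 2
        page = next_page_at(mid)
        if page is None or page[1] >= target:
            hi = mid              # at or past the target: look earlier
        else:
            lo = page[0] + 1      # before the target: look after this page
    return next_page_at(lo)


page = seek_page(5000)            # the page at byte offset 12500
```

A real implementation would also interpolate its guesses from the
granulepos values it has already seen rather than always halving the
interval, but the structure is the same.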

Everything so far applies to vorbis as well as theora. There is an 
added complication with video, however. Once the decoder is initialized 
with the header packets, the vorbis decoder can play back starting at 
any packet in the stream. In contrast, theora has a concept of 
keyframes, which stand on their own and are distinct from the majority 
of frames, which only encode the difference to the previous keyframe. 
So while our seeking scheme would get you to the right place, it 
wouldn't decode correctly unless the frame happened to be a keyframe. 
You either have to skip ahead to the next keyframe, skip 
*backwards* to the previous, or generate incorrect output for a while.
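
The first two options can be pictured with a toy Python example. The
list of keyframe flags is something a player would not normally have
up front; it is only here to show the choice, and the function name
is made up.

```python
def keyframe_choices(is_keyframe, target):
    """Return (previous_keyframe, next_keyframe) frame indices.

    previous_keyframe is the last keyframe at or before target,
    next_keyframe the first one at or after it; either may be None.
    """
    prev_kf = next((i for i in range(target, -1, -1) if is_keyframe[i]), None)
    next_kf = next((i for i in range(target, len(is_keyframe))
                    if is_keyframe[i]), None)
    return prev_kf, next_kf


# Frames 0 and 4 are keyframes; we want to start playback at frame 2.
frames = [True, False, False, False, True, False, False]
prev_kf, next_kf = keyframe_choices(frames, 2)
```

Starting at prev_kf and discarding output until frame 2 gives correct
pictures at the requested time; starting at next_kf is cheaper but
lands late.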

To get around this, Monty added the 'granulepos hack' where instead of 
being the literal frame number, the granulepos is divided into two 
parts recording the number of keyframes and the offset from the last 
keyframe. So you just need to look for the point where that offset goes 
through zero and start decoding there. This also preserves the Ogg 
feature of not having to peek inside the data packets to do perfect 
seeking; everything can be done at the page level.
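
Here is a Python sketch of the split, just to show the idea. The
6-bit width and all the names are assumptions for illustration; this
message doesn't say how many bits go to each part.

```python
KEYFRAME_SHIFT = 6  # hypothetical: low bits reserved for the offset


def pack_granulepos(keyframe_part, offset):
    """Combine the keyframe part and the offset since the last keyframe."""
    if not 0 <= offset < (1 << KEYFRAME_SHIFT):
        raise ValueError("offset does not fit in the low bits")
    return (keyframe_part << KEYFRAME_SHIFT) | offset


def unpack_granulepos(granulepos):
    """Split a granulepos back into (keyframe_part, offset)."""
    mask = (1 << KEYFRAME_SHIFT) - 1
    return granulepos >> KEYFRAME_SHIFT, granulepos & mask


def at_keyframe(granulepos):
    """True when the offset from the last keyframe is zero."""
    return unpack_granulepos(granulepos)[1] == 0


gp = pack_granulepos(3, 5)   # keyframe part 3, five frames past it
```

Because the keyframe part sits in the high bits, the packed values
still increase through the stream, so the page-level binary search
keeps working unchanged.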

I guess that was kind of long, but that's my understanding of how 
seeking works. And I don't see anything wrong with it, or a better way 
that fits with the design goals we have.

  -r
