[theora-dev] question about ogg mapping
Ralph Giles
giles at xiph.org
Thu Jun 12 09:45:04 PDT 2003
On Thursday, June 12, 2003, at 04:34 pm, Dan Miller wrote:
> right, my issue is that we're using granulepos somewhat strangely right
> now with certain bits indicating keyframes, etc. That scheme seems to
> break under the scenario of >1 frame/page.
Broken in the sense that it's more complicated than you thought? This
was always the plan.
If your compressed data is fixed bitrate, seeking is easy. You just
multiply the desired time by the rate, seek to that offset in the file,
and start decoding again. Unfortunately, fixed bitrate codecs are
inefficient, which is why codecs like vorbis and theora are variable
bitrate.
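
As a rough illustration of the fixed-bitrate case, the whole seek is one multiplication. This is only a sketch: the function name, the `header_bytes` parameter and the example rate are made up for illustration and do not come from any real codec.

```python
def cbr_seek_offset(seconds, bytes_per_second, header_bytes=0):
    """Byte offset of a playback time in a constant-bitrate stream.

    header_bytes is a hypothetical allowance for any fixed-size header
    that precedes the compressed data.
    """
    return header_bytes + int(seconds * bytes_per_second)


# Example: 128 kbit/s is 16000 bytes per second, so 90 seconds in
# is 1440000 bytes into the data.
offset = cbr_seek_offset(90, 16000)
```

Nothing like this works for variable bitrate data, because the bytes-per-second figure changes throughout the stream.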
With variable bitrate data you don't know a priori where in the stream
a given playback time will be, so you basically have to jump a bit,
start decoding, see where you are, and then jump some more. In the
naive sense, that's fairly expensive, so people do things to try and
speed it up.
I believe QuickTime adds a seek table after the file is encoded, giving
offsets for various times; you pick the closest one, and then hunt
around if you need more accuracy. Another approach is to add timestamps
as the stream is generated. Ogg uses this method because it simplifies
working with live streams.
So the stream is chopped up into pages, each of which has a header with
a 'granulepos' field that acts as a timestamp. To seek you guess, jump
to a likely point in the stream, search for the beginning of an Ogg
page, read out the granulepos, and then you know where you are.
Applying this information to the next guess lets you do a binary
search, which is about as efficient as it gets. For rough seeking, just
take a granulepos near the requested time. For more accurate seeking,
start decoding the page after the one with the greatest granulepos less
than the seek time, and only start playback when you reach the
requested time. This scheme works for sample-accurate seeking in
vorbis; see vorbisfile.c for an example implementation.
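
The guess-and-refine search described above can be sketched roughly as follows. This is not how vorbisfile.c does it; it is a simplified model in which the stream is a sorted list of (byte offset, granulepos) pairs, and the hypothetical helper `next_page` stands in for scanning forward from an arbitrary byte offset to the next page header.

```python
import bisect


def next_page(pages, offset):
    """Index of the first page starting at or after `offset`, or None.

    Stand-in for jumping to a byte offset and scanning forward until
    the start of a page is found.
    """
    starts = [start for start, _ in pages]
    i = bisect.bisect_left(starts, offset)
    return i if i < len(pages) else None


def seek(pages, stream_len, target):
    """Index of the page to start decoding from for `target`.

    That is the page after the last one whose granulepos is less than
    the target. Returns len(pages) if the target is past the end.
    """
    lo, hi = 0, stream_len
    best = -1  # last page known to have granulepos < target
    while lo < hi:
        mid = (lo + hi) // 2          # the guess
        i = next_page(pages, mid)     # jump and find a page
        if i is None or pages[i][1] >= target:
            hi = mid                  # too far, look earlier
        else:
            best = i                  # not far enough, look later
            lo = pages[i][0] + 1
    return best + 1


# Four pages in a 15000-byte stream; granulepos values are invented.
pages = [(0, 100), (4000, 250), (9000, 400), (12000, 800)]
start = seek(pages, 15000, 300)  # page index 2, the one ending at 400
```

From the returned page you decode forward and discard output until the requested time is reached, which is what makes the seek exact rather than approximate.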
Everything so far applies to vorbis as well as theora. There is an
added complication with video, however. Once the decoder is initialized
with the header packets, the vorbis decoder can play back starting at
any packet in the stream. In contrast, theora has a concept of
keyframes, which stand on their own and are distinct from the majority
of frames, which only encode the difference from earlier frames.
Thus while our seeking scheme would get you to the right place, it
wouldn't decode correctly unless the frame happened to be a keyframe.
So you either have to skip ahead to the next keyframe, skip
*backwards* to the previous, or generate incorrect output for a while.
To get around this, monty added the 'granulepos hack' where instead of
being the literal frame number, the granulepos is divided into two
parts recording the number of keyframes and the offset from the last
keyframe. So you just need to look for the point where that offset goes
through zero and start decoding there. This also preserves the Ogg
feature of not having to peek inside the data packets to do perfect
seeking; everything can be done at the page level.
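
To make the two-part granulepos concrete, here is a minimal sketch of packing and unpacking such a value. The 6-bit width of the offset field and all the names are assumptions chosen for illustration; the real split point and the exact meaning of the upper part are defined by the stream format, not by this code.

```python
KEYFRAME_SHIFT = 6  # assumed width of the offset field, illustrative only


def pack_granulepos(keyframe_part, offset, shift=KEYFRAME_SHIFT):
    """Combine the keyframe part (high bits) and the offset since the
    last keyframe (low bits) into a single granulepos value."""
    if not 0 <= offset < (1 << shift):
        raise ValueError("offset does not fit in the low field")
    return (keyframe_part << shift) | offset


def unpack_granulepos(granulepos, shift=KEYFRAME_SHIFT):
    """Split a granulepos back into (keyframe_part, offset)."""
    return granulepos >> shift, granulepos & ((1 << shift) - 1)


def ends_on_keyframe(granulepos, shift=KEYFRAME_SHIFT):
    """True when the offset field is zero, i.e. a safe place to
    start decoding."""
    return unpack_granulepos(granulepos, shift)[1] == 0


gp = pack_granulepos(3, 5)
keyframe_part, offset = unpack_granulepos(gp)  # (3, 5)
```

Because the offset sits in the low bits, the packed values still increase monotonically through the stream, so the same page-level search keeps working.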
I guess that was kind of long, but that's my understanding of how
seeking works. And I don't see anything wrong with it, or a better way
that fits with the design goals we have.
-r