[theora-dev] question about ogg mapping
Dan Miller
dan at on2.com
Thu Jun 12 09:58:51 PDT 2003
The problem is this: right now, with each frame in its own page, granulepos can be used to guess whether the frame is a keyframe. If we put multiple frame packets in a page, there is presently no mechanism to find the first keyframe in that page to start playing from.
We could impose the following limitation, which would work in most scenarios: if a page contains multiple frames, and there are any keyframes within that page, then the first frame in the page must be a keyframe (and granulepos will reflect this, calling the whole page a 'keyframe').
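A minimal sketch of that restriction as a check a muxer could run before emitting a page. The function name and the list-of-booleans representation are hypothetical; a real muxer would get the keyframe flag for each packet from the codec.

```python
def page_obeys_keyframe_rule(is_keyframe_flags):
    """Return True if a page satisfies the proposed restriction.

    is_keyframe_flags holds one boolean per frame packet in the page,
    in stream order (hypothetical representation, for illustration only).
    """
    if len(is_keyframe_flags) <= 1:
        # Pages with zero or one frame are unaffected by the rule.
        return True
    if any(is_keyframe_flags):
        # A multi-frame page containing a keyframe must start with one.
        return is_keyframe_flags[0]
    # No keyframes in the page at all: nothing to enforce.
    return True
```

Under this rule a muxer that is about to append a keyframe to a page that already holds non-keyframes would flush the page first and start a new one.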
What's bothering me, I guess, is that we're imposing ad-hoc rules about the Theora/Ogg mapping. These sorts of things should really be codec independent. I guess once we try to oggify another video codec, we'll see where these independent properties are and bring them out of anything Theora-specific. Perhaps there needs to be a 'video codec mapping' document that is independent of the Theora spec.
 ___ Dan Miller
(++,) Founder, On2 Technologies
> -----Original Message-----
> From: Ralph Giles [mailto:giles at xiph.org]
> Sent: Thursday, June 12, 2003 11:45 AM
> To: theora-dev at xiph.org
> Subject: Re: [theora-dev] question about ogg mapping
>
>
> On Thursday, June 12, 2003, at 04:34 pm, Dan Miller wrote:
>
> > right, my issue is that we're using granulepos somewhat strangely
> > right now, with certain bits indicating keyframes, etc. That scheme
> > seems to break under the scenario of >1 frame/page.
>
> Broken in the sense that it's more complicated than you thought? This
> was always the plan.
>
> If your compressed data is fixed bitrate, seeking is easy. You just
> multiply the desired time by the rate, seek to that offset in the
> file, and start decoding again. Unfortunately, fixed bitrate codecs
> are inefficient, which is why codecs like vorbis and theora are
> variable bitrate.
>
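The fixed-bitrate case is a single multiplication. A minimal sketch, where the function name and the `header_bytes` parameter are assumptions made for illustration:

```python
def cbr_seek_offset(seconds, bytes_per_second, header_bytes=0):
    """Byte offset of a playback time in a constant-bitrate stream.

    header_bytes is a hypothetical allowance for any fixed-size header
    that precedes the compressed data.
    """
    return header_bytes + int(seconds * bytes_per_second)
```

For example, `cbr_seek_offset(10, 16000)` gives byte 160000 for the 10-second mark of a 16000 bytes/second stream.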
> With variable bitrate data you don't know a priori where in the
> stream a given playback time will be, so you basically have to jump a
> bit, start decoding, see where you are, and then jump some more. In
> the naive sense, that's fairly expensive, so people do things to try
> and speed it up.
>
> I believe QuickTime adds a seek table after the file is encoded,
> giving offsets for various times; you pick the closest one, and then
> hunt around if you need more accuracy. Another approach is to add
> timestamps as the stream is generated. Ogg uses this method because
> it simplifies working with live streams.
>
> So the stream is chopped up into pages, each of which has a header
> with a 'granulepos' field that acts as a timestamp. To seek, you
> guess, jump to a likely point in the stream, search for the beginning
> of an Ogg page, read out the granulepos, and then you know where you
> are. Applying this information to the next guess lets you do a binary
> search, which is about as efficient as it gets. For rough seeking,
> just take a granulepos near the requested time. For more accurate
> seeking, start decoding the page after the one with the greatest
> granulepos less than the seek time, and only start playback when you
> reach the requested time. This scheme works for sample-accurate
> seeking in vorbis; see vorbisfile.c for an example implementation.
>
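The bisection described above can be sketched on a toy model. This is not the vorbisfile.c code: the stream is stood in for by a list of `(byte_offset, granulepos)` pairs, and the helper that "scans forward for the next page" is a hypothetical replacement for reading the file and looking for a page header. It assumes granulepos never decreases along the stream.

```python
import bisect

def find_page_before(pages, target, stream_len):
    """Index of the last page whose granulepos is below target, or -1.

    pages: list of (byte_offset, granulepos), sorted by offset; a toy
    stand-in for an Ogg file (assumption for illustration).
    """
    offsets = [off for off, _ in pages]

    def next_page_at_or_after(pos):
        # Stand-in for jumping to byte `pos` and scanning forward
        # until the start of a page is found.
        i = bisect.bisect_left(offsets, pos)
        return i if i < len(pages) else None

    lo, hi = 0, stream_len  # byte range still under consideration
    best = -1
    while lo < hi:
        mid = (lo + hi) // 2              # the "guess"
        i = next_page_at_or_after(mid)
        if i is None or pages[i][1] >= target:
            hi = mid                      # landed at or past the target
        else:
            best = i                      # landed before the target
            lo = pages[i][0] + 1
    return best
```

Accurate seeking would then begin decoding at page `best + 1` and discard output until the requested time is reached.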
> Everything so far applies to vorbis as well as theora. There is an
> added complication with video, however. Once the decoder is
> initialized with the header packets, the vorbis decoder can play back
> starting at any packet in the stream. In contrast, theora has a
> concept of keyframes, which stand on their own and are distinct from
> the majority of frames, which only encode differences from earlier
> frames. Thus while our seeking scheme would get you to the right
> place, it wouldn't decode correctly unless the frame happened to be a
> keyframe. So you either have to skip ahead to the next keyframe, skip
> *backwards* to the previous one, or generate incorrect output for a
> while.
>
> To get around this, Monty added the 'granulepos hack' where, instead
> of being the literal frame number, the granulepos is divided into two
> parts recording the frame number of the last keyframe and the offset
> from that keyframe. So you just need to look for the point where that
> offset goes through zero and start decoding there. This also
> preserves the Ogg feature of not having to peek inside the data
> packets to do perfect seeking; everything can be done at the page
> level.
>
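A small sketch of the split granulepos, assuming (my reading of the Theora mapping) that the upper bits hold the frame index of the most recent keyframe and the lower `shift` bits hold the number of frames since it. The function names are hypothetical, and the shift of 6 in the example below is illustrative; the real value comes from the stream header.

```python
def pack_granulepos(keyframe_index, offset, shift):
    """Combine the last keyframe's index and the frame offset from it."""
    if not 0 <= offset < (1 << shift):
        raise ValueError("offset does not fit in the low bits")
    return (keyframe_index << shift) | offset

def unpack_granulepos(granulepos, shift):
    """Split a granulepos into (keyframe_index, offset)."""
    keyframe_index = granulepos >> shift
    offset = granulepos & ((1 << shift) - 1)
    return keyframe_index, offset

def frame_number(granulepos, shift):
    """Absolute frame number implied by the two fields."""
    keyframe_index, offset = unpack_granulepos(granulepos, shift)
    return keyframe_index + offset

def ends_on_keyframe(granulepos, shift):
    """True when the offset field is zero, i.e. the frame is a keyframe."""
    return unpack_granulepos(granulepos, shift)[1] == 0
```

With a shift of 6, a frame five frames after a keyframe at index 128 packs to `(128 << 6) | 5`, and a seeker can recover both the frame number (133) and where to restart decoding (128) from the page header alone.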
> I guess that was kind of long, but that's my understanding of how
> seeking works. And I don't see anything wrong with it, or a better
> way that fits with the design goals we have.
>
> -r
>
> --- >8 ----
> List archives: http://www.xiph.org/archives/
> Ogg project homepage: http://www.xiph.org/ogg/
> To unsubscribe from this list, send a message to
> 'theora-dev-request at xiph.org' containing only the word
> 'unsubscribe' in the body. No subject is needed.
> Unsubscribe messages sent to the list will be ignored/filtered.
>