[theora-dev] question about ogg mapping

Thu Jun 12 09:58:51 PDT 2003

The problem is this: right now, with each frame in its own page, granulepos can be used to guess whether the frame is a keyframe.  If we put multiple frame packets in a page, there is presently no mechanism to find the first keyframe in that page to start playing from.

We could impose the following limitation, which would work in most scenarios:  if a page contains multiple frames, and there are any keyframes within that page, then the first frame in the page must be a keyframe (and granulepos will reflect this, calling the whole page a 'keyframe').
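
As a rough sketch of that limitation, a muxer or validator could check each page like this. The list-of-booleans representation and the function name are made up for illustration; they are not part of any existing Ogg or Theora API.

```python
def page_obeys_keyframe_rule(is_keyframe_flags):
    """Check the proposed restriction for one page.

    is_keyframe_flags: one boolean per frame packet in the page, in stream
    order (a hypothetical representation, used only for this sketch).
    """
    if len(is_keyframe_flags) <= 1:
        return True  # pages holding a single frame are not affected
    if any(is_keyframe_flags):
        # Some frame in the page is a keyframe, so the first one must be.
        return is_keyframe_flags[0]
    return True  # no keyframes at all in this page
```

A page laid out as [keyframe, delta, delta] passes, while [delta, keyframe, delta] is rejected, since a player landing on that page could not tell from granulepos where the keyframe sits.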

What's bothering me, I guess, is that we're imposing ad-hoc rules about the Theora/Ogg mapping.  These sorts of things should really be codec independent.  I guess once we try to oggify another video codec, we'll see where these independent properties are and bring them out of anything Theora-specific.  Perhaps there needs to be a 'video codec mapping' document that is independent of the Theora spec.

 ___  Dan Miller
(++,) Founder, On2 Technologies

> -----Original Message-----
> From: Ralph Giles [mailto:giles at xiph.org]
> Sent: Thursday, June 12, 2003 11:45 AM
> To: theora-dev at xiph.org
> Subject: Re: [theora-dev] question about ogg mapping
> 
> 
> On Thursday, June 12, 2003, at 04:34 pm, Dan Miller wrote:
> 
> > right, my issue is that we're using granulepos somewhat strangely
> > right now with certain bits indicating keyframes, etc.  That scheme
> > seems to break under the scenario of >1 frame/page.
> 
> Broken in the sense that it's more complicated than you thought? This
> was always the plan.
> 
> If your compressed data is fixed bitrate, seeking is easy. You just
> multiply the desired time by the rate, seek to that offset in the
> file, and start decoding again. Unfortunately, fixed bitrate codecs
> are inefficient, which is why codecs like vorbis and theora are
> variable bitrate.
> 
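For illustration, the fixed-bitrate case is a single multiplication. This is a minimal sketch; the function name and the `data_start` parameter (an assumed header size before the audio/video data) are hypothetical.

```python
def cbr_seek_offset(seek_seconds, bytes_per_second, data_start=0):
    """Byte offset of a playback time in a constant-bitrate stream.

    data_start is an assumed size for any header preceding the coded data.
    """
    return data_start + int(seek_seconds * bytes_per_second)
```

At 16000 bytes per second, ten seconds in is simply byte 160000. A real format would also need to round the result to a frame or packet boundary, which this sketch leaves out.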
> With variable bitrate data you don't know a priori where in the
> stream a given playback time will be, so you basically have to jump
> a bit, start decoding, see where you are, and then jump some more. In
> the naive sense, that's fairly expensive, so people do things to try
> and speed it up.
> 
> I believe quicktime adds a seek table after the file is encoded,
> giving offsets for various times; you pick the closest one, and then
> hunt around if you need more accuracy. Another approach is to add
> timestamps as the stream is generated. Ogg uses this method because
> it simplifies working with live streams.
> 
> So the stream is chopped up into pages, each of which has a header
> with a 'granulepos' field that acts as a timestamp. To seek you
> guess, jump to a likely point in the stream, search for the beginning
> of an Ogg page, read out the granulepos, and then you know where you
> are. Applying this information to the next guess lets you do a binary
> search, which is about as efficient as it gets. For rough seeking,
> just take a granulepos near the requested time. For more accurate
> seeking, start decoding the page after the one with the greatest
> granulepos less than the seek time, and only start playback when you
> reach the requested time. This scheme works for sample-accurate
> seeking in vorbis; see vorbisfile.c for an example implementation.
> 
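The guess-and-bisect procedure described above can be sketched as follows. This is a toy model, not the vorbisfile.c code: the stream is represented as a sorted list of (byte_offset, granulepos) pairs, and `next_page_at_or_after` stands in for scanning forward to the next page header. All names here are assumptions made for the example.

```python
import bisect


def next_page_at_or_after(pages, offset):
    """Stand-in for scanning forward from a byte offset to a page header.

    pages: list of (byte_offset, granulepos) tuples sorted by byte_offset.
    Returns the index of the first page starting at or after offset,
    or None if there is no such page.
    """
    starts = [start for start, _ in pages]
    i = bisect.bisect_left(starts, offset)
    return i if i < len(pages) else None


def seek_page(pages, file_size, target_granulepos):
    """Bisect over byte offsets for the last page with granulepos < target.

    Returns that page's index, or None if every page is at or past the
    target. Accurate seeking would begin decoding on the following page.
    """
    lo, hi = 0, file_size  # byte range still under consideration
    best = None
    while lo < hi:
        guess = (lo + hi) // 2
        i = next_page_at_or_after(pages, guess)
        if i is None or pages[i][1] >= target_granulepos:
            hi = guess  # everything from this guess onward is too late
        else:
            best = i  # this page is before the target; remember it
            lo = pages[i][0] + 1  # continue searching after its start
    return best
```

Each probe only needs a page header, so no packet data is decoded during the search. A practical implementation would likely interpolate the guess from the granulepos values seen so far rather than always taking the midpoint.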
> Everything so far applies to vorbis as well as theora. There is an
> added complication with video, however. Once the decoder is
> initialized with the header packets, the vorbis decoder can play back
> starting at any packet in the stream. In contrast, theora has a
> concept of keyframes, which stand on their own and are distinct from
> the majority of frames, which only encode the difference to the
> previous keyframe. Thus while our seeking scheme would get you to the
> right place, it wouldn't decode correctly unless the frame happened
> to be a keyframe. Thus you either have to skip ahead to the next
> keyframe, skip *backwards* to the previous, or generate incorrect
> output for a while.
> 
> To get around this, monty added the 'granulepos hack' where instead
> of being the literal frame number, the granulepos is divided into two
> parts recording the number of keyframes and the offset from the last
> keyframe. So you just need to look for the point where that offset
> goes through zero and start decoding there. This also preserves the
> Ogg feature of not having to peek inside the data packets to do
> perfect seeking; everything can be done at the page level.
> 
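A two-part granulepos of this kind might be packed and unpacked as below. The thread does not give bit widths, so `KEYFRAME_SHIFT` is a made-up value, and the helper names are hypothetical; the upper part is whatever keyframe counter the mapping defines.

```python
KEYFRAME_SHIFT = 6  # assumed number of low bits for the offset (illustrative)


def pack_granulepos(keyframe_part, offset, shift=KEYFRAME_SHIFT):
    """Combine the keyframe part and the offset since the last keyframe."""
    if not 0 <= offset < (1 << shift):
        raise ValueError("offset does not fit in the low bits")
    return (keyframe_part << shift) | offset


def split_granulepos(granulepos, shift=KEYFRAME_SHIFT):
    """Return (keyframe_part, offset_since_last_keyframe)."""
    return granulepos >> shift, granulepos & ((1 << shift) - 1)


def is_keyframe_page(granulepos, shift=KEYFRAME_SHIFT):
    """True where the offset part is zero, i.e. a safe place to start decoding."""
    return split_granulepos(granulepos, shift)[1] == 0
```

With one frame per page, a seeker can scan page headers until `is_keyframe_page` is true. With several frames per page the granulepos describes only one of them, which is the gap the reply at the top of this message is pointing at.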
> I guess that was kind of long, but that's my understanding of how
> seeking works. And I don't see anything wrong with it, or a better
> way that fits with the design goals we have.
> 
>   -r
> 
> --- >8 ----
> List archives:  http://www.xiph.org/archives/
> Ogg project homepage: http://www.xiph.org/ogg/
> To unsubscribe from this list, send a message to 'theora-dev-request at xiph.org'
> containing only the word 'unsubscribe' in the body.  No subject is needed.
> Unsubscribe messages sent to the list will be ignored/filtered.
> 