[Theora-dev] ogg and keyframes

Wed Sep 29 16:30:48 PDT 2004

On Wed, Sep 29, 2004 at 08:31:46PM +0200, Thomas Vander Stichele wrote:

> When thinking more about this problem I was convinced that it cannot be
> guaranteed by the server to start from a keyframe.  Because if you're
> sending out an ogg stream with *two* theora streams, you can't make sure
> both of them start at a keyframe.

Well, yes you can. That's why I was curious about how your fix worked.

The two things I've thought of are:

You can do a special re-encode of everything up to the first keyframe 
for each new client that connects. Could be quite expensive, but you can 
degrade by just repeating frames if you run out of resources. Is a 
jerky/static picture with sound better than a 'buffering...' spinner 
with sound?

You can just adjust the sync between the various clients so each new 
connection starts streaming from the previous keyframe. Cheaper, but 
increases latency and server-side buffering. You're right about multiple 
theora streams in this case; I hadn't thought of that. Treating it as 
'burst on connect' might work once Moore's law makes the stream less 
expensive.

> Whether it should be in the lib or not is a different matter.
> Personally, I'd feel the lib should do it, since really, what is it
> decoding and then sending on to the app if it didn't have a keyframe
> yet ?

All of which is fairly convincing that regardless of what kind of job 
the server does, the client should handle this case. It needs to worry 
about many of the same issues after seek anyway.

However, having libtheora itself 'just' do this is problematic. Waiting 
for a keyframe after decoder initialization would handle the streaming 
case, but another common place you'd want this is after seek. We could 
look at the packetno passed in as part of the ogg_packet structure, and 
set the wait flag whenever there was a gap, but that feels unwise to me. 
It comes out naturally from an Ogg stream, but it may be inconvenient to 
set this meaningfully from another packet source, and having decode 
never produce any output because you didn't increment the field 
certainly violates least surprise. Also, for example, it's likely better 
to keep decoding if only one or two inter frame packets were lost to 
dropout than to blank completely until the next keyframe, but the 
playback engine is in a better position to set policy on that than the 
codec.

Which brings me to there being an explicit call to set the wait flag. 
Something like theora_decode_ctl(ctx, SET_KEYFRAME_WAIT); You'd just 
call that after decoder init, and after seek, and things would take 
care of themselves.
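
A toy model of what such a call could do, assuming the flag simply 
makes decode drop packets until an intra frame arrives. Every name 
below is made up for the sketch and none of it is existing libtheora 
API; the only format fact relied on is that a Theora data packet has 
the 0x80 bit of its first byte clear, and the 0x40 bit clear for a 
keyframe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is existing libtheora API. */
enum { SKETCH_SET_KEYFRAME_WAIT = 1 };
enum { SKETCH_DECODED = 0, SKETCH_SKIPPED = 1 };

typedef struct {
    int wait_for_keyframe; /* nonzero: drop packets until a keyframe */
    long frames_decoded;   /* stand-in for real decoder state */
} sketch_decoder;

/* Data packet (0x80 clear) carrying an intra frame (0x40 clear)? */
static int sketch_is_keyframe(const unsigned char *data, size_t len) {
    if (data == NULL || len == 0) return 0;
    return (data[0] & 0xC0) == 0;
}

/* Control call: arm the wait flag. Returns 0 on success, -1 otherwise. */
int sketch_decode_ctl(sketch_decoder *dec, int request) {
    assert(dec != NULL);
    if (request != SKETCH_SET_KEYFRAME_WAIT) return -1;
    dec->wait_for_keyframe = 1;
    return 0;
}

/* Decode one packet, or skip it while waiting for a keyframe. */
int sketch_decode_packet(sketch_decoder *dec,
                         const unsigned char *data, size_t len) {
    assert(dec != NULL);
    if (dec->wait_for_keyframe) {
        if (!sketch_is_keyframe(data, len)) return SKETCH_SKIPPED;
        dec->wait_for_keyframe = 0; /* keyframe found: resume normally */
    }
    dec->frames_decoded++; /* the real decode would happen here */
    return SKETCH_DECODED;
}
```

The caller would arm the flag once after init and again after each 
seek; the distinct SKETCH_SKIPPED return is one way of telling it that 
no picture came out.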

Having the decoder just eat non-keyframes when the wait flag is set 
is the best option for performance, but this requires keeping track of 
the flag in the playback engine as well, otherwise the recommended 
greedy algorithm will starve audio playback waiting for the first video 
frame.

I had thought of generalizing the YUV buffer into something that had a 
'type' field. The main idea would be to support some convenient 
colorspace conversions (like to chunked sRGB) to be included in the 
library without an extra filter step, but a 'null' frame type could be 
added to carry the frame skip information back to the playback engine.
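
One possible shape for such a typed buffer, as a sketch only. The enum 
values, field names and helper are invented here and do not match any 
existing libtheora structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generalized frame buffer with a 'type' field. */
typedef enum {
    SKETCH_FRAME_NULL, /* no picture: the decoder skipped this packet */
    SKETCH_FRAME_YUV,  /* planar Y'CbCr, as the decoder produces now */
    SKETCH_FRAME_RGB   /* converted output, e.g. sRGB */
} sketch_frame_type;

typedef struct {
    sketch_frame_type type;
    int width, height;
    unsigned char *data[3]; /* up to three planes; unused ones are NULL */
    int stride[3];          /* bytes per row for each plane */
} sketch_frame;

/* Returns 1 if the engine should display f, 0 if it should keep
   showing whatever picture it already has. */
int sketch_frame_should_display(const sketch_frame *f) {
    assert(f != NULL);
    return f->type != SKETCH_FRAME_NULL && f->data[0] != NULL;
}
```

With a null type the decode call can always hand something back, so the 
engine learns about a skipped frame from the return value rather than 
from a separate flag of its own.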

Thanks for the feedback. Hope you have time for more. :)

 -r