[Theora-dev] Resyncing cut streams...

Wed Jan 5 22:09:09 PST 2005

I'm interested to know of any player that can properly resync after 
receiving a theora+vorbis stream that has been cut somewhere in the 
middle, i.e. one that doesn't start at granule 0.

The problem is that when streams are cut in this way, the start time and 
the times of the first few packets in each stream are unknown. When working 
at a packet level, the first few packets arrive with timestamps of -1. 
It's only once the last packet in the first page arrives that you have any 
idea of the relative offsets of the streams (if you are lucky), of when 
either of them is supposed to commence playback, or of the times you 
should have assigned to the packets that have already passed.

E.g. suppose a stream starts (after headers) with:

Page Vorbis (time equiv gran pos = 50 secs)
Packet 1
Packet 2
...
Packet 15

Page Theora (time equiv gran pos = 51 secs)
Packet 1
Packet 2
...
Packet 6

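Let's assume we have 5 fps video and each vorbis packet is 0.04 secs long.

Taking each page's granule pos as the time at the end of its last packet, 
this means that the vorbis page starts at a time equiv of 49.4 secs 
(50 - 15 x 0.04) and the theora page at a time equiv of 49.8 secs 
(51 - 6 x 0.2), but this is unknown to a player.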
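
The arithmetic can be written out as a small sketch. It assumes, as the 
numbers in the example imply, that a page's granule pos converts to the 
time at the end of the page's last packet, and that every packet in a 
stream has the same duration. The function and variable names are made up 
for illustration and are not part of any library.

```python
def page_start_time(end_time, packet_count, packet_duration):
    """Back-compute the time at which a page's first packet starts.

    end_time is the time equivalent of the page's granule pos, assumed
    here to mark the end of the last packet in the page. Every packet is
    assumed to have the same duration.
    """
    return end_time - packet_count * packet_duration


# Vorbis page: 15 packets of 0.04 secs, granule pos equivalent to 50 secs.
vorbis_start = page_start_time(50.0, 15, 0.04)

# Theora page: 6 frames at 5 fps (0.2 secs each), granule pos equiv 51 secs.
theora_start = page_start_time(51.0, 6, 1 / 5)

# How far the first video frame starts after the first audio packet.
offset = theora_start - vorbis_start

print(round(vorbis_start, 3), round(theora_start, 3), round(offset, 3))
```

The catch is that none of these inputs are available when the first packet 
arrives: the packet count and the granule pos are only known once the whole 
page has been seen.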

OK, so my question is: how do you know when to start playing back each 
stream, and at what time the first 14 vorbis packets and the first 5 theora 
frames are supposed to be presented?

The options I see are:

a) Just start playing everything now and hope for the best:
  i.e. we are 0.4 seconds out of sync (49.8 - 49.4).

b) Just drop those initial packets:
  i.e. we lose data, and we still can't assign a presentation time to 
anything, since we have no timebase to work from. Or you could resync by 
making all the codecs talk to each other, which is pretty dodgy.

c) VLC solution: just start playing now and hopefully we'll fix it up 
later.
  i.e. just start playing now, and hope we can resample everything by trial 
and error later so it resyncs.

d) The big buffer solution:
  i.e. buffer everything up, letting all the renderers run dry, and hope we 
eventually get something that has a timestamp (which is not guaranteed), 
and then jam all the collected data (i.e. 1+ seconds of raw audio and 
video) out in one big chunk.
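
As a rough illustration of option d), here is a minimal sketch of buffering 
packets until a timestamp turns up and then assigning times backwards. The 
StreamBackfill class is hypothetical and not taken from any player. It 
assumes a constant packet duration per stream and that a known timestamp 
marks the end of the packet it arrives with.

```python
from dataclasses import dataclass, field


@dataclass
class StreamBackfill:
    """Hypothetical helper: holds one stream's packets until a time is known."""

    packet_duration: float  # assumed constant for the whole stream
    pending: list = field(default_factory=list)

    def push(self, payload, end_time=None):
        """Buffer a packet; end_time is None while unknown (the -1 case).

        Returns a list of (start_time, payload) pairs once a packet with a
        known end time arrives, otherwise an empty list.
        """
        self.pending.append(payload)
        if end_time is None:
            return []
        # Walk backwards from the known end time to the first buffered packet.
        first_start = end_time - len(self.pending) * self.packet_duration
        timed = [
            (first_start + i * self.packet_duration, packet)
            for i, packet in enumerate(self.pending)
        ]
        self.pending = []
        return timed


# The vorbis page from the example: only packet 15 carries a usable time.
audio = StreamBackfill(packet_duration=0.04)
released = []
for i in range(1, 16):
    released = audio.push(f"vorbis-{i}", end_time=50.0 if i == 15 else None)
```

Nothing is released until the fifteenth packet, at which point all fifteen 
come out together. That is exactly the "one big chunk" behaviour described 
above, and the buffer grows without bound if no timestamp ever arrives.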

All of these are poor solutions. The other issue is: how are we supposed to 
tell the difference between a file of this kind that really does intend 
there to be a wait time at the start (i.e. for resyncing, where the video 
may really not be intended to start for some amount of time, to compensate 
for the different granularities of the audio and video), and one which 
looks identical but expects us to treat the start as really being the 
start, regardless of the granule pos?

I'm starting to think that nothing can actually resync properly, and that 
all of the players are just "close enough" to be acceptable.

Cheers,

Zen.