[vorbis-dev] granulepos start/end revisited

illiminable ogg at illiminable.com
Sun May 23 01:37:07 PDT 2004



----- Original Message -----
From: "Arc Riley" <arc at xiph.org>
To: <vorbis-dev at xiph.org>
Sent: Sunday, May 23, 2004 3:54 PM
Subject: Re: [vorbis-dev] granulepos start/end revisited

> On Sun, May 23, 2004 at 02:38:34PM +0800, illiminable wrote:
> > Considering subtitle phrases are generally short, short-range searches
> > would be better done with a linear forward scan. In the average case
> > you can scan forward a hundred pages or so faster than the binary
> > search can even sync to a frame sync point for a single iteration of a
> > binary search.
>
> Remember this isn't just for subtitles.  It's possible that people will
> want to use Writ for displaying metadata/etc, possibly through the
> entire length of a track.  Imagine a music video with the info block
> shown as Writ in the lower left so it comes out nice and clean no matter
> how low bitrate the video is.  Such text would remain through the entire
> video, and if you seek, you don't want it to disappear until it's
> repeated in the bitstream.  Especially since we've had the solution to
> that problem for months now; by giving both start and duration from the
> beginning and only dropping phrases after seek if they've expired.
>

I didn't say the other approach solved it; I actually said I didn't think
either is a good solution to the problem. All the start-and-stop-time
mechanism does is solve part of the problem, and it's up to interpretation
whether the part it solves is the most important part, or whether an
approach should be found that solves the problem in all cases.

> > Considering that most subtitles would have a duration of less than 20
> > seconds, for seeks of these short durations a linear forward scan would
> > be faster than the binary search in most cases. And if you linear scan
> > forward, it doesn't matter which approach you use, as you see all
> > intervening subtitle pages and can act on them accordingly.
>
> We do not seek for previous/future subtitles, we only grab them as they
> "fall out" of the audio/video stream.  See Monty's muxing docs.  To go
> scanning through the stream looking for extra Writ pages would be
> doable, but it's going to take more time to do.  Better just to leave
> the seek data up to the audio/video end and grab Writ packets as they
> come "falling out" of the stream.
>

Maybe I didn't explain it properly... all I was pointing out is that for
forward seeks over short intervals it is sometimes better to scan forward
linearly, and the by-product of this is that you also pass over any
intervening subtitle pages and can act on them accordingly, which to me
solves the more important case: short-duration subtitles being missed. A
sketch of the idea follows.
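
Here's a minimal sketch of that heuristic. It assumes libogg's ogg_page
and ogg_page_granulepos(); read_next_page(), handle_page(), bisect_seek()
and the 20-second threshold are hypothetical stand-ins, not anything from
a real player:

    /* Sketch: prefer a linear forward scan over bisection for short
     * forward jumps, acting on any subtitle pages passed on the way.
     * The threshold and the helper functions are hypothetical. */
    #include <ogg/ogg.h>

    extern int  read_next_page(ogg_page *og);      /* hypothetical reader   */
    extern void handle_page(ogg_page *og);         /* hypothetical decoder  */
    extern int  bisect_seek(ogg_int64_t target);   /* hypothetical bisector */

    #define SCAN_THRESHOLD (20 * 1000)  /* ~20 s at an assumed 1000 gran/s */

    int seek_forward(ogg_int64_t target, ogg_int64_t current)
    {
        if (target > current && target - current < SCAN_THRESHOLD) {
            /* Linear scan: every intervening page passes through our
             * hands, so subtitle pages on the way can be acted on. */
            ogg_page og;
            while (read_next_page(&og) == 0) {
                handle_page(&og);
                if (ogg_page_granulepos(&og) >= target)
                    return 0;                      /* reached the target */
            }
            return -1;                             /* hit end of stream  */
        }
        /* Long jump: bisection is faster, but skips intervening pages. */
        return bisect_seek(target);
    }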

> > For subtitles longer than 20 seconds, and assuming that the missing
> > subtitle problem is inevitable in some cases, realistically how time
> > critical can a subtitle of such long duration really be?
>
> That isn't the point.  The point is that if the decoder already knew
> about the phrase before the seek it should not disappear after a seek
> and reappear when it's repeated somewhere down the line.  The only time
> these "repeats" should be useful is for when the decoder seeks and
> doesn't already know about it.
>

Yes, I agree it gives you something... but IMO it still doesn't solve the
generic problem; it just introduces new mechanisms which only partially
solve it and will inevitably become legacy mechanisms requiring continued
support if a more comprehensive solution is ever found.

> > Neither method solves the case where a short (important) dialogue
> > subtitle packet appears just before the point you seek to... which is
> > the worst case.
>
> We can eat that case, though.  There's no really good way around it, and
> besides, if the phrase has already appeared there's a good chance you're
> jumping partway into the person speaking that sentence, and if you need
> that phrase (audio or text) you'll just have to seek back a bit to get
> it.  There's no way around it without wasting a lot of time seeking.
>

Well, I did offer the suggestion that subtitle packets carry
predecessor-location information, which does solve it and adds only one
extra seek operation. To me this is a far better solution than subtitles
that are only sometimes displayed when they are supposed to be. A sketch
of what that could look like is below.
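
A minimal sketch of that suggestion; the writ_phrase layout and the helper
functions are hypothetical (nothing like this is in the Writ draft), the
point being only that each phrase names the byte offset of its
predecessor's page:

    /* Hypothetical packet layout: each phrase records where its
     * predecessor's page lives, so a decoder that seeks into the middle
     * of a long phrase can recover it with a single backward seek. */
    #include <stdint.h>

    typedef struct writ_phrase {
        int64_t start_granule;    /* when the phrase begins              */
        int64_t end_granule;      /* when it expires                     */
        int64_t prev_page_off;    /* byte offset of predecessor's page,  */
                                  /* or -1 if this is the first phrase   */
        /* ... text, positioning, etc. ... */
    } writ_phrase;

    extern writ_phrase *read_phrase_at(int64_t byte_off);  /* hypothetical */
    extern void display_phrase(const writ_phrase *p);      /* hypothetical */

    /* After a seek, the first phrase encountered names its predecessor;
     * one extra seek fetches it, and it is shown only if still active. */
    void recover_predecessor(const writ_phrase *first_seen, int64_t seek_pos)
    {
        if (first_seen->prev_page_off < 0)
            return;                               /* nothing before us */
        writ_phrase *prev = read_phrase_at(first_seen->prev_page_off);
        if (prev && prev->end_granule > seek_pos)
            display_phrase(prev);                 /* still active: show it */
    }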

> > But having phrase IDs also adds functionality not possible without it.
>
> Currently we use the start-granulepos.  No two phrases can start at the
> same time.  This is used to identify repeats; the page granulepos will
> differ, while the Writ packet will specify the real start/end times.
> If you know of a phrase starting at 293949, ignore this packet.  If you
> don't, well, display this immediately because it's already running.
>
> > Realistically how many different phrases can be displayed at one time
> > (assuming overlays)... probably not more than 2: one at the top, one
> > at the bottom.
>
> Three, four, 100... :-)  It's completely open.  A "phrase" could be one
> word, or even one letter being appended to the previous.  Each could
> have its own expiration; a middle one could even expire while the ones
> before and after it don't.  "Stupid Subtitle Tricks".
>

OK... if it's completely open... why use granulepos? It seems like a way
to use a field for something that it isn't. Presumably those phrases start
at the same time, and yet they have different granulepos values. Let's say
they are supposed to start at 1 second, we have 1000 granules/sec, and you
have 300 tracks. The user seeks on a continuous slider bar; this means you
get different behaviour depending on the ordering of the tracks. Even if
you use a really fine granules/sec resolution, when seeking on a
continuous spectrum there is still the possibility that a seek will land
in the middle of the subtitles, and which of the subtitles *which all
start at the same time* get displayed depends on where the seek lands. A
worked example follows.
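
To make that concrete, a small worked example using the hypothetical
numbers above (1000 granules/sec, 300 simultaneous phrases, a seek landing
at granule 1150):

    /* 300 phrases all nominally start at 1 s (granule 1000), but using
     * granulepos as a phrase ID forces them onto granules 1000..1299.
     * Count how many "simultaneous" phrases now look like the future. */
    #include <stdio.h>

    int main(void)
    {
        const long rate    = 1000;       /* granules per second (example) */
        const long t_start = 1 * rate;   /* every phrase begins at 1 s    */
        const int  tracks  = 300;
        const long seek    = 1150;       /* user releases the slider here */

        int after = 0;
        for (int i = 0; i < tracks; i++)
            if (t_start + i > seek)
                after++;

        printf("%d of %d simultaneous phrases appear to start after a "
               "seek to granule %ld\n", after, tracks, seek);
        return 0;
    }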

What about cases where the *exact* metadata time is important? As a
hypothetical... say the results from a particle accelerator: the recorded
physical data is in one stream, and textual "event" data, which may be
generated from the physical data to annotate it, is in the other. There
could be a large volume of data from hundreds of sensors that occurred in
timeframes of millionths or billionths of a second.

So there could be a million tracks? A billion tracks?

It also means that, unless you introduce some weird bitshifting deal,
granulepos doesn't represent a mapping to time: subtitles which are
supposed to start at the same time have different granulepos values.
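
For completeness, a sketch of what such a bitshifting deal could look like
(purely hypothetical for Writ, though Theora splits its granulepos
similarly for keyframes): the real start time lives in the high bits and a
per-instant serial in the low bits, so time is still recoverable with a
shift.

    /* Hypothetical split granulepos: high bits = real start time,
     * low bits = serial disambiguating phrases that start together.
     * The shift caps simultaneous phrases at 2^PHRASE_SHIFT = 1024. */
    #include <stdint.h>

    #define PHRASE_SHIFT 10

    static inline int64_t make_gp(int64_t start_granule, int serial)
    {
        return (start_granule << PHRASE_SHIFT) |
               (serial & ((1 << PHRASE_SHIFT) - 1));
    }

    static inline int64_t gp_to_start(int64_t gp)   /* recover the time */
    {
        return gp >> PHRASE_SHIFT;
    }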

All these "Stupid subtitle Tricks" will all have very short durations, and
hence will become seriously deformed after a seek.

Zen.
