[vorbis-dev] granulepos start/end revisited
Arc Riley
arc at xiph.org
Sun May 23 00:54:18 PDT 2004
On Sun, May 23, 2004 at 02:38:34PM +0800, illiminable wrote:
>
> In terms of user experience what you really want is all subtitles always
> shown when they are supposed to be. Neither method really solves the lost
> subtitle problem, both just take different approaches to mitigating it.
I don't believe the start-stop method does anything to solve it. It
only makes things worse: the decoder can't use any subtitle directly
after a seek, even one it knew about before the seek, while adding
additional overhead. Extra overhead would be fine, of course, if it
actually improved the situation, but I can't see a single case where it
helps us in any way.
> Considering subtitle phrases are generally short, short range searches would
> be better done with a linear forward scan. In the average case you can scan
> forward a hundred pages or so, faster than the binary search can even synch
> to a frame synch point for a single iteration of a binary search.
Remember this isn't just for subtitles. It's possible that people will
want to use Writ for displaying metadata/etc, possibly through the
entire length of a track. Imagine a music video with the info block
shown as Writ in the lower left so it comes out nice and clean no matter
how low bitrate the video is. Such text would remain through the entire
video, and if you seek, you don't want it to disappear until it's
repeated in the bitstream. Especially since we've had the solution to
that problem for months now: give both start and duration from the
beginning, and only drop phrases after a seek if they've expired.
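A minimal sketch of that start+duration policy (the class and method names here are hypothetical, not from any actual Writ decoder):

```python
# Illustrative sketch of the start+duration approach: keep every phrase
# the decoder already knows about across a seek, dropping only the ones
# whose duration has expired by the seek target. Names are hypothetical.

class Phrase:
    def __init__(self, start, duration, text):
        self.start = start          # granule time the phrase begins
        self.duration = duration    # how long it stays on screen
        self.text = text

    def expired(self, now):
        return now >= self.start + self.duration

class SubtitleState:
    def __init__(self):
        self.active = {}            # phrases keyed by start granulepos

    def add(self, phrase):
        self.active[phrase.start] = phrase

    def on_seek(self, target):
        # Nothing disappears just because we seeked; only genuinely
        # expired phrases are discarded.
        self.active = {s: p for s, p in self.active.items()
                       if not p.expired(target)}
```

With this, a phrase spanning the whole video survives any seek within it, while a short phrase that ended before the seek target is dropped.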
> Considering that most subtitles would have a duration of less than 20
> seconds, for seeks of these short durations a linear forward scan would be
> faster than the binary search in most cases. And if you linear scan forward
> it doesn't matter which approach you use as you see all intervening subtitle
> pages and can act on them accordingly.
We do not seek for previous/future subtitles, we only grab them as they
"fall out" of the audio/video stream. See Monty's muxing docs. To go
scanning through the stream looking for extra Writ pages would be
doable, but it's going to take more time to do. Better just to leave
the seek data up to the audio/video end and grab Writ packets as they
come "falling out" of the stream.
> For subtitles longer than 20 seconds, and assuming the fact that the missing
> subtitle problem is inevitable in some cases, realistically how time
> critical can a subtitle of such long duration really be.
That isn't the point. The point is that if the decoder already knew
about the phrase before the seek it should not disappear after a seek
and reappear when it's repeated somewhere down the line. The only time
these "repeats" should be useful is when the decoder seeks and
doesn't already know about it.
> Neither method solves the case where a short (important) dialogue subtitle
> packet appears just before the point you seek to... which is the worst case.
We can eat that case, though. There's no really good way around it, and
besides, if the phrase has already appeared there's a good chance you're
jumping partway into the person speaking that sentence, and if you need
that phrase (audio or text) you'll just have to seek back a bit to get
it. There's no way around it without wasting a lot of time seeking.
> But having phrase id's also adds functionality, not possible without it.
Currently we use the start-granulepos. No two phrases can start at the
same time. This is used to identify repeats; the page granulepos will
differ, while the Writ packet will specify the real start/end times.
If you know of a phrase starting at 293949, ignore this packet. If you
don't, well, display it immediately because it's already running.
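That repeat rule fits in a few lines (function and variable names are illustrative, not from the Writ codebase):

```python
# Sketch of repeat handling: a phrase is identified by its start
# granulepos, since no two phrases may start at the same time. A
# repeated packet carries the phrase's real start/end times, so a
# decoder only needs to remember the start values it has seen.

def handle_writ_packet(known_starts, start, text):
    """Return the text to display, or None for a known repeat."""
    if start in known_starts:
        return None                 # already tracked: ignore the repeat
    known_starts.add(start)
    return text                     # new phrase: display immediately
```

A decoder that seeked and lost state simply starts with an empty set, so the first repeat it encounters is displayed.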
> Realistically how many different phrases can be displayed at one time
> (assuming overlays)... probably not more than 2, one at the top one at the
> bottom.
Three, four, 100... :-) It's completely open. A "phrase" could be one
word, or even one letter, being appended to the previous. Each could
have its own expiration; a middle one could even expire while the ones
before and after it don't. "Stupid Subtitle Tricks".
> Consider the case where you have 10 tracks... the subtitles in each
> language. Obviously you only want to display one of these at a time, now
> without phrase/track id's each one needs its own stream so they can be
> differentiated by ogg page id's. I would contend that the framing overhead
> of having 10 streams is much greater than having phraseids within a page.
Ok. You really need to read the Writ spec now. Really. Go read it:
http://wiki.xiph.org/OggWrit
We're not brainstorming about a subtitle format that needs to be made.
Writ exists, it's existed for months, and there are sample files for it.
Multilingual support is all in there and has been for months. I've just
been waiting for the granulepos stuff to be resolved, and for OggFile...
> Not necessarily true, there's no reason why a state change packet can't
> indicate begin fade. ie
> START, START FADE, END
Far simpler to add an extra layer to Writ (see the spec!), still
completely compatible with previous versions, which defines the color
transition. Though this is something nobody needs anyway. :-)
See, I discovered when I started putting Writ together that nobody could
agree on what should be in Writ and where the line between Writ and MNG
should be drawn. Some wanted "just text", others wanted fancy colors
and even animated text for sing-along/etc *shiver*.
It took a while to get there, and a lot of discussion and debate within
the #theora channel for a couple weeks. The result, as you can read, was
a format that has a very basic "foundation", the minimum necessary, and
then adds features in layers. First multiple languages, then placement
within a graphical window. After that, down the road, we could add
alignment within those windows, colors, and so on. Decoders need not
use, or even understand/parse, future layers; each uses what it
wants/can.
So a "fade" type effect could consist of a "transformation" layer,
which would also supply the color-changing needed for karaoke. Prehaps
this could be merged with the color layer, or it could be the layer just
above it.. in any case, that's all for down the road.
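The layering scheme implies a simple compatibility rule: a decoder keeps the layers it understands and skips the rest. A sketch (layer numbering here is hypothetical, not from the Writ spec):

```python
# Sketch of layered parsing: old decoders stay compatible with files
# that use newer layers by simply ignoring layers they don't know.
# Layer numbers below are illustrative only.

KNOWN_LAYERS = {0, 1}   # e.g. 0: base text, 1: multiple languages

def parse_layers(fields):
    """`fields` maps a layer number to that layer's decoded data."""
    return {layer: data for layer, data in fields.items()
            if layer in KNOWN_LAYERS}
```

A decoder built before a "placement" or "color" layer existed would still render the base text and language data untouched.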
> You are also assuming that subtitles should be "images", which is a pretty
> inefficient way to transmit them.
No, I'm assuming subtitles can be either text or images. Writ is text.
It's not just for subtitles, it's for any case where you want text in a
multimedia stream. That's why Writ 1.0 (no extra layers) and 1.1 (extra
languages) are provided as a base before text placement or any other
graphical-based features are added. It's just as useful for lyrics put
on a post-modern music jukebox's marquee as it is for subtitles, and
That's The Way It Should Be (tm).
MNG is for graphic overlays, for things that Writ simply shouldn't be
expected to do. A good example would be "pop-up" cartoon bubbles for
characters. Of course, MNG could draw the bubbles and Writ could toss
the text into a square window within the bubbles, in any language
available :-)
> In my opinion, subtitles should be text. For three main reasons.
> 1) It is a more efficient way to transmit them.
> 2) It allows search engines to use text searches.
> 3) It allows more flexibility to the player to use different fonts or
> colours if the user so desires.
We're on the same page there, brother. I think people who use images
for subtitles are amoral spawns of the netherworld. See the crew who
put together the DVD subtitle format for a good example of this. :-)
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.