[vorbis-dev] granulepos start/end revisited
Arc Riley
arc at xiph.org
Sun May 23 00:54:18 PDT 2004
On Sun, May 23, 2004 at 02:38:34PM +0800, illiminable wrote:
>
> In terms of user experience what you really want is all subtitles always
> shown when they are supposed to be. Neither method really solves the lost
> subtitle problem, both just take different approaches to mitigating it.
I don't believe the start-stop method does anything to solve it. It
only makes things worse: the decoder can't use any subtitle directly
after a seek, even one it knew about before the seek, while adding
additional overhead. Extra overhead would be fine, of course, if it
actually improved the situation, but I can't see a single case where it
helps us in any way.
> Considering subtitle phrases are generally short, short range searches would
> be better done with a linear forward scan. In the average case you can scan
> forward a hundred pages or so, faster than the binary search can even synch
> to a frame synch point for a single iteration of a binary search.
Remember this isn't just for subtitles. It's possible that people will
want to use Writ for displaying metadata/etc, possibly through the
entire length of a track. Imagine a music video with the info block
shown as Writ in the lower left so it comes out nice and clean no matter
how low bitrate the video is. Such text would remain through the entire
video, and if you seek, you don't want it to disappear until it's
repeated in the bitstream. Especially since we've had the solution to
that problem for months now: give both start and duration from the
beginning, and only drop phrases after a seek if they've expired.
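A minimal sketch of that start+duration policy (the class and method names here are hypothetical, not from any actual Writ decoder):

```python
# Illustrative sketch of the start+duration approach: keep every phrase
# the decoder already knows about across a seek, dropping only the ones
# whose duration has expired by the seek target. Names are hypothetical.

class Phrase:
    def __init__(self, start, duration, text):
        self.start = start          # granule time the phrase begins
        self.duration = duration    # how long it stays on screen
        self.text = text

    def expired(self, now):
        return now >= self.start + self.duration

class SubtitleState:
    def __init__(self):
        self.active = {}            # phrases keyed by start granulepos

    def add(self, phrase):
        self.active[phrase.start] = phrase

    def on_seek(self, target):
        # Nothing disappears just because we seeked; only genuinely
        # expired phrases are discarded.
        self.active = {s: p for s, p in self.active.items()
                       if not p.expired(target)}
```

With this, a phrase spanning the whole video survives any seek within it, while a short phrase that ended before the seek target is dropped.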
> Considering that most subtitles would have a duration of less than 20
> seconds, for seeks of these short durations a linear forward scan would be
> faster than the binary search in most cases. And if you linear scan forward
> it doesn't matter which approach you use as you see all intervening subtitle
> pages and can act on them accordingly.
We do not seek for previous/future subtitles, we only grab them as they
"fall out" of the audio/video stream. See Monty's muxing docs. To go
scanning through the stream looking for extra Writ pages would be
doable, but it's going to take more time to do. Better just to leave
the seek data up to the audio/video end and grab Writ packets as they
come "falling out" of the stream.
> For subtitles longer than 20 seconds, and assuming the fact that the missing
> subtitle problem is inevitable in some cases, realistically how time
> critical can a subtitle of such long duration really be.
That isn't the point. The point is that if the decoder already knew
about the phrase before the seek it should not disappear after a seek
and reappear when it's repeated somewhere down the line. The only time
these "repeats" should be useful is when the decoder seeks and
doesn't already know about it.
> Neither method solves the case where a short (important) dialogue subtitle
> packet appears just before the point you seek to... which is the worst case.
We can eat that case, though. There's no really good way around it, and
besides, if the phrase has already appeared there's a good chance you're
jumping partway into the person speaking that sentence, and if you need
that phrase (audio or text) you'll just have to seek back a bit to get
it. There's no way around it without wasting a lot of time seeking.
> But having phrase id's also adds functionality, not possible without it.
Currently we use the start-granulepos. No two phrases can start at the
same time. This is used to identify repeats; the page granulepos will
differ, while the Writ packet will specify the real start/end times.
If you know of a phrase starting at 293949, ignore this packet. If you
don't, well, display it immediately because it's already running.
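That repeat rule fits in a few lines (function and variable names are illustrative, not from the Writ codebase):

```python
# Sketch of repeat handling: a phrase is identified by its start
# granulepos, since no two phrases may start at the same time. A
# repeated packet carries the phrase's real start/end times, so a
# decoder only needs to remember the start values it has seen.

def handle_writ_packet(known_starts, start, text):
    """Return the text to display, or None for a known repeat."""
    if start in known_starts:
        return None                 # already tracked: ignore the repeat
    known_starts.add(start)
    return text                     # new phrase: display immediately
```

A decoder that seeked and lost state simply starts with an empty set, so the first repeat it encounters is displayed.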
> Realistically how many different phrases can be displayed at one time
> (assuming overlays)... probably not more than 2, one at the top one at the
> bottom.
Three, four, 100... :-) It's completely open. A "phrase" could be one
word, or even one letter, being appended to the previous. Each could
have its own expiration; a middle one could even expire while the ones
before and after it don't. "Stupid Subtitle Tricks".
> Consider the case where you have 10 tracks... the subtitles in each
> language. Obviously you only want to display one of these at a time, now
> without phrase/track id's each one needs its own stream so they can be
> differentiated by ogg page id's. I would contend that the framing overhead
> of having 10 streams is much greater than having phraseids within a page.
Ok. You really need to read the Writ spec now. Really. Go read it:
http://wiki.xiph.org/OggWrit
We're not brainstorming about a subtitle format that needs to be made.
Writ exists, it's existed for months, and there are sample files for it.
Multilingual support is all in there and has been for months. I've just
been waiting for the granulepos stuff to be resolved, and for OggFile...
> Not necessarily true, there's no reason why a state change packet can't
> indicate begin fade. ie
> START, START FADE, END
Far simpler to add an extra layer to Writ (see the spec!), still
completely compatible with previous versions, which defines the color
transition. Though this is something nobody needs anyway. :-)
See, I discovered when I started putting Writ together that nobody could
agree on what should be in Writ and where the line between Writ and MNG
should be drawn. Some wanted "just text", others wanted fancy colors
and even animated text for sing-along/etc *shiver*.
It took a while to get there, and a lot of discussion and debate within
the #theora channel for a couple weeks. The result, as you can read, was
a format that has a very basic "foundation", the minimum necessary, and
then adds features in layers. First multiple languages, then placement
within a graphical window. After that, down the road, we could add
alignment within those windows, colors, and so on. Decoders need not
use, or even understand/parse, future layers; each uses what it
wants/can.
So a "fade" type effect could consist of a "transformation" layer,
which would also supply the color-changing needed for karaoke. Prehaps
this could be merged with the color layer, or it could be the layer just
above it.. in any case, that's all for down the road.
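The layering scheme implies a simple compatibility rule: a decoder keeps the layers it understands and skips the rest. A sketch (layer numbering here is hypothetical, not from the Writ spec):

```python
# Sketch of layered parsing: old decoders stay compatible with files
# that use newer layers by simply ignoring layers they don't know.
# Layer numbers below are illustrative only.

KNOWN_LAYERS = {0, 1}   # e.g. 0: base text, 1: multiple languages

def parse_layers(fields):
    """`fields` maps a layer number to that layer's decoded data."""
    return {layer: data for layer, data in fields.items()
            if layer in KNOWN_LAYERS}
```

A decoder built before a "placement" or "color" layer existed would still render the base text and language data untouched.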
> You are also assuming that subtitles should be "images", which is a pretty
> inefficient way to transmit them.
No, I'm assuming subtitles can be either text or images. Writ is text.
It's not just for subtitles, it's for any case where you want text in a
multimedia stream. That's why Writ 1.0 (no extra layers) and 1.1 (extra
languages) are provided as a base before text placement or any other
graphical-based features are added. It's just as useful for lyrics put
on a post-modern music jukebox's marquee as it is for subtitles, and
That's The Way It Should Be (tm).
MNG is for graphic overlays, for things that Writ simply shouldn't be
expected to do. A good example would be "pop-up" cartoon bubbles for
characters. Of course, MNG could draw the bubbles and Writ could toss
the text into a square window within the bubbles, in any language
available :-)
> In my opinion, subtitles should be text. For three main reasons.
> 1) It is a more efficient way to transmit them.
> 2) It allows search engines to use text searches.
> 3) It allows more flexibility to the player to use different fonts or
> colours if the user so desires.
We're on the same page there, brother. I think people who use images
for subtitles are amoral spawns of the netherworld. See the crew who
put together the DVD subtitle format for a good example of this. :-)
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.