[theora-dev] using Kate for WebVTT encapsulation

Sat Apr 16 15:01:00 PDT 2011

Hi,

On Sat, Apr 16, 2011 at 6:53 AM, ogg.k.ogg.k at googlemail.com
<ogg.k.ogg.k at googlemail.com> wrote:
> Hi,
>
> sorry for the delay, I haven't been checking mail for a few days.
>
>> My idea is that - because WebVTT is so similar to SRT - it would be
>> simple to support WebVTT in Ogg by encapsulating it in a Kate track
>> and making sure that it can be extracted again without loss of
>> information through the kate libraries.
>
> This would seem easily doable.
> However, since WebVTT seems to define all layout/styling, it seems like
> shoehorning it in, and adding baggage that will be left unused.

What do you mean by adding baggage? Do you want to translate the
WebVTT styling to Kate styling?

> It is easy to
> ignore the decoding of the Kate styling information though, as timing and
> text were placed first to allow for simple text only decoders to work like this.

I am not fussed how the cue settings are encoded - they just have to
come out when decoding in exactly the same way, i.e. when I encode a
WebVTT file into Ogg and then extract it again, it has to be
identical.

> The category present in the header is also meant for decoders to know the
> type/purpose of the stream, which would allow a player to know whether a
> track is a WebVTT track or not.

The format would be WEBVTT, the purpose "captions" or "subtitles" etc.

>
>> There would only be a few changes necessary:
>> * WebVTT has a header which needs to be parsed and re-created.
>
> While I haven't looked at it, it doesn't sound much different from, say,
> LRC, which kateenc/katedec can import/export.

Yes, that's what I was thinking.

>> * Also, there is a suggestion for inclusion of name-value Metadata at
>> the top right after the header, which we'd want to retain (maybe in a
>> header or a first packet).
>
> The second header packet is a Vorbis comment packet, which is used
> for the same thing.

Ah yes, that should work well.

>> * Then there are cue settings, which are position modifiers on each
>> cue (segment of timed text). They need to be recreated, too.
>>
>> The marked-up text inside cues needs to be retrieved unchanged.
>
> I don't know what these are, so I can't say offhand whether it would match.
> I'll have a look at those in the URL you quoted.

Here is an example WebVTT file extract:

WEBVTT

top
00:00:05,000 --> 00:00:08,040 A:middle L:10%
I dabble? Listen to me. What a <b>jerk</b>.

bottom
00:00:05,000 --> 00:00:08,040 A:middle L:60%
Yeah, I sort of <c.red>dabble</c> around,
you know.

"top" and "bottom" are identifiers (like the numbers in SRT).

Cue settings are the things that follow the start/end times. They
control the presentation location of a cue.
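
To make the split concrete, here is a rough Python sketch of pulling a
timing line apart into start time, end time and cue settings. The
function name and the regular expression are my own illustration, not
anything from libkate or a WebVTT library, and it assumes each setting
is a single name:value token separated by whitespace, as in the example
above.

```python
import re

# Hypothetical helper for illustration only; not a libkate or WebVTT API.
# Assumes the line looks like "<start> --> <end> [name:value ...]".
TIMING_RE = re.compile(r"^(\S+)\s+-->\s+(\S+)(?:\s+(.*))?$")


def split_timing_line(line):
    """Return (start, end, settings) for one cue timing line."""
    match = TIMING_RE.match(line.strip())
    if match is None:
        raise ValueError("not a cue timing line: %r" % line)
    start, end, rest = match.groups()
    settings = {}
    for token in (rest or "").split():
        # Each cue setting is assumed to be "name:value".
        name, sep, value = token.partition(":")
        if not sep:
            raise ValueError("malformed cue setting: %r" % token)
        settings[name] = value
    return start, end, settings


start, end, settings = split_timing_line(
    "00:00:05,000 --> 00:00:08,040 A:middle L:10%"
)
print(start, end, settings)
```

A decoder that only wants timing and text could ignore the settings
part entirely; one that wants to recreate the file needs to keep it.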

The markup inside cue text changes the presentation styles of cue
parts. I don't think that part is difficult to do - it just needs to
be copied plainly into the text field.
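
The lossless requirement can be written down as a small round-trip
check. The sketch below is plain Python and does not touch Ogg or the
kate libraries at all; parse() and serialize() are hypothetical names,
and it assumes every cue has an identifier line, that blocks are
separated by exactly one blank line, and that the file ends in a single
newline. The settings string and the cue text are stored verbatim,
which is what makes the output identical to the input.

```python
# The example extract from above, as one string.
SAMPLE = (
    "WEBVTT\n"
    "\n"
    "top\n"
    "00:00:05,000 --> 00:00:08,040 A:middle L:10%\n"
    "I dabble? Listen to me. What a <b>jerk</b>.\n"
    "\n"
    "bottom\n"
    "00:00:05,000 --> 00:00:08,040 A:middle L:60%\n"
    "Yeah, I sort of <c.red>dabble</c> around,\n"
    "you know.\n"
)


def parse(text):
    """Split a file into its header line and a list of cue dicts."""
    blocks = text.rstrip("\n").split("\n\n")
    header, cues = blocks[0], []
    for block in blocks[1:]:
        lines = block.split("\n")
        start, _, rest = lines[1].partition(" --> ")
        end, _, settings = rest.partition(" ")
        cues.append({
            "id": lines[0],
            "start": start,
            "end": end,
            "settings": settings,           # kept verbatim, not interpreted
            "text": "\n".join(lines[2:]),   # markup copied unchanged
        })
    return header, cues


def serialize(header, cues):
    """Rebuild the file text from the header and the cue dicts."""
    out = [header]
    for cue in cues:
        timing = cue["start"] + " --> " + cue["end"]
        if cue["settings"]:
            timing += " " + cue["settings"]
        out.append("\n".join([cue["id"], timing, cue["text"]]))
    return "\n\n".join(out) + "\n"


header, cues = parse(SAMPLE)
assert serialize(header, cues) == SAMPLE
```

Whatever fields the stream ends up using for the identifier, settings
and text, the encode/decode pair has to pass this kind of comparison.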

Silvia.