Re: [theora-dev] First steps towards a simple text stream format.

Sun Aug 10 03:33:25 PDT 2003

>>>>> "Philip" == Philip Jägenstedt <philipj at telia.com> writes:

> Hi.  If we're both heading in the same direction, perhaps we should
> coordinate our efforts.

> On Sun, 10 Aug 2003 01:15:08 +0200
> <GODA-XEN at terra.es> wrote:

>> I like this, and I am working on a subtitle format (I don't have
>> anything yet, only a draft of desirable specifications).  But I
>> decided not to use UTF-8; instead, I am working on a kind of
>> compressed UTF-8.  In other words, the format is similar to UTF-8
>> in some ways.  My idea:
>> 
>> 00000000-01111111 -> English characters, similar to UTF-8
>> 1x...             -> indexed UTF in a table

That looks like an ugly hack.  Where do you store the table?  If you
have to store a table somewhere in a header packet, you could do a
better job by storing entropy information in it and Huffman-decoding
subtitles into UCS codes.  Anyway, subtitles shouldn't pose such a
bitrate problem, no?
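
To make the Huffman suggestion concrete, here is a minimal sketch in
Python.  It is an illustration only, not a proposed stream format: the
function names (build_code, encode, decode) are hypothetical, the
frequency table stands in for whatever a header packet would carry,
and bits are kept as strings of "0"/"1" for readability.

```python
import heapq
from collections import Counter


def build_code(freqs):
    """Build a Huffman code {code_point: bit_string} from a frequency table.

    The frequency table plays the role of the entropy information that
    would (by assumption) be stored in a header packet.
    """
    # Heap entries are (weight, unique_id, {symbol: partial_code}).
    # The unique id breaks ties so the dicts are never compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    if not heap:
        return {}
    if len(heap) == 1:
        # A single symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    """Encode text as a bit string, one Huffman codeword per code point."""
    return "".join(code[ord(ch)] for ch in text)


def decode(bits, code):
    """Decode a bit string back into text (codes are prefix-free)."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(chr(inverse[current]))
            current = ""
    return "".join(out)


sample = "Jägenstedt: 字幕"
code = build_code(Counter(ord(ch) for ch in sample))
bits = encode(sample, code)
restored = decode(bits, code)
```

The decoder yields full Unicode code points directly, so no private
index table is needed, and frequent characters get short codewords
regardless of which script they come from.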

As someone who frequently works with multibyte character sets, I just
want to emphasize how important Unicode (whether UTF-8, -16, UCS or
whatever) is.  People tend to forget that there are non-European
languages, and even for European languages, the chaos created by
ISO-8859-1 up to ISO-8859-15 is extremely problematic.
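
A small Python illustration of that chaos, using only the standard
codecs: the same byte decodes to different characters depending on
which ISO-8859 part the reader assumes, whereas the UTF-8 encoding of
a character is unambiguous.  The variable names are just for this
example.

```python
# One byte, two meanings: 0xA4 is the generic currency sign in
# ISO-8859-1 but the euro sign in ISO-8859-15.
raw = bytes([0xA4])
as_latin1 = raw.decode("iso-8859-1")    # U+00A4 CURRENCY SIGN
as_latin9 = raw.decode("iso-8859-15")   # U+20AC EURO SIGN

# In UTF-8 the euro sign has exactly one byte sequence, and decoding
# it needs no out-of-band knowledge of a charset.
euro_utf8 = "\u20ac".encode("utf-8")
round_trip = euro_utf8.decode("utf-8")
```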

David

PS:  I didn't know that Elvish is now part of Unicode ;-)

-- 
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40
