Re: [theora-dev] First steps towards a simple text stream format.
dvdkhlng at gmx.de
Sun Aug 10 03:33:25 PDT 2003
>>>>> "Philip" == Philip Jägenstedt <philipj at telia.com> writes:
> Hi. If we're both heading in the same direction, perhaps we should
> coordinate our efforts.
> On Sun, 10 Aug 2003 01:15:08 +0200
> <GODA-XEN at terra.es> wrote:
>> I like this, and I am working on a subtitle format (I don't have anything
>> yet, only a draft of desirable specifications). But I decided
>> not to use UTF-8; instead, I am working on a kind of compressed
>> UTF-8. In other words, this format is similar to UTF-8 in some ways. my
>> 00000000-01111111 -> English characters, similar to UTF-8; 1x... ->
>> indexed UTF in a table
That looks like an ugly hack. Where do you store the table? If you
have to store a table somewhere in a header packet, you could do a
better job by storing entropy information in it and Huffman-decoding
subtitles into UCS codes. Anyway, subtitles shouldn't pose such a
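
As a rough illustration of the Huffman alternative: the sketch below builds
a code table from the code-point frequencies of a sample string, which is
the kind of entropy information a header packet could carry, and then
decodes the bit stream back into code points. It is only a sketch under
assumptions: the function names (build_huffman_table, encode, decode) and
the sample text are made up for this example, the bits are kept as a Python
string instead of being packed, and nothing here says how such a table
would actually be serialized into a packet.

```python
import heapq
from collections import Counter


def build_huffman_table(text):
    """Map each Unicode code point in `text` to a Huffman bit string."""
    freq = Counter(ord(ch) for ch in text)
    if len(freq) <= 1:
        # Degenerate cases: empty input, or a single symbol that still
        # needs a one-bit code.
        return {cp: "0" for cp in freq}
    # Heap entries: (weight, unique tie-breaker, {code_point: partial code}).
    heap = [(w, i, {cp: ""}) for i, (cp, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {cp: "0" + code for cp, code in left.items()}
        merged.update({cp: "1" + code for cp, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, table):
    """Concatenate the bit strings of all code points in `text`."""
    return "".join(table[ord(ch)] for ch in text)


def decode(bits, table):
    """Walk the bit string and emit a character whenever a code matches."""
    inverse = {code: cp for cp, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(chr(inverse[current]))
            current = ""
    return "".join(out)


# Hypothetical sample subtitle mixing Japanese and ASCII text.
subtitle = "こんにちは、世界。Hello, world."
table = build_huffman_table(subtitle)
bits = encode(subtitle, table)
utf8_bits = 8 * len(subtitle.encode("utf-8"))
print(len(bits), "Huffman bits vs", utf8_bits, "UTF-8 bits")
```

The comparison printed at the end ignores the size of the table itself,
which for short subtitle streams may well eat up the savings.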
As someone who frequently works with multibyte character sets, I just
want to emphasize how important Unicode (whether UTF-8, UTF-16, UCS or
whatever) is. People tend to forget that there are non-European
languages, and even for European languages, the chaos created by
ISO-8859-1 up to ISO-8859-15 is extremely problematic.
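
A small example of that chaos, using only Python's standard codecs: the
same byte means a different character depending on which ISO-8859 part the
reader assumes, while the UTF-8 encoding of a character needs no such
out-of-band guess. The helper name decode_as is made up for this sketch.

```python
def decode_as(byte_value, charset):
    """Decode a single byte under the given 8-bit charset."""
    return bytes([byte_value]).decode(charset)


# Byte 0xA4: generic currency sign in ISO-8859-1, euro sign in ISO-8859-15.
latin1_a4 = decode_as(0xA4, "iso-8859-1")
latin9_a4 = decode_as(0xA4, "iso-8859-15")

# Byte 0xE9: e with acute in ISO-8859-1, Greek small iota in ISO-8859-7.
latin1_e9 = decode_as(0xE9, "iso-8859-1")
greek_e9 = decode_as(0xE9, "iso-8859-7")

for label, ch in [
    ("0xA4 as ISO-8859-1", latin1_a4),
    ("0xA4 as ISO-8859-15", latin9_a4),
    ("0xE9 as ISO-8859-1", latin1_e9),
    ("0xE9 as ISO-8859-7", greek_e9),
]:
    print(f"{label}: U+{ord(ch):04X} {ch}")

# In UTF-8 the euro sign always has the same three-byte encoding.
euro_utf8 = "\u20ac".encode("utf-8")
print(euro_utf8.hex())
```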
PS: I didn't know that Elvish is now part of Unicode ;-)
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205 D016 7DEF 5323 C174 7D40