[vorbis-dev] UTF-8 & Hebrew
SyP
syp at dev-labs.com
Wed Mar 6 21:17:09 PST 2002
Hello Ross,
You wrote:
> I understood that UTF-8 utilizes as many 8-bit characters as it needs to
> store the required extended character. I think that can be up to 4
> bytes (32 bits). The old 16-bit Unicode standard is apparently on its
> way out.
Unicode isn't 16-bit, and explicitly hasn't been since at least Unicode 2.0.
The confusion arises partly because, for a long time, there weren't
any characters in Unicode outside the 0000-FFFF range.
Now, understand that UTF-8 is not Unicode; UTF-8 is a *representation*
of Unicode, using byte sequences of varying length.
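
To make the code point versus byte sequence distinction concrete, here is
a minimal sketch in Python. The language, the helper name describe() and
the sample characters are choices made for this illustration; none of
them come from the thread. It prints each character's Unicode code point
next to the bytes UTF-8 uses to store it.

```python
def describe(ch: str) -> str:
    """Return the code point and the UTF-8 bytes of a single character."""
    encoded = ch.encode("utf-8")
    return f"U+{ord(ch):04X} -> {encoded.hex().upper()} ({len(encoded)} byte(s))"


# Sample characters picked for illustration only:
# Latin A, O slash, Hebrew alef, euro sign, musical G clef (above FFFF).
for ch in ["A", "\u00D8", "\u05D0", "\u20AC", "\U0001D11E"]:
    print(describe(ch))
```

The code point is the same whatever encoding is used; only the byte
sequence changes. The last sample lies outside the 0000-FFFF range and
takes four bytes in UTF-8.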
> Currently if I (or you) enter an Arial #216 character (O slash, Ø) into a
> comment, it is saved as 2 bytes (C398) and displayed correctly as one
> character in my app & the Winamp comment editor, so I don't understand
> why this works but the Hebrew characters do not. Is it that UTF-8 is
> not fully supported in Windows?
Multibyte, Hebrew, and CJK support in Windows 95/98/ME is a
hack compared to the all-Unicode design that started with NT and is fully
supported by 2K and XP. And I think displaying Hebrew text isn't as
simple as displaying, say, Cyrillic: it's right-to-left, it has a
complicated system of annotation dots, etc. So I don't think that a
Cyrillic Win98 will ever display the Hebrew comments correctly, but I
may be wrong.
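
The difference between Hebrew and Cyrillic is visible in the Unicode
character properties themselves. The sketch below uses Python's standard
unicodedata module; the helper name bidi_profile() and the sample
characters are assumptions made for this illustration. It only inspects
the properties a renderer has to honour and says nothing about what any
particular Windows version actually does with them.

```python
import unicodedata


def bidi_profile(text: str) -> list:
    """List (code point, bidi class, combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.combining(ch))
        for ch in text
    ]


# Illustrative samples: Cyrillic YA, then Hebrew alef followed by the
# vowel point qamats (one of the "dots" attached to a letter).
print(bidi_profile("\u042F"))
print(bidi_profile("\u05D0\u05B8"))
```

A bidi class of "L" means left-to-right and "R" means right-to-left.
"NSM" marks a non-spacing mark, which has to be drawn on the preceding
letter rather than in a cell of its own.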
Cheers,
SyP
--
Can I yell "movie" in a crowded firehouse?