[vorbis-dev] UTF-8 & Hebrew

SyP syp at dev-labs.com
Wed Mar 6 21:17:09 PST 2002



Hello Ross,

You wrote:

> I understood that UTF-8 utilizes as many 8-bit characters as it needs to
> store the required extended character.  I think that can be up to 4 
> bytes (32 bits).  The old 16-bit unicode standard is apparently on its 
> way out.

Unicode isn't 16-bit, and explicitly hasn't been since at least Unicode 2.0.
The confusion is partly because, for a long time, there weren't
any characters in Unicode outside of the 0000-FFFF range.
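
To make the 16-bit point concrete, here is a minimal sketch (my own
illustration; the helper name utf16_units is made up). It counts how many
16-bit units a character needs in UTF-16: anything above FFFF no longer
fits in one.

```python
def utf16_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    # utf-16-be emits no byte-order mark, so every 2 bytes is one code unit.
    return len(ch.encode("utf-16-be")) // 2


print(utf16_units("\u00d8"))      # U+00D8, inside 0000-FFFF: 1 unit
print(utf16_units("\U00010000"))  # U+10000, outside that range: 2 units
```

So a fixed 16-bit slot per character only works as long as nothing outside
0000-FFFF is in use.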

Now, understand that UTF-8 is not Unicode, UTF-8 is a *representation*
of Unicode, using byte sequences of varying length.
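
A small sketch of what "varying length" means in practice (my own
illustration; utf8_hex is a made-up helper name, and the sample
characters other than O slash are my choice):

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 byte sequence of one character as uppercase hex."""
    return ch.encode("utf-8").hex().upper()


# One code point each, but a different number of bytes in UTF-8.
for ch in ("A", "\u00d8", "\u05d0", "\u20ac", "\U00010000"):
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
# U+0041 -> 41
# U+00D8 -> C398
# U+05D0 -> D790
# U+20AC -> E282AC
# U+10000 -> F0908080
```

Note that O slash (U+00D8) and Hebrew alef (U+05D0) both take two bytes,
so as far as the encoding goes they are the same kind of thing.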

> Currently if I (or you) enter an Arial #216 character (O slash, Ø) into a 
> comment, it is saved as 2 bytes (C398) and displayed correctly as one 
> character in my app & the Winamp comment editor, so I don't understand 
> why this works but the Hebrew characters do not.  Is it that UTF-8 is 
> not fully supported in Windows?

Windows 95/98/ME's multibyte, Hebrew, and CJK support is a hack
compared to the all-Unicode design that started with NT and is fully
supported by 2K and XP. And I think displaying Hebrew text isn't as
simple as displaying, say, Cyrillic: it's right-to-left, and it has a
complicated system of annotation dots, etc. So I don't think that a
Cyrillic Win98 will ever display the Hebrew comments correctly, but I
may be wrong.
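
For what it's worth, the difference shows up in the Unicode character
properties themselves. A rough sketch using Python's standard unicodedata
module (the describe helper and the sample characters are my own choice,
and this says nothing about what any particular Windows version renders):

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (name, bidirectional class, combining class) for a character."""
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),  # 'L' = left-to-right, 'R' = right-to-left
        unicodedata.combining(ch),      # non-zero means it attaches to a base letter
    )


print(describe("\u0414"))  # Cyrillic De: class 'L', combining class 0
print(describe("\u05d0"))  # Hebrew alef: class 'R', combining class 0
print(describe("\u05b8"))  # Hebrew point qamats: a non-spacing mark ('NSM')
```

A renderer that only knows how to lay out left-to-right glyphs one after
another can get by with Cyrillic, but not with the 'R' letters or the
combining points.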

Cheers,
  SyP

-- 

Can I yell "movie" in a crowded firehouse?

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.



