[vorbis-dev] UTF-8 & Hebrew
Beni Cherniavsky
cben at techunix.technion.ac.il
Thu Mar 7 00:55:26 PST 2002
On 2002-03-07, SyP wrote:
> Hello Ross,
>
> You wrote:
>
> > I understood that UTF-8 utilizes as many 8-bit characters as it needs to
> > store the required extended character. I think that can be up to 4
> > bytes (32 bits). The old 16-bit Unicode standard is apparently on its
> > way out.
>
It's up to 6 IIRC, since the full ISO 10646 (UCS-4) code space is 31 bits.
To represent all of it in a variable-length encoding where some characters
are shorter than 4 bytes, where every byte except those of the 128 ASCII
characters has its top bit set, and where character boundaries are easy to
detect (which takes up more bits), it is pretty good that it always fits
into 6 bytes...
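For illustration, a minimal Python sketch of the scheme described above, assuming the original RFC 2279 form of UTF-8 that covered 31-bit code points with 1 to 6 bytes (the later RFC 3629 restricts UTF-8 to U+10FFFF and therefore to at most 4 bytes). The function names are hypothetical. It shows the lead byte announcing the length, continuation bytes always of the form 10xxxxxx, and how that makes character boundaries easy to find.

```python
def utf8_encode_rfc2279(cp: int) -> bytes:
    """Encode one code point (0..0x7FFFFFFF) as in the original
    RFC 2279 UTF-8, which used 1 to 6 bytes."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])  # ASCII: one byte with the top bit clear
    # (exclusive upper bound, total bytes, lead-byte prefix)
    forms = ((0x800, 2, 0xC0), (0x10000, 3, 0xE0), (0x200000, 4, 0xF0),
             (0x4000000, 5, 0xF8), (0x80000000, 6, 0xFC))
    _, n, lead = next(f for f in forms if cp < f[0])
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))  # continuation byte 10xxxxxx
        cp >>= 6
    out.append(lead | cp)  # lead byte holds the remaining high bits
    return bytes(reversed(out))


def char_starts(data: bytes) -> list:
    """Indexes where a character begins: every byte that is not a
    continuation byte (10xxxxxx) starts a new character."""
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


print(utf8_encode_rfc2279(0xD8).hex())         # c398
print(utf8_encode_rfc2279(0x7FFFFFFF).hex())   # fdbfbfbfbfbf
print(char_starts("\u00d8a".encode("utf-8")))  # [0, 2]
```

For code points up to U+10FFFF (other than surrogates) this produces the same bytes as Python's own UTF-8 codec.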
> Unicode isn't 16-bit, explicitly not, since at least Unicode 2.0.
> The confusion is partly caused by the fact that, for a long time, there
> weren't any characters in Unicode outside the 0000-FFFF range.
>
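A quick Python check of that point, using U+1D11E (MUSICAL SYMBOL G CLEF) purely as an example of a character outside 0000-FFFF: it cannot fit in a single 16-bit unit, so UTF-16 needs a surrogate pair, while UTF-8 uses 4 bytes.

```python
import unicodedata

clef = "\U0001D11E"  # an example character beyond U+FFFF
print(hex(ord(clef)), unicodedata.name(clef))  # 0x1d11e MUSICAL SYMBOL G CLEF
print(clef.encode("utf-16-be").hex(" "))       # d8 34 dd 1e (surrogate pair)
print(clef.encode("utf-8").hex(" "))           # f0 9d 84 9e (4 bytes)
```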
> Now, understand that UTF-8 is not Unicode; UTF-8 is a *representation*
> of Unicode, using byte sequences of varying length.
>
> > Currently if I (or you) enter an Arial #216 character (O slash, Ø) into a
> > comment, it is saved as 2 bytes (C398) and displayed correctly as one
> > character in my app & the Winamp comment editor, so I don't understand
> > why this works but the Hebrew characters do not. Is it that UTF-8 is
> > not fully supported in Windows?
>
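To make the comparison concrete, a small Python sketch (the helper name `utf8_hex` is hypothetical): Ø (U+00D8) and a Hebrew letter such as alef (U+05D0, picked only as an example) both take two bytes in UTF-8, so the stored bytes are handled the same way for both, which suggests the difference lies in how the text is displayed rather than how it is stored.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as spaced uppercase hex."""
    return ch.encode("utf-8").hex(" ").upper()

for ch in ("\u00d8", "\u05d0"):  # O slash, Hebrew alef
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
# U+00D8 -> C3 98
# U+05D0 -> D7 90
```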
> Windows 95/98/ME's multibyte, Hebrew and CJK support is a hack
> compared to the full Unicode support that started with NT and is
> complete in 2K and XP. And I think displaying Hebrew text isn't as
> simple as displaying, say, Cyrillic: it's right-to-left, it has a
> complicated system of annotation dots, etc. So I don't think a
> Cyrillic Win98 will ever display the Hebrew comments correctly, but I
> may be wrong.
>
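For illustration, Python's standard `unicodedata` module exposes the two properties mentioned above: Hebrew letters have bidirectional class "R" (right-to-left) where Latin and Cyrillic letters are "L", and the vowel points ("annotation dots", niqqud) are non-spacing combining marks (category "Mn") that must be drawn on the preceding letter. The sample characters are arbitrary examples.

```python
import unicodedata

samples = {
    "Latin A": "A",
    "Cyrillic Be": "\u0411",
    "Hebrew alef": "\u05d0",
    "Hebrew qamats (vowel point)": "\u05b8",
}
for label, ch in samples.items():
    # bidirectional(): direction class; category(): general category
    print(f"{label:28} bidi={unicodedata.bidirectional(ch):4} "
          f"category={unicodedata.category(ch)}")
```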
Sure. I never tried a Cyrillic Win98 myself, but English ones don't have
right-to-left support and I never heard of any but Hebrew ones having it.
Annotation dots are very seldom used (I write them once every couple of
months; of all my song titles, only one pair needed the dots to
disambiguate...), so the doubtless lack of support for them in non-Hebrew
windozes is not a problem.
However, the mentioned user had WinXP, which is truly Unicode, isn't it?
In my Win2k I can see all text (Hebrew & Russian; Hebrew is my default
codepage) in Peter's plugin when I double-click the tag to edit it.
However, in the list view of tags, only the current codepage is displayed!
So this is a bug in Peter's plugin (I reported it once but it was
ignored). IIRC the standard tags in non-advanced mode show all right;
only the list view is problematic.
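A rough Python sketch of why a view limited to one ANSI codepage cannot show both scripts at once: Windows-1255 (Hebrew) has no Cyrillic letters and Windows-1251 (Cyrillic) has no Hebrew ones, whereas a Unicode encoding such as UTF-8 covers both. This only models the codepage limitation; the helper `encodable` is hypothetical, and nothing here claims to describe how the plugin's list view is actually implemented.

```python
def encodable(text: str, codec: str) -> bool:
    """True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

hebrew = "\u05e9\u05dc\u05d5\u05dd"  # "shalom"
russian = "\u043c\u0438\u0440"       # "mir"
for codec in ("cp1255", "cp1251", "utf-8"):
    print(codec, encodable(hebrew, codec), encodable(russian, codec))
# cp1255 True False
# cp1251 False True
# utf-8 True True
```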
--
Beni Cherniavsky <cben at tx.technion.ac.il>
(also scben at t2 in Technion)
Common Lisp is better than Common Source and
Open Source is better than Open Collector (YMMV).