[vorbis] Unicode conversions from JA encodings
Glenn Maynard
g_ogg at zewt.org
Sun Jan 13 20:15:17 PST 2002
Background link for those unfamiliar:
http://www.debian.or.jp/~kubota/unicode-symbols.html
W3's suggestions for within XML: http://www.w3.org/TR/japanese-xml/
The nice thing about this document is that it gives data for a number
of different translation tables and enumerates their differences. (This
is only about Japanese encodings. I believe similar problems may exist
for Chinese and Korean; they appear to have far fewer problems with
Unicode than Japanese, however.)
The link to Unicode's FTP site is broken because Unicode obsoleted
their tables; however, go down to D for XML versions. The original
tables are also still on Unicode's site at
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/
We can't mandate the use of a specific table; that's just too
burdensome for programmers. It's reasonable to suggest one, however.
The major example is CP932, MS's version of Shift-JIS. Windows'
conversion functions will convert SJIS 0x5C (yen symbol) to U+005C
(backslash, aka REVERSE SOLIDUS). This is for compatibility with things
like filenames and C escapes; Japanese Windows systems show a yen
symbol where other systems show a backslash in those.
We don't have that compatibility problem; Ogg tags (and probably
metadata, too) are best off converting SJIS 0x5C to U+00A5 (Unicode's
codepoint for the yen symbol).
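
As a rough illustration of that remapping, here is a minimal Python
sketch. It assumes Python's built-in "cp932" codec as a stand-in for the
Windows conversion; the function name decode_sjis_tag is hypothetical,
and this only handles the 0x5C case, not the other table differences.

```python
YEN_SIGN = "\u00a5"

def decode_sjis_tag(raw: bytes) -> str:
    """Decode Shift-JIS tag bytes, treating single-byte 0x5C as a yen sign.

    Hypothetical helper, not part of any Ogg/Vorbis library.
    """
    # Decode first, so that a 0x5C trail byte of a double-byte character
    # (for example 0x83 0x5C) is consumed by the decoder and left alone.
    text = raw.decode("cp932")
    # cp932 turns single-byte 0x5C into U+005C; remap it to U+00A5.
    return text.replace("\\", YEN_SIGN)
```

For example, decode_sjis_tag(b"\x5c100") gives U+00A5 followed by
"100", while the katakana encoded as 0x83 0x5C passes through unchanged.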
I'd suggest using Unicode's tables (x-sjis-unicode-0.9 and
x-eucjp-unicode-0.9). They're obsoleted, but not because they're not
useful. (I believe the problem is that vendors aren't willing to change
their own tables, for compatibility with their own products.)
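
For anyone who wants to load a table like this directly, here is a
minimal Python sketch. It assumes the table is plain text with one
"0xSOURCE 0xUNICODE # comment" entry per line; check the actual files
before relying on that. parse_mapping_table and SAMPLE are hypothetical
names, and SAMPLE is a made-up excerpt, not a copy of the real file.

```python
def parse_mapping_table(text: str) -> dict:
    """Build {source code: Unicode character} from mapping-table text."""
    table = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # a code listed with no Unicode mapping
        # int(..., 16) accepts the "0x" prefix.
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table

# Illustrative excerpt in the assumed format.
SAMPLE = """\
# source code, Unicode code point, name
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0x5C\t0x00A5\t# YEN SIGN
"""

table = parse_mapping_table(SAMPLE)
```

With that excerpt, table[0x5C] is U+00A5, which is the mapping
suggested above for tags.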
I don't think the width problems listed are worth worrying about; those
are a system problem, not an interoperability problem, and Ogg data is
more likely to be displayed in a proportional font anyway (making width
irrelevant).
--
Glenn Maynard