[vorbis] Unicode conversions from JA encodings

Glenn Maynard g_ogg at zewt.org
Sun Jan 13 20:15:17 PST 2002



Background link for those unfamiliar:
http://www.debian.or.jp/~kubota/unicode-symbols.html

The W3C's suggestions for use within XML: http://www.w3.org/TR/japanese-xml/

The nice thing about this document is that it gives data for a bunch of
different translation tables and enumerates their differences.  (It
only covers Japanese encodings.  I believe similar problems may exist
for Chinese and Korean; however, they appear to have far fewer problems
with Unicode than Japanese does.)

The link to Unicode's FTP is broken because Unicode obsoleted its
tables; however, go down to D for XML versions.  The tables are also
still on Unicode's site at
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/.

We can't mandate the use of a specific table.  That's just too
burdensome for programmers.  It's reasonable to suggest one, however.

The major example of these differences is CP932, MS's version of
Shift-JIS.  Windows' conversion functions will convert SJIS 0x5C (yen
symbol) to U+005C (backslash, aka REVERSE SOLIDUS).  This is for
compatibility with things like filenames and C escapes; Japanese Windows
systems display yen symbols for those.

We don't have that compatibility problem; Ogg tags (and probably
metadata, too) are best off converting SJIS 0x5C to U+00A5 (Unicode's
codepoint for the yen symbol).
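
To make the 0x5C point concrete, here's a minimal Python sketch.  It is
not taken from any Ogg tool; decode_sjis_tag is a made-up name, and it
assumes Python's built-in "shift_jis" codec is an acceptable stand-in
for everything except the single byte 0x5C (that codec's table may
differ from the suggested ones in other places).  The one thing it
shows is that only a standalone 0x5C becomes U+00A5; a 0x5C that is the
second byte of a two-byte character has to be left alone.

```python
def decode_sjis_tag(data: bytes) -> str:
    """Decode Shift-JIS bytes, mapping single-byte 0x5C to U+00A5 (yen).

    Hypothetical helper for illustration; relies on Python's "shift_jis"
    codec for every other byte sequence.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x5C:
            # Standalone 0x5C: yen sign, not REVERSE SOLIDUS.
            out.append("\u00a5")
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:
            # Lead byte of a two-byte character; the trail byte may
            # itself be 0x5C, so decode the pair as a unit.
            # A truncated pair raises UnicodeDecodeError here.
            out.append(data[i:i + 2].decode("shift_jis"))
            i += 2
        else:
            # ASCII or half-width katakana; invalid bytes raise.
            out.append(data[i:i + 1].decode("shift_jis"))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(decode_sjis_tag(b"\x5c100"))    # yen sign followed by 100
    print(decode_sjis_tag(b"\x83\x5c"))   # one katakana character
    print(b"\x5c".decode("cp932"))        # CP932 behaviour, for contrast
```

Blindly replacing every backslash after a CP932 conversion would not be
equivalent, since a real backslash and a converted yen sign can no
longer be told apart at that point.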

I'd suggest using Unicode's tables (x-sjis-unicode-0.9 and
x-eucjp-unicode-0.9).  They're obsoleted, but that doesn't make them
any less useful.  (I believe the problem is that vendors aren't willing
to change their own tables, for compatibility with their own products.)
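
As a rough idea of what using such a table involves, here's a small
Python sketch of a loader.  It assumes the plain-text layout I believe
the files in Unicode's MAPPINGS directory use (a hex source code, a hex
Unicode code point, then a "#" comment); the XML versions would need a
different parser.  load_mapping and SAMPLE are hypothetical names, and
SAMPLE is a three-line illustrative excerpt, not the real file.

```python
# Illustrative excerpt only; a real table has thousands of entries.
SAMPLE = """\
# source code, Unicode code point, comment
0x5C    0x00A5  # YEN SIGN
0x8140  0x3000  # IDEOGRAPHIC SPACE
0x835C  0x30BD  # KATAKANA LETTER SO
"""


def load_mapping(lines):
    """Build a {source code: Unicode character} dict from table lines."""
    table = {}
    for line in lines:
        # Drop the trailing comment and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            # Entry with no Unicode mapping; skip it.
            continue
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table


if __name__ == "__main__":
    table = load_mapping(SAMPLE.splitlines())
    print(hex(ord(table[0x5C])))
```

A program that ships its own table this way doesn't depend on whatever
conversion the platform happens to provide.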

I don't think the width problems listed are worth worrying about; those
are a system problem, not an interoperability problem, and Ogg data is
more likely to be displayed proportionally anyway (making width
irrelevant.)


-- 
Glenn Maynard
