[vorbis] UTF8_LANG: a much better idea

Thu Jan 10 08:17:59 PST 2002

begin Glenn Maynard quotation:
> It comes down to this: mark the language of text with U+E0001 LANGUAGE
> TAG, followed by the RFC 3066 language ID (ie. "ja") encoded in
> lowercase ASCII plus 0xE0000.

When I read this, I wondered how those "lowercase ASCII" characters
could be ignored by software that doesn't handle the tag. Obviously, if
they were the ordinary lowercase ASCII letters U+0061 - U+007A, it
wouldn't work: they would be displayed as normal text wherever U+E0001
wasn't recognized. However, the standard actually says that the
characters used for the language ID are U+E0020 - U+E007E, which map
to their ASCII counterparts by the transform c - 0xE0000. Thus, they
don't display as normal ASCII characters.
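
To make the mapping concrete, here is a rough Python sketch of the
encoding as I understand it. The names encode_language_tag and
decode_tag_characters are made up for illustration, and lowercasing
the ID before encoding is my assumption based on the "lowercase ASCII"
wording above, not something I have checked against the standard.

```python
TAG_BASE = 0xE0000           # offset between ASCII and the tag block
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG


def encode_language_tag(lang_id):
    """Return U+E0001 followed by lang_id shifted into the tag block."""
    lowered = lang_id.lower()  # assumption: IDs are written in lowercase
    for ch in lowered:
        # Only printable ASCII has a counterpart in U+E0020 - U+E007E.
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError("not printable ASCII: %r" % ch)
    return LANGUAGE_TAG + "".join(chr(ord(ch) + TAG_BASE) for ch in lowered)


def decode_tag_characters(tagged):
    """Map tag characters U+E0020 - U+E007E back to ASCII (c - 0xE0000)."""
    return "".join(
        chr(ord(ch) - TAG_BASE)
        for ch in tagged
        if 0xE0020 <= ord(ch) <= 0xE007E
    )
```

With this, encode_language_tag("ja") gives U+E0001 U+E006A U+E0061, and
none of those code points is an ASCII letter.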

Another thing to mention is that this standard is part of Unicode 3.1,
which is not widely supported yet, AFAIK. Hopefully, most existing
Unicode implementations are smart enough to skip over characters in
planes which they don't implement, which would make the visual result on
non-supporting platforms the same as not using the tags at all.
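
To illustrate what "skipping" would amount to, here is a rough Python
sketch that drops every character in the tag block U+E0000 - U+E007F
before display. The name strip_tag_characters is made up, and this is
only my guess at the behaviour; a real renderer might instead show a
missing-glyph box for each character it doesn't know.

```python
TAG_BLOCK_FIRST = 0xE0000
TAG_BLOCK_LAST = 0xE007F


def strip_tag_characters(text):
    """Return text with all characters in the tag block removed."""
    return "".join(
        ch for ch in text
        if not TAG_BLOCK_FIRST <= ord(ch) <= TAG_BLOCK_LAST
    )
```

Applied to a comment that starts with U+E0001 U+E006A U+E0061 ("ja"),
this leaves only the visible text that follows the tag.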

-md
