No subject
Fri Aug 6 15:17:01 PDT 2004
"Content vector format
. A case-insensitive field name that may consist of ASCII 0x20 through
0x7D, 0x3D ('=') excluded. ASCII 0x41 through 0x5A inclusive (A-Z) is to
be considered equivalent to ASCII 0x61 through 0x7A inclusive (a-z).
. The field name is immediately followed by ASCII 0x3D ('='); this equals
sign is used to terminate the field name.
. 0x3D is followed by 8 bit clean UTF-8 field contents to the end of the
field."
^^^^^
This says it's UTF-8, and I think that's a very good decision. This means
we don't have to deal with DBCS encodings: disgusting, mostly obsolete
beasts. (This helps on embedded devices, too--you don't have to support
every encoding under the sun, just UTF-8. The limiting factor will
probably be fonts.)
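
To make the quoted rules concrete, here is a rough Python sketch of how
a parser might split one comment field. It is not taken from any
existing Vorbis library; the function name parse_comment and the choice
to fold names to upper case are my own assumptions for illustration.

```python
def parse_comment(raw):
    """Split one raw comment field (bytes) into (FIELD_NAME, value)."""
    # The first 0x3D ('=') terminates the field name; later '=' bytes
    # belong to the field contents.
    name, sep, value = raw.partition(b"=")
    if not sep:
        raise ValueError("no '=' after the field name")
    # Field names may only use ASCII 0x20 through 0x7D ('=' is already
    # excluded by the split above).
    if any(b < 0x20 or b > 0x7D for b in name):
        raise ValueError("field name byte outside 0x20-0x7D")
    # A-Z and a-z are equivalent; folding to upper case is an arbitrary
    # choice made for this sketch.
    return name.decode("ascii").upper(), value.decode("utf-8")


print(parse_comment(b"Title=caf\xc3\xa9 a=b"))
```

Note that the parser never needs to know what language the contents are
in; it only ever decodes UTF-8.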
> Here is an example of what I mean, taken from a recent message to the
> debian-devel mailing list:
>
> From: =?ks_c_5601-1987?B?x+7FuMDM?= <dkfjskd-dd at hotmail.com>
> To: debian-devel at lists.debian.org
> Subject: =?ks_c_5601-1987?B?W7GksO1dIMfuxbjAzMO1sbk=?=
>
> Here is what that showed up as in mutt:
>
> From: \307\356\305\270\300\314 <dkfjskd-dd at hotmail.com>
> To: debian-devel at lists.debian.org
> Subject: [\261\244\260\355] \307\356\305\270\300\314\303\265\261\271
>
> But in pine it some how magically showed up as Korean glyphs.
This is the old way of doing arbitrary encodings in mail. UTF-8
obsoletes it. (If you don't know about UTF-8, I strongly suggest
becoming familiar with it; http://www.cl.cam.ac.uk/~mgk25/unicode.html
is a good start.)
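
For what it's worth, here is a small Python sketch of what a mail
client has to do with the Subject header quoted above. It uses
decode_header from Python's standard email.header module; relying on
Python to map the declared charset name onto its euc_kr codec is an
assumption about the runtime, not something the mail itself guarantees.

```python
from email.header import decode_header

subject = "=?ks_c_5601-1987?B?W7GksO1dIMfuxbjAzMO1sbk=?="

# Step 1: undo the base64 layer. This yields the raw bytes plus the
# charset name that the header declared.
(raw, charset), = decode_header(subject)

# Step 2: interpret those bytes in the declared charset. A client with
# no converter for it stops after step 1 and can only show raw bytes,
# which is what the octal escapes in the mutt output are.
text = raw.decode(charset)

# Step 3: once the text is Unicode, it can be stored as UTF-8 and no
# later reader needs to know about the original charset.
utf8_bytes = text.encode("utf-8")

print(charset)
print(raw)
print(utf8_bytes)
```

Every reader of the old format needs steps 1 and 2 for every charset it
might meet; with UTF-8 throughout, only one decoder is ever needed.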
The main reason most people don't use UTF-8 as the default encoding in
mail is that older MUAs don't support it.
> So, since we already have an RFC approved standard (I'm assuming; I've
> been seeing these types of emails for years) for mixing foreign glyphs
> with real text, lets use it.
This RFC is for email, and it's an old, ugly way of doing things that
UTF-8 supersedes in most ways. For example, you can cat a mailbox
in which all mails have been converted to UTF-8, directly, and you see
everything as it's supposed to be seen (except for the glyph issues);
try to cat an mbox containing varying encodings and you'll get junk.
(Well, if you're not on a UTF-8 terminal you have to pipe it through
iconv, but you can only do that when the whole mailbox is in a single
encoding, such as UTF-8 or another Unicode encoding.)
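
A quick way to see the difference, sketched in Python: the two message
bodies below are made up for illustration, and decodes_as is just a
hypothetical helper, not part of any mail tool.

```python
def decodes_as(data, codec):
    """Return True if the whole byte string is valid in the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# Two made-up message bodies, each stored in its own legacy encoding
# and concatenated, as in an mbox with varying encodings.
mixed_mbox = "광고\n".encode("euc_kr") + "café\n".encode("latin-1")

# The same two bodies after converting every mail to UTF-8.
utf8_mbox = "광고\n".encode("utf-8") + "café\n".encode("utf-8")

print(decodes_as(utf8_mbox, "utf-8"))    # True
print(decodes_as(mixed_mbox, "utf-8"))   # False
print(decodes_as(mixed_mbox, "euc_kr"))  # False
```

No single converter can be pointed at the mixed file as a whole, which
is why cat (or one iconv pass) only works once everything is in one
encoding.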
Also, if you want simple, you *don't* want MIME in the tags. UTF-8 for
everything is extremely simple (you can even ignore the "lang" tag if
you, as an implementor, don't care about the glyph problems). With
arbitrary encodings, everything gets more complicated. I think that
your own Mutt binary failing to decode it properly is a good indicator. :)
(An aside: mutt *should* be able to figure out anything pine can; you
might have a mutt without iconv or MBCS support. mutt -v should
probably list HAVE_WC_FUNCS and HAVE_ICONV.)
> For the tags themselves, they are standard, and they're staying that
> way. I'm not going to encode CONDUCTOR into Chinese. Because its a
> standard tag, the player can translate it if it wants to. And I see
> no reason why a Chinese language encoder couldn't take their equivalent
> of "conductor" and encode it as the CONDUCTOR tag in the ogg file
> itself, making it invisible to the Chinese speaking user.
No argument there; the actual tag names should be completely invariant.
They're for interpretation by a parser, not a user.
--
Glenn Maynard