[vorbis] TAG Standard - ENSEMBLE/PERFORMER tags
Glenn Maynard
g_ogg at zewt.org
Wed Jan 9 14:14:28 PST 2002
On Wed, Jan 09, 2002 at 01:57:33PM -0500, Peter Harris wrote:
> > (For Windows, it needs to convert to and from the codepage.)
>
> See vorbis-tools/share/utf8.c: We already convert _to_ the current code
> page. What we don't do is convert _from_ the current code page. We would,
> except Windoze translates everything into the ANSI code page before passing
> it in via argv.
Are you sure? This is the type of behavior I used to believe Windows
did universally. It turns out that wherever Windows uses the ANSI
codepage on a normal English install, it uses the configured language's
encoding when set to that language. If you're on a Japanese system (or a
system set to the Japanese codepage), you get Shift-JIS everywhere.
I wouldn't be surprised if Windows was inconsistent and didn't do this
for commandlines. I can't test this right now, since changing this
in Windows needs a reboot.
(This means that on an English system, inputting Japanese with an IME,
you get to play around with specific messages to catch Japanese
characters, since if they're grabbed with WM_CHAR they'll get translated
to "?". However, on a Japanese system you don't have to do anything
special at all. This has thrown me for a few loops in the past, since
this isn't documented anywhere useful.)
> What we should do is just rewrite oggenc and vorbiscomment to do everything
> in UNICODE-16. I've done it for vorbiscomment; it works. I'll be re-doing it
> during 1.0-pre (as it's a royal pain to maintain two separate
> almost-but-not-quite-identical code bases, and I don't really want to
> inflict TCHAR on other OSs. Anyone have any better ideas?).
Get main() as small as possible, write a Unicode and an ANSI version of
main(), and have the Unicode version convert to UTF-8. This could be
done with no duplication of code, e.g.:
int main(int argc, unicode_type *argv[])
{
    /* one extra slot for the terminating NULL */
    char **nargv = malloc(sizeof(char *) * (argc + 1));
    for(int i = 0; i < argc; ++i)
        nargv[i] = convert_to_utf8_alloc(argv[i]);
    nargv[argc] = NULL;
    return real_main(argc, nargv);
}
(fill in the blanks for the type, conversion, error checking and making
sure the main program knows that the input text is UTF-8, and also knows
that any output must be converted to the local codepage, and *not*
displayed as UTF-8--the console doesn't know about that. This is
probably something like making it think LC_MESSAGES is the local
codepage and LC_CTYPE is UTF-8. Of course, these are meaningless in
Windows, but the interface can be faked, as it'll already need to do
this for Unix ...)
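
To make the "conversion" blank a little more concrete, here is a rough
sketch of what convert_to_utf8_alloc() might look like. It is only an
illustration under some assumptions: it takes uint16_t as a stand-in for
unicode_type, assumes the input is NUL-terminated UTF-16, and replaces
unpaired surrogates with U+FFFD (that policy is my choice, nothing
requires it). On Windows you would probably just call
WideCharToMultiByte() with CP_UTF8 instead of doing this by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert a NUL-terminated UTF-16 string into a
 * newly allocated, NUL-terminated UTF-8 string.  Returns NULL if the
 * allocation fails.  The caller frees the result. */
char *convert_to_utf8_alloc(const uint16_t *src)
{
    size_t len = 0, i, o = 0;
    unsigned char *out;

    assert(src != NULL);
    while (src[len] != 0)
        ++len;

    /* A single UTF-16 unit needs at most 3 UTF-8 bytes; a surrogate
     * pair (2 units) needs 4.  So 3 bytes per unit is always enough. */
    out = malloc(len * 3 + 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; ++i) {
        uint32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: combine with a following low surrogate. */
            if (i + 1 < len && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD; /* unpaired high surrogate */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;     /* stray low surrogate */
        }

        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }

    out[o] = '\0';
    return (char *)out;
}
```

This only covers the input side. Anything printed to the console would
still need converting from UTF-8 back to the local codepage.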
I don't like the idea of solving this with Unicode versions, however.
That's going to screw CJK users on, for example, Japanese Win98. (I
*would* like to see such versions exist, however. I'm in Win2K in an
English codepage, and it would be useful to be able to use a Unicode
version for this purpose. Most Japanese users are in the Japanese
codepage, so Unicode versions don't necessarily help them. In practice,
I would do all of my commandline tagging in Linux, so the Unicode
version doesn't matter to me, personally, either.)
--
Glenn Maynard