[vorbis] TAG Standard - ENSEMBLE/PERFORMER tags

Wed Jan 9 14:14:28 PST 2002

On Wed, Jan 09, 2002 at 01:57:33PM -0500, Peter Harris wrote:
> > (For Windows, it needs to convert to and from the codepage.)
> 
> See vorbis-tools/share/utf8.c: We already convert _to_ the current code
> page. What we don't do is convert _from_ the current code page. We would,
> except Windoze translates everything into the ANSI code page before passing
> it in via argv.

Are you sure?  I used to believe Windows behaved this way universally.
It turns out that, wherever it uses the ANSI codepage on a normal English
install, it uses the local language's encoding when the system is set to
another language.  If you're on a Japanese system (or a system set to the
Japanese codepage), you get Shift-JIS everywhere.

I wouldn't be surprised if Windows were inconsistent and didn't do this
for command lines.  I can't test this right now, since changing the
codepage in Windows needs a reboot.

(This means that on an English system, inputting Japanese with an IME,
you get to play around with specific messages to catch Japanese
characters, since if they're grabbed with WM_CHAR they'll get translated
to "?".  However, on a Japanese system you don't have to do anything
special at all.  This has thrown me for a few loops in the past, since
it isn't documented anywhere useful.)

> What we should do is just rewrite oggenc and vorbiscomment to do everything
> in UNICODE-16. I've done it for vorbiscomment; it works. I'll be re-doing it
> during 1.0-pre (as it's a royal pain to maintain two separate
> almost-but-not-quite-identical code bases, and I don't really want to
> inflict TCHAR on other OSs. Anyone have any better ideas?).

Get main() as small as possible, write a Unicode version and an ANSI
version of main(), and have the Unicode version convert to UTF-8.  This
could be done with no duplication of code, i.e.:

int main(int argc, unicode_type *argv[])
{
        /* argc entries plus one slot for the terminating NULL */
        char **nargv = malloc(sizeof(char *) * (argc + 1));
        for(int i = 0; i < argc; ++i)
                nargv[i] = convert_to_utf8_alloc(argv[i]);
        nargv[argc] = NULL;
        return real_main(nargv, argc);
}

(fill in the blanks for the type, conversion, error checking and making
sure the main program knows that the input text is UTF-8, and also knows
that any output must be converted to the local codepage, and *not*
displayed as UTF-8--the console doesn't know about that.  This is
probably something like making it think LC_MESSAGES is the local
codepage and LC_CTYPE is UTF-8.  Of course, these are meaningless in
Windows, but the interface can be faked, as it'll already need to do
this for Unix ...)
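
For the conversion blank, here is a rough sketch of what
convert_to_utf8_alloc() might look like.  It assumes unicode_type is a
16-bit UTF-16 code unit in host byte order (as wchar_t is on Windows);
uint16_t is used here only so the sketch compiles anywhere.  Replacing
unpaired surrogates with U+FFFD is my choice, not something settled
above.  On Windows itself, WideCharToMultiByte() with CP_UTF8 would
probably be the more natural way to do this.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical body for convert_to_utf8_alloc(): converts a NUL-terminated
 * UTF-16 string to a newly malloc'd, NUL-terminated UTF-8 string.
 * Returns NULL if allocation fails; the caller frees the result. */
char *convert_to_utf8_alloc(const uint16_t *in)
{
    size_t len = 0;
    while (in[len] != 0)
        ++len;

    /* Worst case is 3 UTF-8 bytes per UTF-16 code unit (a surrogate
     * pair is 2 units and yields 4 bytes), plus the terminator. */
    unsigned char *out = malloc(len * 3 + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; ++i) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: needs a low surrogate right after it.
             * Reading in[i + 1] is safe because in[len] is the NUL. */
            uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;    /* unpaired high surrogate */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;        /* stray low surrogate */
        }

        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (char *)out;
}
```

The ANSI version of main() would need a different conversion (local
codepage to UTF-8), which this sketch doesn't attempt.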

I don't like the idea of solving this with Unicode versions, however.
That's going to screw CJK users on, for example, Japanese Win98.  (I
*would* like to see such versions exist, however.  I'm in Win2K in an
English codepage, and it would be useful to be able to use a Unicode
version for this purpose.  Most Japanese users are in the Japanese
codepage, so Unicode versions don't necessarily help them.  In practice,
I would do all of my commandline tagging in Linux, so the Unicode
version doesn't matter to me, personally, either.)

-- 
Glenn Maynard
