[vorbis] Win32 All-UTF8 oggenc.exe

Glenn Maynard g_ogg at zewt.org
Sun Jan 13 00:13:01 PST 2002



Okay; I did some research (that I should have done to begin with, sorry)
and made a lot of progress on this.

First, some background that you may already know, but that I didn't and
that other readers may not:

Windows has an ANSI codepage (GetACP()) that's used by non-Unicode GUI
apps (assume non-Unicode from here on).  It can't be changed at runtime,
since it's system-wide.

Each Windows console window has two codepages, the input codepage
(GetConsoleCP()) and the output codepage (GetConsoleOutputCP()), and
both can be set per console.

Everything written to the console must be in the console output codepage.

Input functions (fgets, etc.) use the console input codepage.

argv uses the ANSI codepage, not either of the console codepages.

On a regular, Western Windows system, the ANSI codepage is 1252; on a
Japanese system, it's 932, etc.

On a Western Windows system, both console codepages default to 437,
the "OEM" codepage.  (It has line-drawing characters and such, and is
probably compatible with the old VGA font.)

I haven't tested this (it would take two long reboots), but I believe the
same is true on Japanese systems.

The console codepages are extremely limited in what they can be set to;
"chcp 1200" (Unicode, though it doesn't say which encoding) and "chcp 932"
both fail on an English system.  I believe you can always set them to
whatever the ANSI codepage is, however.

One problem you had was argv printing oddly.  That's because you were
getting ANSI data and printing it to something expecting OEM data.

The same thing happened when printing it via GetCommandLineW.
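To make the ANSI/OEM mismatch concrete, here is a tiny excerpt of the real Windows-1252 and CP437 tables.  The struct and decode_byte() are made-up names for illustration, and only a handful of upper-half entries are included; a real conversion would use complete tables (or the OS).

```c
#include <assert.h>
#include <stddef.h>

/* Excerpts from the Windows-1252 (ANSI) and CP437 (OEM) tables: each
 * entry maps a byte to the Unicode code point it is displayed as.
 * Only a few upper-half (0x80-0xFF) entries are listed. */
struct cp_entry { unsigned char byte; unsigned long ucs; };

static const struct cp_entry cp1252_excerpt[] = {
    { 0xE9, 0x00E9 },  /* LATIN SMALL LETTER E WITH ACUTE */
    { 0xFC, 0x00FC },  /* LATIN SMALL LETTER U WITH DIAERESIS */
};

static const struct cp_entry cp437_excerpt[] = {
    { 0x81, 0x00FC },  /* u with diaeresis lives at 0x81 in CP437 */
    { 0x82, 0x00E9 },  /* e with acute lives at 0x82 in CP437 */
    { 0xE9, 0x0398 },  /* GREEK CAPITAL LETTER THETA */
    { 0xFC, 0x207F },  /* SUPERSCRIPT LATIN SMALL LETTER N */
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Return the code point byte b is displayed as under the given table
 * excerpt, or 0 if b is not in the excerpt. */
static unsigned long decode_byte(const struct cp_entry *tab, size_t n,
                                 unsigned char b)
{
    size_t i;
    assert(tab != NULL);
    if (b < 0x80)
        return b;      /* both codepages map 0x00-0x7F to ASCII */
    for (i = 0; i < n; i++)
        if (tab[i].byte == b)
            return tab[i].ucs;
    return 0;
}
```

So an argv containing "café" in codepage 1252 (where é is byte 0xE9) comes out on a codepage 437 console as "cafΘ".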

wprintf() works fine, but it's not really what we need.  (printf() can
take Unicode arguments via %ls; the only difference is whether the
format string itself is char or wchar_t.)
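A small portable illustration of the %ls point, assuming a C99-conforming C library; format_narrow() and format_wide() are hypothetical helper names.  Both take the same wide-string argument; only the type of the format string differs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* With snprintf the format string is a char string; with swprintf it is
 * a wchar_t string.  In both, %ls consumes a const wchar_t * argument.
 * The narrow version converts the wide characters to multibyte using the
 * current C locale, so non-ASCII text may fail to convert in the default
 * "C" locale unless setlocale() has been called. */
static int format_narrow(char *out, size_t n, const wchar_t *arg)
{
    return snprintf(out, n, "String: %ls", arg);
}

static int format_wide(wchar_t *out, size_t n, const wchar_t *arg)
{
    return swprintf(out, n, L"String: %ls", arg);
}
```

Both calls produce "String: abc" for the wide argument L"abc"; the narrow one is what we'd use if the rest of the program stays in ANSI.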

GetCommandLineW is OK.  CommandLineToArgvW is not; it's NT-only.
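Since CommandLineToArgvW isn't available on 9x, one option is to split the GetCommandLineW() string ourselves.  This is a minimal sketch under stated assumptions: split_cmdline_w() is a hypothetical name, it follows the common backslash/quote rules in simplified form, and it calls no Windows API, so it can be compiled and tested anywhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for CommandLineToArgvW.  Spaces/tabs separate
 * arguments outside double quotes; 2n backslashes before a quote become
 * n backslashes and the quote toggles quoting; 2n+1 backslashes before a
 * quote become n backslashes plus a literal quote; other backslashes are
 * literal.  Simplified: the program name and "" inside a quoted run get
 * no special treatment.  Returns argc, or -1 on allocation failure; the
 * caller frees both *argv_out and *buf_out. */
static int split_cmdline_w(const wchar_t *cmd, wchar_t ***argv_out,
                           wchar_t **buf_out)
{
    size_t len;
    wchar_t *buf, *dst, **argv;
    const wchar_t *p = cmd;
    int argc = 0;

    assert(cmd != NULL && argv_out != NULL && buf_out != NULL);
    len = wcslen(cmd);
    /* An argument's output never exceeds the input it consumed, and all
     * but the last argument are followed by a separator, so len + 1
     * characters (terminators included) are enough. */
    buf = malloc((len + 1) * sizeof *buf);
    /* At most (len + 1) / 2 arguments, plus the NULL terminator. */
    argv = malloc((len / 2 + 2) * sizeof *argv);
    if (buf == NULL || argv == NULL) {
        free(buf);
        free(argv);
        return -1;
    }

    dst = buf;
    for (;;) {
        int in_quotes = 0;
        while (*p == L' ' || *p == L'\t')
            p++;                              /* skip separators */
        if (*p == L'\0')
            break;
        argv[argc++] = dst;                   /* start a new argument */
        while (*p != L'\0' && (in_quotes || (*p != L' ' && *p != L'\t'))) {
            if (*p == L'\\') {
                size_t nslash = 0, i;
                while (*p == L'\\') { nslash++; p++; }
                if (*p == L'"') {
                    for (i = 0; i < nslash / 2; i++) *dst++ = L'\\';
                    if (nslash % 2) { *dst++ = L'"'; p++; } /* escaped quote */
                    /* even count: the quote is handled on the next pass */
                } else {
                    for (i = 0; i < nslash; i++) *dst++ = L'\\';
                }
            } else if (*p == L'"') {
                in_quotes = !in_quotes;       /* toggle quoting */
                p++;
            } else {
                *dst++ = *p++;
            }
        }
        *dst++ = L'\0';
    }
    argv[argc] = NULL;
    *argv_out = argv;
    *buf_out = buf;
    return argc;
}
```

For example, L"oggenc.exe \"my song.wav\" -q 5" splits into four arguments, the second being L"my song.wav".  The result is wide strings, which we would still need to convert to whatever encoding the rest of the program uses.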

 * end misc data *

So, we don't care about the OEM codepage.  The user wants to be able to
see (at least) the characters of his system's ANSI codepage, not
line-drawing characters.  Try this:

#include <stdio.h>
#include <windows.h>

int main(int argc, char* argv[])
{
    UINT orig_console_cp = GetConsoleCP();
    UINT orig_console_output_cp = GetConsoleOutputCP();

    if(!SetConsoleCP(GetACP()))
        printf("Couldn't set console codepage; output may not be correct.\n");
    if(!SetConsoleOutputCP(GetACP()))
        printf("Couldn't set console output codepage; output may not be correct.\n");

    if(argc > 1)
        printf("String: %s\n", argv[1]);
    printf("String: %ls\n", GetCommandLineW());

    /* Never, ever call GetConsoleCP() or GetConsoleOutputCP() during
     * the main body of the program; assume GetACP() is the console
     * codepage (since we just set it to that). */

    /* We need to reset the codepages when we're done, or we'll leave the
     * console in our changed state.  Make sure this is done if we ^C, if
     * possible. */
    SetConsoleCP(orig_console_cp);
    SetConsoleOutputCP(orig_console_output_cp);
    return 0;
}

This gets rid of the console codepages completely: we operate in the
ANSI codepage, like any Win32 GUI app would.  Everything we read from
the user (including argv) is now in the same codepage, and the code is
more likely to be usable in a GUI editor with no modifications.

How about this: always link in an iconvert() function, and always use it
for conversions.  On Win32, provide one that handles UCS-2, UTF-8 and the
ACP.
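A rough sketch of what a single conversion entry point could look like.  iconvert_sketch() and the charset names "UTF-8" and "UCS-2LE" are assumptions for illustration, not the actual utf8.c interface.  The ACP branch is left out, since on Win32 it would presumably go through MultiByteToWideChar/WideCharToMultiByte with CP_ACP; everything is decoded to code points first, so adding a charset means adding one decoder and one encoder.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* UTF-8 -> code points, BMP only (UCS-2 can't hold more).
 * Returns the count, or -1 on malformed or out-of-range input. */
static long utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t i = 0, extra, k;
    long count = 0;
    while (i < n) {
        unsigned long c = s[i];
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) { extra = 1; c &= 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; c &= 0x0F; }
        else return -1;                      /* 4-byte forms are beyond the BMP */
        if (n - i <= extra) return -1;       /* truncated sequence */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return -1;
            c = (c << 6) | (s[i + k] & 0x3F);
        }
        if ((extra == 1 && c < 0x80) || (extra == 2 && c < 0x800) ||
            (c >= 0xD800 && c <= 0xDFFF))
            return -1;                       /* overlong form or surrogate */
        cp[count++] = c;
        i += extra + 1;
    }
    return count;
}

/* UCS-2 little-endian bytes -> code points. */
static long ucs2le_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t i;
    if (n % 2) return -1;
    for (i = 0; i < n; i += 2) {
        unsigned long c = s[i] | ((unsigned long)s[i + 1] << 8);
        if (c >= 0xD800 && c <= 0xDFFF) return -1;  /* not valid UCS-2 */
        cp[i / 2] = c;
    }
    return (long)(n / 2);
}

static size_t utf8_encode(const unsigned long *cp, long n, unsigned char *out)
{
    size_t o = 0;
    long i;
    for (i = 0; i < n; i++) {
        unsigned long c = cp[i];
        if (c < 0x80) {
            out[o++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {                             /* decoders guarantee c <= 0xFFFF */
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}

static size_t ucs2le_encode(const unsigned long *cp, long n, unsigned char *out)
{
    long i;
    for (i = 0; i < n; i++) {
        out[2 * i] = (unsigned char)(cp[i] & 0xFF);
        out[2 * i + 1] = (unsigned char)((cp[i] >> 8) & 0xFF);
    }
    return (size_t)n * 2;
}

/* Hypothetical single interface.  Returns 0 and a malloc'd *out on
 * success, -1 on bad input or an unsupported charset. */
int iconvert_sketch(const char *fromcode, const char *tocode,
                    const unsigned char *in, size_t inlen,
                    unsigned char **out, size_t *outlen)
{
    unsigned long *cp = malloc((inlen + 1) * sizeof *cp);
    unsigned char *buf;
    long n;
    if (cp == NULL) return -1;
    if (strcmp(fromcode, "UTF-8") == 0) n = utf8_decode(in, inlen, cp);
    else if (strcmp(fromcode, "UCS-2LE") == 0) n = ucs2le_decode(in, inlen, cp);
    else n = -1;                             /* e.g. the ACP would go here */
    /* each code point needs at most 3 UTF-8 bytes or 2 UCS-2 bytes */
    buf = (n < 0) ? NULL : malloc((size_t)n * 3 + 1);
    if (buf == NULL) { free(cp); return -1; }
    if (strcmp(tocode, "UTF-8") == 0) *outlen = utf8_encode(cp, n, buf);
    else if (strcmp(tocode, "UCS-2LE") == 0) *outlen = ucs2le_encode(cp, n, buf);
    else { free(cp); free(buf); return -1; }
    free(cp);
    *out = buf;
    return 0;
}
```

Callers only ever see iconvert_sketch(); which conversions exist, and how, stays an implementation detail of that one function.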

This way, almost all of utf8.c's special casing gets moved into an
iconvert with a single interface.

If translation tables are provided later for charsets that don't have
standardized ones, only the iconvert() implementations will need to be
changed.


-- 
Glenn Maynard
