[vorbis] Re: UTF8, vorbiscomment, oggenc, and 'vcedit.c'

Thu Jan 10 22:02:06 PST 2002

On Thu, Jan 10, 2002 at 07:30:07PM -0500, Peter Harris wrote:
> Non-accented characters happen to be the same in the US-ANSI code page as in
> any of the other code pages I have installed. Maybe you haven't encountered
> too many umlauts? Or maybe WM_CHAR uses the ANSI code page, but file IO on
> stdin passes characters through unchanged?

The only i18n I've done is with Japanese characters.  If the system
codepage is CP932, they come through fine in the ANSI versions of the
Windows interfaces; otherwise I just get a question mark.  I've only
dealt with this in Windows GUIs, not the console.

> Either way, tapping umlaut-u on my German keyboard on the commandline
> becomes a superscript-n when argv is printf()ed unmodified; tapping the same
> key at a fgetc(stdin) produces an umlaut-u when the character is printf()ed
> unmodified.

What if you grab the command line in Unicode (GetCommandLineW()) and
print it with wprintf()?  (That avoids whatever conversion is being done
to argv.)

> In vcedit.c, the only 'separate code base' part will be the
> vcedit_comment_add_tag(RANDOM_ENCODING) entry point. The Win32 one will do a
> simple UCS2->UTF8, the Unix one will call ICONV to do LC_CTYPE->UTF8. Both
> entry points then do a vorbis_comment_add_tag(UTF8_STRING).

Er, I see what you meant.  But don't forget that there are other places
this matters; MODE_LIST needs to convert, too, for example.
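
For reference, the Unix half of that (LC_CTYPE->UTF8 through iconv)
could look roughly like the sketch below.  The helper names
charset_to_utf8() and locale_to_utf8() are made up for illustration and
are not part of vcedit or vorbiscomment; it also assumes the program has
already called setlocale(LC_CTYPE, "") so that nl_langinfo(CODESET)
reports the user's charset.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in`, encoded
 * as `from_charset`, to UTF-8 in `out`.  Returns the number of bytes
 * written (not counting the terminator), or (size_t)-1 on failure. */
size_t charset_to_utf8(const char *from_charset, const char *in,
                       char *out, size_t out_size)
{
    assert(from_charset && in && out);
    if (out_size == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return (size_t)-1;              /* unknown charset */

    char *src = (char *)in;             /* iconv() wants non-const */
    size_t src_left = strlen(in);
    char *dst = out;
    size_t dst_left = out_size - 1;     /* keep room for the NUL */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;              /* bad input or buffer too small */

    *dst = '\0';
    return (size_t)(dst - out);
}

/* Hypothetical helper: convert from the locale's charset (LC_CTYPE).
 * Assumes setlocale(LC_CTYPE, "") was called at startup. */
size_t locale_to_utf8(const char *in, char *out, size_t out_size)
{
    return charset_to_utf8(nl_langinfo(CODESET), in, out, out_size);
}
```

The listing side would be the same thing with the two charset arguments
to iconv_open() swapped, so UTF-8 comments get converted back to the
locale's charset before they are printed.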

> a common interface without at least some pain. Many Windoze apps are TCHAR
> anyway, but I don't know what to do with command-line apps that want to be
> portable. Maybe include a FAQ in the distribution that instructs users to
> include
> 
> #ifdef WIN32
>   MultiByteToWideChar(fromARGV?CP_ACP:GetConsoleCP(), ...);
>   vcedit_comment_add_tag(UCS2Buffer);
> #else
>   vcedit_comment_add_tag(OrigBuffer);
> #endif

For the sake of a simple editor, I think the easiest way is just to
leave vcedit alone (it expects UTF-8), and make sure vcomment converts
everything directly to UTF-8.  (I don't see any reason to use UCS2
instead of UTF-8.  Most apps don't want to use the wide (*W) versions of
functions in Windows, for 9x compatibility.  If they do, it's just as
easy to convert from UCS2 to UTF-8 for the calls as it is to convert
from the codepage encoding.)
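
To show how little code the UCS2 to UTF-8 step needs, here is a minimal
sketch in portable C.  The function name ucs2_to_utf8() is made up for
illustration and is not an existing vcedit or libvorbis call.  It
assumes plain UCS2 input, so surrogate code units (UTF-16 pairs) are
rejected rather than decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode `n` UCS2 code units from `in` as UTF-8 in
 * `out`.  Returns the number of bytes written (not counting the
 * terminator), or (size_t)-1 if `out` is too small or the input
 * contains a surrogate code unit. */
size_t ucs2_to_utf8(const uint16_t *in, size_t n, char *out, size_t out_size)
{
    assert(in && out);
    if (out_size == 0)
        return (size_t)-1;

    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t c = in[i];
        if (c >= 0xD800 && c <= 0xDFFF)
            return (size_t)-1;          /* surrogates are not valid UCS2 */

        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : 3;
        if (used + need >= out_size)    /* keep one byte for the NUL */
            return (size_t)-1;

        if (need == 1) {
            out[used++] = (char)c;
        } else if (need == 2) {
            out[used++] = (char)(0xC0 | (c >> 6));
            out[used++] = (char)(0x80 | (c & 0x3F));
        } else {
            out[used++] = (char)(0xE0 | (c >> 12));
            out[used++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[used++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[used] = '\0';
    return used;
}
```

A caller would size the output at three bytes per input code unit plus
one for the terminator, which is the worst case for UCS2.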

As for people writing programs to operate in the locale/codepage, I'm
not sure.  I'll think about it.

-- 
Glenn Maynard
