[vorbis] Re: UTF8, vorbiscomment, oggenc, and 'vcedit.c'

Peter Harris peter.harris at hummingbird.com
Thu Jan 10 16:30:07 PST 2002



> Yay, now *I'm* confused.  I've only ever seen Windows non-UCS2 strings
> encoded in what GetACP() says.

Non-accented characters happen to be the same in the US-ANSI code page as in
any of the other code pages I have installed. Maybe you haven't encountered
too many umlauts? Or maybe WM_CHAR uses the ANSI code page, but file IO on
stdin passes characters through unchanged?

Either way, tapping umlaut-u on my German keyboard on the command line
becomes a superscript-n when argv is printf()ed unmodified; tapping the same
key into an fgetc(stdin) produces an umlaut-u when the character is printf()ed
unmodified.
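
One reading consistent with these symptoms (an assumption; the thread does
not name the code pages involved) is that argv arrives in Windows-1252 while
the console renders output in DOS code page 437. Byte 0xFC is umlaut-u in the
former and superscript-n in the latter. A small sketch, where the helper
decode_byte_fc is hypothetical and holds only those two table entries:

```c
/* Code pages compared in this sketch (assumed, not stated in the thread). */
enum { CP_DOS_437 = 437, CP_WIN_1252 = 1252 };

/* Return the Unicode code point that byte 0xFC stands for in the given
 * code page, or 0 if the code page is not covered by this sketch. */
unsigned int decode_byte_fc(int codepage)
{
    switch (codepage) {
    case CP_WIN_1252: return 0x00FC; /* LATIN SMALL LETTER U WITH DIAERESIS */
    case CP_DOS_437:  return 0x207F; /* SUPERSCRIPT LATIN SMALL LETTER N */
    default:          return 0;      /* not covered here */
    }
}
```

The same byte yields two different characters, so a string has to carry (or
imply) its code page before it can be converted to UTF8 correctly.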

> > Wild idea (i.e. I don't know yet if I like it or not): Putting the interface
> > in UCS2 only (on Win32, of course. We can still be sane on *nix) forces the
> > programmer to think about where the string is coming from. This sounds like
> > a silly thing to do, but it might reduce the multi-language problems you
> > mention above.
>
> That means duplicating the whole codebase, or infesting the whole thing
> with LCHAR, or whatever Windows calls its sometimes-char-sometimes-wchar
> data type.  Neither is at all attractive.

In vcedit.c, the only 'separate code base' part will be the
vcedit_comment_add_tag(RANDOM_ENCODING) entry point. The Win32 one will do a
simple UCS2->UTF8 conversion; the Unix one will call iconv to convert from the
LC_CTYPE encoding to UTF8. Both entry points then do a
vorbis_comment_add_tag(UTF8_STRING).
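
To make the "simple UCS2->UTF8" step concrete, here is a minimal sketch of
such a conversion in portable C. The name ucs2_to_utf8 and its signature are
assumptions for illustration and are not part of vcedit.c. It handles only the
16-bit UCS2 range and rejects surrogate code units.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a NUL-terminated UCS2 string to UTF8.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if the buffer is too small or a surrogate is found. */
size_t ucs2_to_utf8(const uint16_t *in, char *out, size_t out_size)
{
    size_t n = 0;

    for (; *in != 0; in++) {
        uint16_t c = *in;

        if (c >= 0xD800 && c <= 0xDFFF)
            return (size_t)-1;               /* surrogates are not valid UCS2 */

        if (c < 0x80) {                      /* 1 byte: 0xxxxxxx */
            if (n + 1 >= out_size) return (size_t)-1;
            out[n++] = (char)c;
        } else if (c < 0x800) {              /* 2 bytes: 110xxxxx 10xxxxxx */
            if (n + 2 >= out_size) return (size_t)-1;
            out[n++] = (char)(0xC0 | (c >> 6));
            out[n++] = (char)(0x80 | (c & 0x3F));
        } else {                             /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            if (n + 3 >= out_size) return (size_t)-1;
            out[n++] = (char)(0xE0 | (c >> 12));
            out[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (c & 0x3F));
        }
    }

    if (n >= out_size) return (size_t)-1;    /* no room for the NUL */
    out[n] = '\0';
    return n;
}
```

The resulting buffer is what the Win32 entry point would pass on to
vorbis_comment_add_tag().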

Unless you are talking about apps that will be using vcedit. That _is_ a
royal pain. I can't think of a good way to avoid the pain, either. Windoze
is simply too broken (or at least too different from Unix) to be able to use
a common interface without at least some pain. Many Windoze apps are TCHAR
anyway, but I don't know what to do with command-line apps that want to be
portable. Maybe include a FAQ in the distribution that instructs users to
include

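#ifdef WIN32
  /* argv uses the ANSI code page; console input uses the console code page */
  MultiByteToWideChar(fromARGV?CP_ACP:GetConsoleCP(), ...);
  vcedit_comment_add_tag(UCS2Buffer);
#else
  /* on Unix the entry point converts from the LC_CTYPE encoding itself */
  vcedit_comment_add_tag(OrigBuffer);
#endif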

? Unless somebody can think of a better idea, that is. (PLEASE somebody
think of a better idea...)
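
For the Unix branch, where the string is handed over in the locale's
encoding, the conversion to UTF8 might look roughly like the sketch below. The
helper name to_utf8 and its signature are hypothetical. A real caller would
probably pass nl_langinfo(CODESET) after setlocale(LC_CTYPE, ""), but the
source encoding is a parameter here so the example stays self-contained.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert NUL-terminated `in` from `from_codeset`
 * to UTF8. Returns the number of bytes written (excluding the NUL),
 * or (size_t)-1 on any failure. */
size_t to_utf8(const char *from_codeset, const char *in,
               char *out, size_t out_size)
{
    iconv_t cd;
    char *src = (char *)in;       /* iconv() takes a non-const pointer */
    char *dst = out;
    size_t src_left = strlen(in);
    size_t dst_left;
    size_t rc;

    if (out_size == 0)
        return (size_t)-1;
    dst_left = out_size - 1;      /* reserve room for the NUL */

    cd = iconv_open("UTF-8", from_codeset);
    if (cd == (iconv_t)-1)
        return (size_t)-1;        /* unknown or unsupported encoding */

    rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;        /* invalid input or output too small */

    *dst = '\0';
    return (size_t)(dst - out);
}
```

On some systems iconv lives in a separate library and needs -liconv at link
time.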

Peter Harris
