[vorbis] Re: UTF8, vorbiscomment, oggenc, and 'vcedit.c'

Peter Harris peter.harris at hummingbird.com
Thu Jan 10 14:05:54 PST 2002



> Er.  You said on NT the local character set is UCS2.  That's true only at a
> low level, and unless you actively use the *W functions, the local
> character set is set by the codepage.

#define UNICODE before you #include <windows.h>, and you get *W by default
and you need to actively use *A if that's what you want.

The *A functions are all much slower on NT because they are just wrappers
for the *W functions.
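
A rough sketch of the mechanism, so it can be tried without <windows.h>.
The demo_* names are made up for illustration and are not Win32 calls, and
the byte-by-byte widening only stands in for a real code page conversion.
The point is the macro selection and the extra copy the narrow wrapper has
to make before it reaches the wide implementation:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define UNICODE /* comment this out to select the narrow (A) variant */

/* Hypothetical wide implementation: this is where the real work happens. */
size_t demo_text_length_w(const wchar_t *s)
{
    return wcslen(s);
}

/* Hypothetical narrow wrapper: widen the input, then call the W variant. */
size_t demo_text_length_a(const char *s)
{
    size_t n = strlen(s);
    wchar_t *wide = malloc((n + 1) * sizeof *wide);
    size_t result;

    if (wide == NULL)
        return 0;
    /* Stand-in for a code page conversion; copies the terminator too. */
    for (size_t i = 0; i <= n; i++)
        wide[i] = (wchar_t)(unsigned char)s[i];
    result = demo_text_length_w(wide);
    free(wide);
    return result;
}

/* Same shape as the generic names in the Windows headers. */
#ifdef UNICODE
#define demo_text_length demo_text_length_w
#define DEMO_TEXT(s) L##s
#else
#define demo_text_length demo_text_length_a
#define DEMO_TEXT(s) s
#endif
```

With UNICODE defined, demo_text_length(DEMO_TEXT("abc")) goes straight to
the wide function; without it, the same source line pays for the allocation
and conversion in the wrapper.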

> > I think it will 'just work', right up until that person tries to play back
> > via XMMS. "Why are all my tags garbage?"
>
> Except that they're likely to write players (for Japanese users) that
> *do* work for this.  Note that there are, in fact, a large number of
> incorrectly coded MP3 tags in the wild.

Hmm. Good point.

*in the context of Windows*
> LC_CTYPE isn't a standard?  It absolutely is.  Read locale(7); note
> "CONFORMS TO".

POSIX.2. Which Windows definitely does not conform to. Not even the NTs that
supposedly have a POSIX subsystem.

> (Suggestion: read up well on locales before doing any i18n-related work in
> Unix.  Sorry, I don't have any references off-hand; I got most of my
> info in passing, code and manpages.
> http://www.cl.cam.ac.uk/~mgk25/unicode.html is a good
> start, but it's Unicode-centered.)

That sounds like good advice. Thanks.

> This is a limitation, but it makes things simpler.  Output is the locale
> (*nix) or codepage (Win32), period; it's the only option.

Right.
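
For the *nix half of that, a minimal sketch of how a program can ask which
character set the locale implies. setlocale() and nl_langinfo(CODESET) are
the standard POSIX calls; the function name demo_locale_codeset is made up
for this example:

```c
#include <langinfo.h>
#include <locale.h>

/* Returns the character set name of the user's LC_CTYPE locale,
 * e.g. "UTF-8" or "ISO-8859-1". The exact spelling is system-dependent. */
const char *demo_locale_codeset(void)
{
    /* "" means: take the locale from the environment (LANG, LC_*).
     * If that fails, the program stays in the default "C" locale. */
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

The returned name is what a tool would hand to a converter such as iconv
as the "to" or "from" encoding when translating to and from UTF-8.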

> > I think that #3 will always only work in a subset of cases where #2 will
> > work. Where is the advantage in supporting a method that doesn't work?
>
> The advantage is that we don't require the user to do any conversions
> himself; he passes data in as a regular string.  If he really *does*
> want to do that himself, and likes dealing with UTF-8, that's fine; he
> can use #2.

Ah. I see where you were going now. I was talking specifically and only
about vorbiscomment and oggenc. You were talking about libraries and all
potential future apps written by other people.

*snip nice example*
> On Unix systems, the API defaults to the locale.
>
> On Windows systems, the API defaults to the codepage (which is the rough
> equivalent, and what most users would expect.)

On Windows systems, argv[] happens in the default ANSI code page.
fgetc(stdin) happens in the current input code page (which is _never_ the
default ANSI code page on my system, as far as I can tell). printf() happens
in the current output code page (which is usually the same as the input code
page, but doesn't need to be).

How do we know if the string came from argv[] or scanf()/getc()/etc?
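
To make the ambiguity concrete, here is a small sketch showing that the same
byte means different things under the two code pages. It assumes a
Western-European setup where the ANSI code page is 1252 and the console code
page is 437; other systems will differ. The enum and function names are
hypothetical, and the table covers only two bytes, just enough to show the
clash:

```c
#include <stdint.h>

enum demo_codepage { DEMO_CP1252, DEMO_CP437 };

/* Decode one byte to a Unicode code point under the given code page.
 * Only ASCII and two sample bytes are covered; anything else yields
 * U+FFFD (REPLACEMENT CHARACTER). */
uint32_t demo_decode_byte(unsigned char b, enum demo_codepage cp)
{
    if (b < 0x80)
        return b; /* ASCII is the same in both code pages */

    if (cp == DEMO_CP1252) {
        if (b == 0xE9) return 0x00E9; /* LATIN SMALL LETTER E WITH ACUTE */
        if (b == 0x82) return 0x201A; /* SINGLE LOW-9 QUOTATION MARK */
    } else {
        if (b == 0xE9) return 0x0398; /* GREEK CAPITAL LETTER THETA */
        if (b == 0x82) return 0x00E9; /* LATIN SMALL LETTER E WITH ACUTE */
    }
    return 0xFFFD;
}
```

So a byte 0xE9 that came in through argv[] is an e-acute, while the same
byte read from the console is a capital theta. A library that receives only
a char* has no way to tell which one it was given.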

Wild idea (i.e. I don't know yet if I like it or not): Putting the interface
in UCS2 only (on Win32, of course. We can still be sane on *nix) forces the
programmer to think about where the string is coming from. This sounds like
a silly thing to do, but it might reduce the multi-language problems you
mention above.
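
If the interface did take UCS2, the library side would be a plain
UCS2-to-UTF-8 encoder, with no code page guessing at all. A minimal sketch,
assuming strict UCS2 (no surrogate pairs, so at most three bytes per
character); the function name and error convention are made up for this
example:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode n UCS2 code units as UTF-8 into out (capacity cap bytes).
 * Returns the number of bytes written, or (size_t)-1 if the output does
 * not fit or the input contains a surrogate (not valid in strict UCS2).
 * The output is not NUL-terminated. */
size_t demo_ucs2_to_utf8(const uint16_t *in, size_t n,
                         unsigned char *out, size_t cap)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        uint16_t c = in[i];

        if (c >= 0xD800 && c <= 0xDFFF)
            return (size_t)-1;

        if (c < 0x80) {
            if (o + 1 > cap) return (size_t)-1;
            out[o++] = (unsigned char)c;
        } else if (c < 0x800) {
            if (o + 2 > cap) return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            if (o + 3 > cap) return (size_t)-1;
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

The caller is then the one who has to get from argv[] or console input to
UCS2, which is exactly the "think about where the string is coming from"
step.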

> For those who want UTF-8 in the API, they simply set api_is_utf8 to
> true.  Output stays the same (it has to), and API calls assume input is
> UTF-8.  (This would work both in Windows and Unix, and be useful for
> both.)

Err... in the context of a library, output should always contain as much
information as possible. That means UTF-8. Forcing the characters into the
local character set isn't very productive if you are going to turn around
and use the information for (eg) tagging another file. It should be possible
to do that sort of operation losslessly. The library doesn't know if you
are going to dump the tag to the screen. And even if you are, the library
doesn't know if you are an X app with 16-bit Unicode fonts installed on your
X server.
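
A small sketch of what "lossy" means here, using Latin-1 as a stand-in for
the local character set. The function name is hypothetical, and the decoder
assumes well-formed UTF-8 input; it is only meant to show where the
information goes missing:

```c
#include <stddef.h>

/* Convert NUL-terminated UTF-8 to Latin-1, writing at most cap bytes
 * (including the terminator) to out. Characters outside Latin-1 become
 * '?'. Returns how many characters were replaced, i.e. lost. */
size_t demo_utf8_to_latin1(const char *in, char *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t lost = 0, o = 0;

    while (*p && o + 1 < cap) {
        unsigned long cp;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; }
        p++;

        /* Fold in the continuation bytes (10xxxxxx). */
        while (extra-- > 0 && (*p & 0xC0) == 0x80)
            cp = (cp << 6) | (*p++ & 0x3F);

        if (cp <= 0xFF) {
            out[o++] = (char)cp;
        } else {
            out[o++] = '?';
            lost++;
        }
    }
    if (cap > 0)
        out[o] = '\0';
    return lost;
}
```

"caf\xC3\xA9" survives the trip with nothing lost, but a Japanese title
comes back as a row of question marks, and copying that result into another
file's tags makes the damage permanent.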

Peter Harris

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.


