[vorbis] TAG Standard - ENSEMBLE/PERFORMER tags

Peter Harris peter.harris at hummingbird.com
Wed Jan 9 14:50:35 PST 2002



> > See vorbis-tools/share/utf8.c: We already convert _to_ the current code
> > page. What we don't do is convert _from_ the current code page. We would,
> > except Windoze translates everything into the ANSI code page before passing
> > it in via argv.
>
> Are you sure?  This is the type of behavior I used to believe Windows
> did universally.  It turns out that, wherever it uses ANSI in a normal,
> English install, it uses the language's encoding when set to that
> language.  If you're on a Japanese system (or a system set to the Japanese
> codepage), you get Shift-JIS everywhere.

Oh, ewww. It's even worse than I thought it was.

> Get main() as small as possible, write a Unicode and ANSI version of
> main(), and have the Unicode version convert to UTF-8.  This could be
> done with no duplication of code, ie:

Right now, oggenc works entirely in the local character set, and only
converts to UTF-8 at the last possible instant.

Does getopt*.c already work on UTF-8 input? If that is the case, I'd argue
for vorbiscomment and oggenc to be rewritten to use UTF-8 internally
throughout. Then it's very simple to use wmain and the UCS-2 -> UTF-8
converter on Windows systems, and main with iconv on Unix. Both wmain and
main would then call real_main (which would, of course, expect UTF-8 only).
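
To make the shape of that concrete, here is a rough, portable sketch of the
wide-character half. It is not code from vorbis-tools: utf16_to_utf8() and
wide_entry() are hypothetical names, uint16_t stands in for the Windows
wchar_t, and real_main() here only echoes its arguments. On Windows, wmain
would simply forward to something like wide_entry(); on Unix, main would run
argv through iconv and call real_main() the same way.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: convert a NUL-terminated UTF-16 string to a newly
 * allocated UTF-8 string. Returns NULL on allocation failure or on an
 * unpaired surrogate. The caller frees the result. */
char *utf16_to_utf8(const uint16_t *in)
{
    assert(in != NULL);
    size_t n = 0;
    while (in[n] != 0)
        n++;

    /* One UTF-16 unit needs at most 3 UTF-8 bytes; a surrogate pair
     * (2 units) needs 4, so 3 bytes per unit is always enough. */
    unsigned char *out = malloc(3 * n + 1);
    if (out == NULL)
        return NULL;

    unsigned char *p = out;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: in[i + 1] is at worst the terminating 0. */
            uint32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF) {
                free(out);
                return NULL;
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            free(out);
            return NULL;
        }

        if (cp < 0x80) {
            *p++ = (unsigned char)cp;
        } else if (cp < 0x800) {
            *p++ = (unsigned char)(0xC0 | (cp >> 6));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *p++ = (unsigned char)(0xE0 | (cp >> 12));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            *p++ = (unsigned char)(0xF0 | (cp >> 18));
            *p++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    *p = 0;
    return (char *)out;
}

/* Stand-in for the real program logic: sees UTF-8 only. */
int real_main(int argc, char **argv)
{
    for (int i = 0; i < argc; i++)
        printf("arg %d: %s\n", i, argv[i]);
    return 0;
}

/* What a wmain() would forward to: convert every argument to UTF-8,
 * then hand over to real_main(). Returns 1 if conversion fails. */
int wide_entry(int argc, const uint16_t **wargv)
{
    char **argv = calloc((size_t)argc + 1, sizeof *argv);
    if (argv == NULL)
        return 1;

    int ok = 1;
    for (int i = 0; i < argc && ok; i++) {
        argv[i] = utf16_to_utf8(wargv[i]);
        ok = (argv[i] != NULL);
    }

    int rc = ok ? real_main(argc, argv) : 1;

    for (int i = 0; i < argc; i++)
        free(argv[i]); /* free(NULL) is a no-op for unconverted slots */
    free(argv);
    return rc;
}
```

The point of the split is that only the two thin entry points know about
the platform's native encoding; everything behind real_main() can assume
UTF-8.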

I like the idea. It is much cleaner than what I was doing previously.

The only 'gotcha' I can see right now is error messages that quote bits of
the command line (or, even worse, file names that are passed in on the
command line). "Code Page X -> UCS-2 -> UTF-8 -> UCS-2 -> Code Page X"
_should_ reproduce the command-line input byte for byte, but I'm not sure
exactly how far I can trust MultiByteToWideChar() <-> WideCharToMultiByte()
(or a double iconv for Unix people).
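
One way to find out on the Unix side is simply to test it. The following is
a minimal sketch of the "double iconv" check, assuming a POSIX iconv(3); it
goes straight from the code page to UTF-8 and back, skipping the UCS-2 hop.
convert() and roundtrips() are hypothetical helper names, and the fixed
1024-byte buffers are an arbitrary choice for illustration.

```c
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Convert inlen bytes of `in` from encoding `from` to encoding `to`.
 * Returns the number of bytes written to `out`, or (size_t)-1 if the
 * encoding is unknown, the input is invalid, or `out` is too small. */
size_t convert(const char *to, const char *from,
               const char *in, size_t inlen,
               char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in; /* iconv()'s prototype is not const-correct */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1) {
        /* Flush any pending shift sequence for stateful encodings. */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    }
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;
}

/* Returns 1 if text survives codepage -> UTF-8 -> codepage unchanged,
 * 0 if the bytes differ, and -1 if either conversion fails. */
int roundtrips(const char *codepage, const char *text)
{
    char utf8[1024];
    char back[1024];
    size_t len = strlen(text);

    size_t n = convert("UTF-8", codepage, text, len, utf8, sizeof utf8);
    if (n == (size_t)-1)
        return -1;

    size_t m = convert(codepage, "UTF-8", utf8, n, back, sizeof back);
    if (m == (size_t)-1)
        return -1;

    return m == len && memcmp(back, text, len) == 0;
}
```

Calling roundtrips() with the local code page name and each command-line
argument would show whether quoting that argument back in an error message
is safe. A result of 0 or -1 marks exactly the inputs I'm worried about.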

> I don't like the idea of solving this with Unicode versions, however.
> That's going to screw CJK users on, for example, Japanese Win98.

How so? All of the tags are stored as Unicode (UTF-8). How is translating the
command line from (whatever) to UTF-8 sooner rather than later going to screw
CJK Win98 users any more than they already are?

Peter Harris
