[vorbis] Re: UTF8, vorbiscomment, oggenc, and 'vcedit.c'

Peter Harris peter.harris at hummingbird.com
Fri Jan 11 15:58:28 PST 2002



> > umlaut-u in wchar_t *argv[] appears as superscript-n when wprintf()ed
>
> That seems to mean they're doing some weird conversion that we can't
> really fix.  It might be worth adding a @file option (read arguments),
> which is always in UTF-8, so that scripts that want to interface with
> this always have at least one reliable way of getting these characters
> through.  (Or perhaps in the locale/codepage encoding, I'm not sure.)

We can detect Unicode via the magic word 'FEFF', i.e. the byte order mark
U+FEFF at the start of the file (the bytes are FE FF in big-endian files,
FF FE in little-endian files, and EF BB BF in UTF-8) and use that. In the
absence of the magic word, we can default to the ANSI code page (ACP). At
least Notepad seems to save files in the ACP, which makes sense: Notepad
doesn't have a console to get the console code page from. On the other hand,
what if people use a console app to edit @file? Aarrgh!
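
To make the detection concrete, here is a rough sketch in Python (for
illustration only; this is not the code in the patched oggenc). The function
name decode_response_file and its fallback parameter are made up for this
example, and the no-BOM default assumes that Python's
locale.getpreferredencoding() reports the ANSI code page on Windows.

```python
import codecs
import locale


def decode_response_file(data, fallback=None):
    """Decode the raw bytes of an @file, honouring a leading BOM if present.

    `fallback` is a hypothetical parameter: the encoding to assume when no
    BOM is found. If omitted, the system's preferred encoding is used.
    """
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE (little-endian)
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF (big-endian)
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No magic word: fall back to the (assumed) ANSI code page.
    if fallback is None:
        fallback = locale.getpreferredencoding(False)
    return data.decode(fallback)
```

A file written by a console editor in the console (OEM) code page would still
be mis-decoded by the fallback branch, which is exactly the open problem
above.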

I think Notepad on NT lets you save stuff as UCS-2. *checks* Yup. On NT4,
it's a checkbox. On 2000 it's a dropdown: ANSI, UCS-2 little-endian, UCS-2
big-endian, or UTF-8.

> > > As for people writing programs to operate in the locale/codepage, I'm
> > > not sure.  I'll think about it.
> >
> > It's not easy to solve for Windows.
>
> It's a *little* easier in GUI apps, where you don't have to deal with
> argv weirdness nor with the console codepage.

Right. Of course, GUI apps have the non-trivial problem of choosing the
right font for any given random UTF-8 string... But I digress.

Check out my patched oggenc.exe (see other message) and let me know what you
think.

Thanks,
 Peter Harris



