[vorbis-dev] UTF-8 in comments

Daniel Resare noa at metamatrix.se
Thu Mar 15 07:48:49 PST 2001



A few days ago I noticed in the Ogg Vorbis documentation that the
contents of comment fields should be encoded in UTF-8. A brilliant idea, I
think, but sadly it seems that no one is doing it. oggenc and vorbiscomment,
both included in vorbis-tools, write strings to the comment fields blindly,
without any encoding conversion, in violation of the spec.

I intend to try to solve these problems to the best of my ability (C is not
my first language :). But before I start, I thought I'd send a short list of
what I think needs to be done. Comments on the ideas are of course welcome.

1) Add an option --encoding to oggenc that indicates the encoding of the
given comment fields. With this information it is possible to iconv()
the incoming strings to UTF-8 before writing them to the .ogg file. If no
--encoding is given, a reasonable default should be used (ISO-8859-1
perhaps? Is there any way to extract information from the current locale
about what character encoding to expect from user input?)

2) Modify ogg123 to convert the UTF-8 strings back to something that
displays ok (ISO-8859-1?).

3) Add sanity checking in the appropriate place in libvorbis to prevent
bogus strings from being added as comments.

4) Write a small utility that fixes the comments in existing files so that
they conform to the spec.
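
As a rough sketch of the conversion in point 1, and of the locale question
raised there: on POSIX systems nl_langinfo(CODESET) reports the character
encoding of the current locale, which could serve as the default when no
--encoding is given. The helper names to_utf8() and locale_charset_name()
below are hypothetical and are not part of vorbis-tools or libvorbis. The
code assumes a glibc-style iconv() prototype (char ** input argument); other
platforms may differ and may need -liconv at link time.

```c
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from
 * `from_charset` to UTF-8. Returns a malloc()ed NUL-terminated string
 * that the caller must free(), or NULL on any failure. */
char *to_utf8(const char *in, const char *from_charset)
{
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return NULL;                    /* unknown or unsupported charset */

    size_t in_left = strlen(in);
    /* Assumption: 4 output bytes per input byte is enough. This holds for
     * the single-byte ISO-8859 family; if it ever does not, iconv() fails
     * with E2BIG and we return NULL instead of overflowing. */
    size_t out_size = in_left * 4 + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)in;          /* iconv() does not modify the input */
    char *out_ptr = out;
    size_t out_left = out_size - 1;     /* keep one byte for the terminator */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {             /* invalid or unconvertible input */
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}

/* Hypothetical helper: name of the current locale's character encoding,
 * as a possible default for --encoding. */
const char *locale_charset_name(void)
{
    setlocale(LC_CTYPE, "");            /* adopt the user's environment */
    return nl_langinfo(CODESET);
}
```

A caller might use to_utf8(user_string, locale_charset_name()) before handing
the result to the comment-writing code. The reverse direction for ogg123
(point 2) would swap the two arguments of iconv_open().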

Issues: iconv() has some portability problems, judging from the comments in
the glibc info page.
I don't know much about how display and input of different
charsets work on different systems: unicode xterm, fonts and so on. My
primary goal with this is to make something that works ok for
ISO-8859-[1-15] users.

cheers/daniel

ps. Just to get the hang of iconv and libvorbis I wrote a small program that
checks the given vorbis files to see if invalid UTF-8 comments exist.
The program can be found at http://noa.tm/check_vorbis_comment.c
If you want to check all your ogg files, please compile it and run
'find / -name \*.ogg -print0 |xargs -0 ./check_vorbis_comment'
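
For illustration, the core of such a check (and of the sanity checking in
point 3) could look roughly like the sketch below. This is not the code from
check_vorbis_comment.c; is_valid_utf8() is a hypothetical name, and the rules
applied are an assumption: sequences of at most 4 bytes, no overlong forms,
no surrogates, nothing above U+10FFFF, as in the later RFC 3629 definition
of UTF-8. It takes an explicit length because Vorbis comments are
length-prefixed rather than NUL-terminated.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the `len` bytes at `s` are well-formed
 * UTF-8 (under the RFC 3629 rules assumed above), otherwise 0. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {                     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* 110xxxxx: 2-byte sequence */
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {    /* 1110xxxx: 3-byte sequence */
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {    /* 11110xxx: 4-byte sequence */
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= n)
            return 0;       /* sequence is cut off at the end of the data */

        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;   /* continuation bytes must look like 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate, not a valid character */

        i += n + 1;
    }
    return 1;
}
```

Raw ISO-8859-1 text with accented letters, such as the bytes E5 E4 F6, is
rejected by this check because E5 announces a 3-byte sequence and E4 is not a
valid continuation byte, which is the typical symptom in files written
without conversion.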


-- 
nuclear cia fbi spy password code president bomb
8D97 F297 CA0D 8751 D8EB  12B6 6EA6 727F 9B8D EC2A
