[vorbis-dev] PATCH: UTF-8 checking in libvorbis

Daniel Resare noa at metamatrix.se
Tue Mar 27 13:29:44 PST 2001


Here is a patch that implements a check so that libvorbis complains when a
comment string that is not UTF-8 is added. This patch will break oggenc
without my UTF-8 patch. (http://noa.tm/oggenc-utf8.2.diff.gz)

I've tested the verification algorithm on ~150k strings in various character
sets (from the GNOME translations). Of the strings that contain characters
with the 8th bit set, about 0.02% are valid UTF-8 data (mainly short Korean
strings).

If someone wants to review my algorithm, the spec for UTF-8 parsing is
available at http://www.cl.cam.ac.uk/~mgk25/ucs/ISO-10646-UTF-8.html
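The attachment itself was scrubbed from the archive, so the patch's code is not reproduced here. Purely as an illustration, a validator along the lines of the linked spec might look like the sketch below. It is not the patch's implementation: the name `utf8_is_valid` is hypothetical, and the sketch assumes the stricter RFC 3629 rules (at most four bytes per character, shortest form only, no UTF-16 surrogates, nothing above U+10FFFF) rather than the longer five- and six-byte forms that ISO 10646 permitted at the time.

```c
/* Hypothetical helper, not the code from the scrubbed patch.
 * Returns 1 if the NUL-terminated string s is well-formed UTF-8, 0 otherwise.
 * Assumes RFC 3629 rules: 1-4 bytes per character, shortest form only,
 * no UTF-16 surrogates (U+D800..U+DFFF), nothing above U+10FFFF. */
int utf8_is_valid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned char c = *p++;
        unsigned long cp;   /* decoded code point */
        int extra;          /* continuation bytes expected after the lead */

        if (c < 0x80)
            continue;                                               /* 0xxxxxxx: ASCII */
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; } /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; } /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; } /* 11110xxx */
        else
            return 0;   /* stray 10xxxxxx byte, or a 5/6-byte or 0xFE/0xFF lead */

        for (int i = 0; i < extra; i++) {
            /* Each continuation byte must be 10xxxxxx; this also rejects
             * a sequence cut short by the terminating NUL. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (*p++ & 0x3F);
        }

        /* Reject overlong encodings of code points that fit in fewer bytes. */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return 0;

        /* Reject surrogates and values beyond the Unicode range. */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;
    }
    return 1;
}
```

For example, `utf8_is_valid("S\xC3\xA5ng")` returns 1, while the Latin-1 spelling `"S\xE5ng"` returns 0: the byte 0xE5 announces a three-byte sequence, but the following `n` is not a continuation byte.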

Please note that my understanding of the inner workings of libvorbis is
quite limited and the documentation is not exactly overwhelming, so the
integration of this functionality could be done much better by someone
familiar with the code (please be my guest). For example, I have used the
OV_EINVAL return value to indicate that the given vorbis_comment structure
contains invalid strings. I don't know if this is a good idea. It would
also be nice to provide a clear-text error message.
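One way to support a clear-text message is to have the check report *which* string failed, not just that one did. The sketch below is hypothetical throughout: `comment_list` only mirrors the field layout of libvorbis's `vorbis_comment` (`user_comments`, `comment_lengths`, `comments`, `vendor`; check `codec.h` in your version), `SKETCH_EINVAL` stands in for `OV_EINVAL`, and the validator follows the same RFC 3629 assumptions as before. Real code would include `<vorbis/codec.h>` and use the library's own type and constant.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OV_EINVAL (-131 in libvorbis 1.x codec.h). */
#define SKETCH_EINVAL (-131)

/* Hypothetical mirror of vorbis_comment's field layout, for illustration. */
typedef struct {
    char **user_comments;
    int   *comment_lengths;
    int    comments;
    char  *vendor;
} comment_list;

/* Length-based UTF-8 check (RFC 3629 rules), so the check follows the
 * stored comment_lengths instead of relying on a terminating NUL. */
static int utf8_valid_n(const unsigned char *p, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = p[i++];
        unsigned long cp;
        int len;   /* total bytes in this sequence */

        if (c < 0x80) continue;
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else return 0;

        if (n - i < (size_t)(len - 1))
            return 0;                        /* sequence truncated */
        for (int k = 1; k < len; k++) {
            if ((p[i] & 0xC0) != 0x80)
                return 0;                    /* not a continuation byte */
            cp = (cp << 6) | (p[i++] & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) ||    /* overlong forms */
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;
    }
    return 1;
}

/* Returns 0 if every string is valid UTF-8; otherwise SKETCH_EINVAL, with
 * *bad_index set to the offending comment index (-1 for the vendor). */
int check_comments(const comment_list *vc, int *bad_index)
{
    for (int k = 0; k < vc->comments; k++) {
        const unsigned char *s = (const unsigned char *)vc->user_comments[k];
        if (!utf8_valid_n(s, (size_t)vc->comment_lengths[k])) {
            if (bad_index) *bad_index = k;
            return SKETCH_EINVAL;
        }
    }
    if (vc->vendor &&
        !utf8_valid_n((const unsigned char *)vc->vendor, strlen(vc->vendor))) {
        if (bad_index) *bad_index = -1;
        return SKETCH_EINVAL;
    }
    return 0;
}
```

A caller that gets `SKETCH_EINVAL` back can then print the offending `user_comments[k]` (or the vendor string) instead of a bare error code.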

The patch is against libvorbis-1.0beta4.

cheers/daniel

ps.
One thing that would be very helpful is some sort of text/howto on how to
set up an effective development environment for shared library and
application development in C. Something that answers questions like the
following: How do I get my test programs to link with the libraries in my
libvorbis build tree instead of the ones installed on the system? How do I
compile libraries with debugging information using the autoconf/automake
system? How do I tell gdb where to find the source code for debugging in the
library? My current setup, where I link oggenc manually with an -L option
pointing to /home/noa/slask/libvorbis-1.0beta4/lib/.libs/ and --static to be
sure that I don't link with the system-wide libraries (I remove
libvorbis-devel), feels a little bit sub-optimal.


-- 
nuclear cia fbi spy password code president bomb
8D97 F297 CA0D 8751 D8EB  12B6 6EA6 727F 9B8D EC2A



-------------- next part --------------
A non-text attachment was scrubbed...
Name: libvorbis-utf8-check.patch
Type: application/octet-stream
Size: 2124 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/vorbis-dev/attachments/20010327/33a1cb28/libvorbis-utf8-check-0001.obj

