[vorbis] TAG Standard - ENSEMBLE/PERFORMER tags

Victoria E. Lease vlease at floofy-skirts.org
Fri Jan 4 07:43:45 PST 2002


[Jonathan Walther]
> Thank you for the URL's.  I guess we are back to the RFC 2047
> scheme.  I actually think its pretty elegant.  And it is very
> flexible, you can include words in dozens of different languages...

It may be worth noting that RFC 2047 itself does not specify a
language for the text given; it only specifies the character set and
transfer encoding. The tags are already UTF-8, so the encoding needs
no specification there, and wrapping them in RFC 2047 encoded-words
would make any non-ASCII text considerably less human-readable. All
that RFC 2047 would really buy us is the ability to declare an
encoding other than UTF-8, and I think that Mr Maynard covered the
problems of that pretty well (not to mention that I'm pretty certain
that the comment tag spec specifically requires UTF-8 for the
comment tags...).
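
To make the readability point concrete, here is a minimal sketch using
Python's standard email.header module. The ARTIST tag and its value
are illustrative assumptions only; nothing here is taken from the
comment spec beyond "NAME=value in UTF-8".

```python
from email.header import Header, decode_header

# A Vorbis comment value is plain UTF-8 text; the tag name and sample
# value are illustrative only.
artist = "坂本龍一"
comment = f"ARTIST={artist}"

# The same value wrapped as an RFC 2047 encoded-word. The syntax
# carries a charset and a transfer encoding, but no language.
encoded = Header(artist, charset="utf-8").encode()

print(comment)   # readable as-is in any UTF-8 aware viewer
print(encoded)   # an opaque =?utf-8?...?= token

# Decoding recovers the bytes and the charset label, nothing more.
raw, charset = decode_header(encoded)[0]
assert raw.decode(charset) == artist
```

The encoded form round-trips cleanly, but it tells a reader (human or
player) nothing that the raw UTF-8 did not already say.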

As far as the comment tags are concerned, users' computers will
automatically assume the display language to be the user's native
one, insofar as each machine has a default set of codepages that its
user can read loaded for the Han pages. I don't think it's a bad idea
to just leave the tag information as-is, since it is supposed to be
simple and human-readable, and anything specifying information in
multiple languages' native writing systems is probably not going to
be simple or human-readable.

In XML-land, the xml:lang attribute handles language identification
just fine, so the point is moot there. Just throw an xml:lang attr
in with any field that contains data in a language affected by Han
unification and away you go. Of course, the average player will
probably ignore lang attrs rather than switch Han display for that
field, but at least nobody could say that the Ogg/Vorbis data itself
lacks a simple and effective way to specify which characters should
be displayed...
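
As a rough sketch of what that could look like on the reading side,
the snippet below parses a made-up metadata fragment with Python's
standard xml.etree.ElementTree. The <metadata> and <title> element
names are hypothetical and not taken from any Ogg/Vorbis metadata
spec; only the xml:lang attribute itself is standard XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment; the element names are made up for
# illustration and do not come from any Ogg/Vorbis specification.
doc = """<metadata>
  <title xml:lang="ja">直感</title>
  <title xml:lang="zh-Hans">直觉</title>
  <title>Intuition</title>
</metadata>"""

# The "xml" prefix is predefined and always maps to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def titles_by_lang(xml_text):
    """Return (language, text) pairs; language is None when untagged."""
    root = ET.fromstring(xml_text)
    return [(el.get(XML_LANG), el.text) for el in root.iter("title")]

for lang, text in titles_by_lang(doc):
    # A player could pick a Han font per field from `lang` here.
    print(lang or "(unspecified)", text)
```

A player that ignores the attribute loses nothing; one that honours it
has enough information to choose a regional font for each field.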

In conclusion, my recommendation, for as little as it is worth, is
to just let users specify tag/comment data in whichever language
they like, and let them worry about making it readable on their
computer. If people decide to try to cram really complex data into
the tags, the results will be poor, but that is not a failing of the
comment-tag design so much as people using the wrong tool for the
job. On the XML metadata side, this is not a problem, so getting the
XML metadata stuff going would be a big win as far as this is
concerned.

I don't think I said anything new here, but good information bears
repeating, lest it get lost in the stream... ;)

> Too bad unicode didn't include a way to specify the character set
> as a sort of "special" character itself...

Too bad indeed... it seems silly to me to try to make a character
encoding system capable of representing any language on earth,
and then immediately omit huge sets of some languages' characters
just because there are similar characters in languages from the same
family. Not all literate Japanese-readers can read the Chinese
versions of all of their characters, just like not all literate
English-speakers can read the ancient Phoenician equivalents of
their characters ("but we need to represent both Phoenician and
English in Unicode, and English writing is derived from
Phoenician...")
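
For anyone who has not run into Han unification directly, a small
sketch with Python's standard unicodedata module shows the effect.
The sample character is my own choice of illustration: it is commonly
drawn differently in Japanese and Chinese typography, yet it has a
single code point.

```python
import unicodedata

ch = "直"  # drawn differently in Japanese and Chinese typography

# One unified code point, regardless of the language of the text.
print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The UTF-8 bytes are likewise identical, so a bare tag value gives a
# renderer nothing from which to choose a regional glyph.
print(ch.encode("utf-8").hex())
```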

That is my little rant on Unicode being annoying. Thank you for
listening. :)

-- 
   Victoria E. Lease <vlease at floofy-skirts.org>
C66F 5745 AE21 B21F 5326  FA12 DBC2 9245 9475 3F70

