[vorbis] tags in comment field - why?

Sat Dec 29 20:07:16 PST 2001

On Thursday, December 27, 2001, at 05:01 PM, Glenn Maynard wrote:

> I said that ID3V2 fixed ID3V1's major limitations.  I'm saying that we
> shouldn't use a format that has the major limitations the current
> proposals do, as to fix the limitations the format will need to be
> completely replaced.

Hmm. Maybe I missed some of this. The short answer here is that the 
Vorbis tag format was intended to be quick and informal, and is clearly 
labelled as such. As the volume of argument here shows, doing 
metadata properly is *hard* and a separate problem.

>> speak for Monty on the original decision but using XML is overkill.
>
> Why?  XML is simpler than defining your own data format from the ground
> up.

We've always said we'd like to have a separate metadata format, one that 
does its best to be all things to all people. Where that belongs is in 
a separate logical Ogg bitstream, mixed in with the Vorbis data. I think 
XML is great too, and we've had many arguments. What we really need is a 
good sane implementation of something. :)

One route I've been pointing out for the past year is to just import the 
MusicBrainz format. I think it needs some work from the design point of 
view, but there's a ready implementation of the parser and an 
established database to query.

Order switched for topicality:

> On Thu, Dec 27, 2001 at 01:56:45PM +0200, Beni Cherniavksy wrote:
>> Erh...  Good point.  That's a question to Unicode, though.  Why did 
>> they
>> do it this way?  I thought there is single glyph per unicode character 
>> but
>
> It's a matter of HAN unification; I don't understand the issue quite
> well enough to explain it, but you need to know the language of the text
> to know which font to use.

My (only moderately informed) understanding is that various languages 
share a set of ideographic characters. The languages have all diverged 
somewhat since, and so there are differences in the details of how some 
characters are rendered, even if they're derived from the same root. 
Analogies for Roman script might be having the wrong diacritics for your 
language on a bunch of the letters, or how someone used to blackletter 
would have viewed a Bible set in italic type. You can figure out what it 
says, but it hardly qualifies as proper text display. The references 
Glenn gave explain this pretty well.

>   http://www.unicode.org/unicode/faq/han_cjk.html
>   http://www.cs.ruu.nl/~otfried/Mule/unihan.html

The Unicode people did things this way to try and reduce the number of 
characters in the set. (They were originally trying, foolishly, to fit 
everything into 64k code points.) And because character variants that 
share a codepoint have similar meanings, it helps a lot with, for 
example, parsing and translation. Of course what's a stylistic 
difference and what's a distinct character is subjective, so one can 
find plenty of inconsistencies.
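
To make the consequence for tags concrete, here is a rough Python sketch 
(mine, not anything from the Vorbis code). It assumes U+76F4 as the 
example, since it is commonly cited as a unified ideograph that Chinese 
and Japanese fonts draw differently. The LANGUAGE field and the 
font_hint() helper are hypothetical, purely for illustration. The point 
is only that the stored text is one codepoint either way, so any 
language hint has to travel separately.

```python
import unicodedata

# Assumed example character: U+76F4, a unified Han ideograph.
ch = "\u76f4"


def describe(char):
    """Return the codepoint, Unicode name and UTF-8 bytes of one character."""
    return {
        "codepoint": "U+%04X" % ord(char),
        "name": unicodedata.name(char),
        "utf8": char.encode("utf-8"),
    }


def font_hint(tags):
    """Hypothetical helper: pick a language hint from a tag dictionary.

    LANGUAGE is an illustrative field name, not part of any Vorbis spec.
    """
    return tags.get("LANGUAGE", "unknown")


info = describe(ch)
print(info["codepoint"], info["name"], info["utf8"])

# The same title text with and without a language hint: the bytes are
# identical, so a renderer cannot choose a font from the text alone.
print(font_hint({"TITLE": ch, "LANGUAGE": "ja"}))
print(font_hint({"TITLE": ch}))
```

Nothing in the text itself distinguishes the two cases, which is why 
a renderer needs to know the language from somewhere else.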

FWIW,
  -r
