[vorbis] Comment field spec needs to be expanded and tightened

David Wheeler dwheeler at ida.org
Sat Mar 3 19:36:43 PST 2001



Hello!  I think the "Ogg Vorbis comment field specification"
(http://www.xiph.org/ogg/vorbis/doc/v-comment.html)
should specify many more fields and define them more exactly. That way,
everyone will agree on their meanings and computers can process them.
Three of the current field names are very misleading; I think they should be
changed ASAP.  In addition, a few more tags would help support commercial
use and help tools stay within the law (e.g., "copy all tracks that can be
freely redistributed"); I think that would speed acceptance of Ogg Vorbis.

Putting this data in an XML format would be fine, but then an XML format
needs to be defined.  I've defined this proposal so that, if XML is later to
be used directly, the transition is easy.

First, the misleading names.
Change "ORGANIZATION" to "PUBLISHER"; many other organizations
may be involved, and not all publishers are organizations.
Note that MP3's related spec already calls this value "publisher".  I suggest
changing "DATE" to "RECORDING_DATE" and "LOCATION" to "RECORDING_LOCATION";
there are many other potential dates and locations someone might store.

Next, I suggest defining the following additional fields.
I've grouped them into three categories: "people",
"general information", and "legal issues"; you could place the existing
list under the category "basic information".

* for people:

COMPOSER
 Track composer of the music.

AUTHOR
 Track author of the words/text (the lyricist, if this is music).

ARRANGER
 Arranger of the music (if different than composer).

PRODUCER
 Track producer; this is often different from the PUBLISHER.

CONDUCTOR
 Conductor.

ACCOMPANIMENT
 The band, orchestra, or other accompaniment (may be a group), e.g.,
 "E Street Band". 

ACCOMPANIMENT_ARTIST
 The name of an accompanist, may include parenthetically the
 instruments/vocal roles played.
 E.g., "Iman Example (Bass Guitar, Fiddle, Alto)".

STATION
 Station from which this was streamed (radio, Internet radio, etc.).

INTERVIEWEE
 If the material is an interview, who is being interviewed?

INTERVIEWER
 If the material is an interview, who is doing the interviewing?

* for general information:

LENGTH
 Total playing time in seconds, m:ss, or h:mm:ss.
 The seconds may be a floating point value (i.e., have a decimal point),
 and if there's no colon the value in seconds may be any nonnegative number.
 Examples are "7:23", "3600", and "75:40:45.3". The value "0" is legal.

SUBTITLE
 Subtitle.

LANGUAGE
 The language(s) used in the recording (comma-separated if more than one).

CD_ID
 The table of contents (TOC) frame from the CD, uniquely identifying the CD.
 When combined with TRACKNUMBER this forms a unique id for the track.
 This allows use of databases such as "CDDB" and "Freedb" (freedb.freedb.org).
 This is a 4-byte header, 8 bytes/track, plus 8 bytes "lead out",
 from the table of contents (TOC) header of the CD.
 Its maximum size is 804 bytes (99 songs), but normally it's much smaller
 (a 10 song CD uses 92 bytes).
 The Vorbis specification can handle binary data, but since it would be
 unusual to do this, this should be represented using 2 hexadecimal
 digits/byte (this doubles a 10-song CD ID to 184 bytes of data).
 {It could be straight binary, or uuencoded, or whatever; any druthers?}

DEMO
 If set, this indicates this is only a "demo" of a "real" track
 (e.g., it only has part of the track). The text describes in what way
 it's a demo (e.g., "First XYZ minutes").
 See TITLE-URL, ALBUM-URL, or PURCHASE-URL to get the real thing.
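
To make the CD_ID arithmetic above concrete, here is a rough Python sketch. The function names are my own invention for illustration, and the TOC bytes are placeholder data, not a real disc; it only shows the size formula (4-byte header, 8 bytes/track, 8 bytes lead-out) and the 2-hex-digits-per-byte representation.

```python
def cd_id_size(num_tracks):
    """Raw TOC size in bytes: 4-byte header + 8 bytes/track + 8 bytes lead-out."""
    if not 1 <= num_tracks <= 99:
        raise ValueError("a CD has between 1 and 99 tracks")
    return 4 + 8 * num_tracks + 8


def encode_cd_id(toc_bytes):
    """Represent the raw TOC bytes as 2 hexadecimal digits per byte."""
    return toc_bytes.hex().upper()


def decode_cd_id(text):
    """Recover the raw TOC bytes from the hexadecimal form."""
    return bytes.fromhex(text)


# Placeholder bytes standing in for the TOC of a 10-track CD.
fake_toc = bytes(range(cd_id_size(10)))
field_value = encode_cd_id(fake_toc)

# Raw size for 10 tracks, hex-encoded size, raw size for 99 tracks.
print(cd_id_size(10), len(field_value), cd_id_size(99))  # prints 92 184 804
```

Hex is twice the size of the raw data, but the value stays printable and trivially reversible.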

* for legal issues:

REDISTRIBUTION
 Text describing redistribution conditions. May be "never", "unlimited",
 "[non-commercial | private | restricted ] redistribution
  [in {list of countries using ISO 3166 codes}]", or some other text.
 For commercial music, "never" would be common. The value
 "non-commercial redistribution" is encouraged.

USAGE
 Text indicating usage (replaying) conditions. May be "never", "unlimited",
 "[non-commercial | commercial ] [private | public | public performance ] use
  [in {list of countries using ISO 3166 codes}] |
  [at {location}]   [only]", or some other text.
 For commercial music "private use only" would be a common value;
 "non-commercial use" would certainly be encouraged.
 Note that ``fair use'' laws grant additional usage privileges beyond this.
 If the use terms are unusual, it's best to define a USAGE-URL.
 For rights of modification or extraction, see LICENSE.

LICENSE
 The license used.  May be a license such as "Free Music License",
 "Public Domain", or a generic "All rights reserved"; see LICENSE-URL
 (if defined) for its text.

LICENSEE
 Licensee of the file contents.  Sometimes this person is called the ``owner'',
 but the real owner of the track is specified in the COPYRIGHT information.

LYRICS_COPYRIGHT
 The copyright on the text/lyrics, same format as COPYRIGHT;
 this may be "Public Domain".

MUSIC_COPYRIGHT
 The copyright on the music, same format as COPYRIGHT;
 this may be "Public Domain".

PURCHASE-URL
 Open this URL to begin purchasing this track (in many cases this will
 involve purchasing the entire album).  Going to this URL must not
 "instantly" purchase it; users must be allowed to confirm their
 purchase information.
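
As an illustration of how a tool might act on REDISTRIBUTION (the "copy all tracks that can be freely redistributed" case), here is a small Python sketch. The dictionary layout and the function name are assumptions of mine; the only values it relies on are "unlimited" and "never" as described above, and it deliberately treats anything else (including a missing field) as not freely redistributable.

```python
def freely_redistributable(comments):
    """True only when REDISTRIBUTION is exactly "unlimited" (case-insensitive).

    A missing field or any other text is treated conservatively as "no".
    """
    value = comments.get("REDISTRIBUTION", "")
    return value.strip().lower() == "unlimited"


# Hypothetical comment headers for four tracks.
tracks = [
    {"TITLE": "Song A", "REDISTRIBUTION": "unlimited"},
    {"TITLE": "Song B", "REDISTRIBUTION": "never"},
    {"TITLE": "Song C", "REDISTRIBUTION": "non-commercial redistribution"},
    {"TITLE": "Song D"},
]

copyable = [t["TITLE"] for t in tracks if freely_redistributable(t)]
print(copyable)  # prints ['Song A']
```

A real tool would probably want to understand "non-commercial redistribution" as well; this sketch just shows that a fixed vocabulary makes the simple case easy.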

* General conventions:

For a field with a URL (URI), just append "-URL" to the fieldname.
So, ALBUM-URL is the URL for the album, and TITLE-URL is the URL
for this particular (titled) work (that URL would discuss the title and may
include a link to a file containing its content). Other useful fields include
ARTIST-URL, PUBLISHER-URL, STATION-URL, and LICENSE-URL.  If translated to
XML, this data should be the "url" attribute of the corresponding tag;
if translated to HTML/XHTML, use '<a href="..">'.
Indeed, I suggest using "-" as a general indicator that this is an attribute
of a tag if converted to XML (to represent a space, use "_" instead).
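
Here is a rough Python sketch of the "-" convention when translating to XML. Everything about the XML shape is an assumption for illustration (no XML format has been defined): the root element name "comments" is hypothetical, tag and attribute names are simply lowercased, and repeated fields are not handled.

```python
import xml.etree.ElementTree as ET


def comments_to_xml(fields):
    """Turn (name, value) pairs into XML; NAME-ATTR becomes an attribute of NAME."""
    root = ET.Element("comments")  # hypothetical root element
    elements = {}

    def element_for(name):
        # Create the element for a field the first time it is mentioned.
        if name not in elements:
            elements[name] = ET.SubElement(root, name.lower())
        return elements[name]

    for field, value in fields:
        base, dash, attribute = field.partition("-")
        if dash:
            element_for(base).set(attribute.lower(), value)
        else:
            element_for(base).text = value
    return root


fields = [
    ("ALBUM", "Example Album"),
    ("ALBUM-URL", "http://example.org/album"),
    ("ARTIST", "Iman Example"),
]
root = comments_to_xml(fields)
print(ET.tostring(root, encoding="unicode"))
```

Since "_" stands for a space and "-" marks an attribute, a name like ALBUM_IMAGE-URL splits unambiguously into the tag "album_image" and the attribute "url".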

If field data has different values for different languages,
append "[", the language tag (see RFC 1766 below), and "]" to the field name.
This is particularly useful for DESCRIPTION; DESCRIPTION[de]
would have a German description.
Don't translate an album or title to multiple languages unless it
is typically known by those translations (i.e., if the translations
are normally printed on the album cover in a printed version).
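
A reader could split the language suffix off a field name along these lines. This is only a sketch in Python: the helper names are mine, the tag pattern is a loose approximation of RFC 1766 (letters in groups of 1 to 8, separated by "-"), and the fall-back to the unsuffixed field is an assumption about what a player would want.

```python
import re

# NAME optionally followed by "[" language-tag "]".
FIELD_RE = re.compile(
    r"^(?P<name>[^\[\]=]+)"
    r"(?:\[(?P<lang>[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*)\])?$"
)


def split_field_name(field):
    """Return (name, language tag or None) for a field name like DESCRIPTION[de]."""
    match = FIELD_RE.match(field)
    if match is None:
        raise ValueError(f"malformed field name: {field!r}")
    return match.group("name"), match.group("lang")


def pick_value(comments, name, lang):
    """Prefer NAME[lang]; fall back to the plain NAME field if it is absent."""
    return comments.get(f"{name}[{lang}]", comments.get(name))


comments = {
    "DESCRIPTION": "A live recording.",
    "DESCRIPTION[de]": "Eine Live-Aufnahme.",
}
print(split_field_name("DESCRIPTION[de]"))
print(pick_value(comments, "DESCRIPTION", "de"))
print(pick_value(comments, "DESCRIPTION", "fr"))
```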

================== FIELD SPECIFICATIONS =================================

(Modify COPYRIGHT to say): This field should have a date, a space,
and then the name of the copyright owner.
This is the copyright of the particular performance, not the copyright
of the music or text/lyrics.
Note that some copyrighted works may be freely distributed or freely used,
and that ``fair use'' laws allow certain uses even when the copyright
owner has not permitted such uses.  If it's in the public domain, instead
specify "Public Domain" for the "LICENSE" value.

(Modify GENRE to say this - you might just copy in the list):
For GENRE, where possible, use the full text name of the genre as
specified in the MP3 ID3v2 specification, appendix A
(http://www.id3.org/develop.html).

All dates/times must be specified using a subset of ISO 8601
(this is also true for ID3v2, see below). The format of a time string is
yyyy-MM-ddTHH:mm:ss
(year, "-", month, "-", day, "T", hour (out of 24), ":",
minutes, ":", seconds), and the precision may be reduced by
removing as many time indicators as wanted from the end.
All time stamps are UTC.  For time intervals, use the slash character
as described in ISO 8601, and for multiple non-contiguous dates,
use multiple values.
Examples: "2001", "2001-01", "2001-12-31", "2001-12-31T23:59:59",
"2001-01/2001-02".

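The date/time subset above (a year, optionally extended field by field up to seconds, with an optional "/" and a second stamp) can be checked with a regular expression. This Python sketch is only a syntax check under my reading of the rules; it does not verify that months, days, or hours are in range, and the names are made up for illustration.

```python
import re

# yyyy, then optionally -MM, -dd, THH, :mm, :ss, each only if the previous is present.
_STAMP = r"\d{4}(?:-\d{2}(?:-\d{2}(?:T\d{2}(?::\d{2}(?::\d{2})?)?)?)?)?"

# A single stamp, or two stamps separated by a slash.
DATE_RE = re.compile(rf"^{_STAMP}(?:/{_STAMP})?$")


def is_valid_date(text):
    """True if text matches the reduced-precision date/time syntax."""
    return DATE_RE.match(text) is not None


examples = ["2001", "2001-01", "2001-12-31", "2001-12-31T23:59:59",
            "2001-01/2001-02"]
print([is_valid_date(e) for e in examples])
print(is_valid_date("01-2001"), is_valid_date("2001-1"))
```
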
All languages must be specified using IETF RFC 1766.
Examples include "en" (English), "en-US" (U.S. English), "fr" (French),
"de" (German), "ru" (Russian), "ja" (Japanese), and "zh" (Chinese).

The "Structure" entry should be modified, changing
"single vector for vendor name" to
"single vector for encoding library", since that's what it is.

I originally had separate numeric values for USAGE and REDISTRIBUTION,
but that got complicated (and there's always the danger of mismatches).
It's unfair that the definition implies English definitions, but if
natural language is allowed you have to pick something.

================== ODD IDEAS =================================

If you're "not sure" if you want to accept a particular field name
definition, perhaps create a list titled "under consideration" --
that way, more people can see the proposal, and those who wish to
capture that information will at least have a suggestion on how
to capture it.

Perhaps append "_IMAGE" for the image.
E.g., "ALBUM_IMAGE-URL" references the URL for the album cover, and
"ARTIST_IMAGE-URL" would show the artist's picture.
This would look great in viewers like Nautilus.
One trouble is that this functionality also supports "web bugs" -
a server serving the picture could also record who's viewing it.
You could even support locally-stored values, such as
"ALBUM_IMAGE" with the image and "ALBUM_IMAGE_TYPE" storing
the MIME type; the problem with these is, do you really want to
store image data inside a comment field?

Fields could be later defined for signing; probably this would need
to list the fields or field range signed, as well as the signature.

====================== RANDOM NOTES =======================================

MP3 can include metadata, so where possible, it'd be good to be able to
capture metadata commonly available for MP3.  MP3 is pretty awful, BTW.
According to "http://woodworm.cs.uml.edu/~rprice/ep/rehm",
the de facto MP3 metadata standard is ID3v1 and ID3v1.1, specified at
"http://www.id3.org/id3v1.html" (Nilsson, M. "ID3 made easy", July 2000).
This only specifies title, artist, album, year, genre, track, and a
comment field -- and the current spec already covers this (yay).
Their "genre" is only one byte long, and it's all an ugly kludge.

There's a richer ID3v2 for MP3 metadata; it's not widely implemented.
See http://www.id3.org/develop.html for more on ID3v2.
There's no reason to use its short 4-character frame names, but
using the longer names for the information here (upper case, and with
underscores replacing space) would be a good idea.
This spec is really awkward.  For example, it confuses "copyright"
with notions of "usage" and "redistribution". As we know, some copyrighted
material may nevertheless have unrestricted use or redistribution.

There are two methods to specify languages:
IETF RFC 1766 (using 2-letter codes and ISO 639) and ISO 639-2
(using 3-letter codes).
RFC 1766 is far more common, e.g., it's used by HTML and by GNU gettext.
It also appears to be more flexible (it handles dialects; I don't have the
spec but I don't see anything on the web about handling dialects
in ISO 639-2).
Thus, I recommend RFC 1766.
ID3v2 uses ISO 639-2, but few use ID3v2.
For more info on ISO 639-2, see http://www.id3.org/iso639-2.html.
For more info on RFC 1766, see http://www.ietf.org/rfc/rfc1766.txt.

The "LENGTH" value could be computed by examining the whole file,
but that would require downloading and examining the whole file -
quite a pain.  It's easier to have this information in the comment field.
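
For what it's worth, reading the LENGTH value back is cheap. Here is a Python sketch of a parser for the three forms defined earlier (seconds, m:ss, h:mm:ss). The function name is mine, and the range checks (seconds below 60 once a colon is present, minutes below 60 once hours are present) are my interpretation of the format rather than something the proposal spells out.

```python
def parse_length(text):
    """Convert a LENGTH value (seconds, m:ss, or h:mm:ss) to seconds as a float."""
    parts = text.strip().split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' separators in {text!r}")
    *whole, seconds_text = parts
    if not all(p.isdigit() for p in whole):
        raise ValueError(f"hours/minutes must be whole numbers in {text!r}")
    seconds = float(seconds_text)
    if not (0 <= seconds < float("inf")):
        raise ValueError(f"seconds must be a finite nonnegative number in {text!r}")
    if whole and seconds >= 60:
        raise ValueError(f"seconds must be below 60 when ':' is used in {text!r}")
    if len(whole) == 2 and int(whole[1]) >= 60:
        raise ValueError(f"minutes must be below 60 when hours are given in {text!r}")
    total = seconds
    # Walk right to left: the first chunk is minutes, the next (if any) is hours.
    for unit, chunk in zip((60, 3600), reversed(whole)):
        total += unit * int(chunk)
    return total


for example in ["7:23", "3600", "75:40:45.3", "0"]:
    print(example, parse_length(example))
```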

I used "-URL" instead of "-URI", because the term URL is
more widely understood.

I haven't tried to define how to capture lyrics/text.
I know that one proposal (by Ralph Giles on 28 August 2000) is at:
http://xiph.org/archives/vorbis/200008/0082.html

There's also a "Lyrics3" format defined for MP3 - the major interesting factor
here is that every lyric line is preceded by the time(s) it occurs,
in parentheses.  This doesn't handle live work very well, but it looks
okay in many circumstances for prerecorded songs.  Here's an example
from ID3v2.  Note the odd way it handles information
like "Chorus".  Perhaps this could be stored in a "LYRICS3" field:

[00:02]Let's talk about time[CR][LF]
[00:02]tickin' away every day[CR][LF]
[00:05]so wake on up before it's gone away[CR][LF]
[00:10]catch the 411 and stay up like the sun[CR][LF]
[00:20]remind yourself what's done and done[CR][LF]
[00:32]so let yesterday stay with the bygones[CR][LF]
[00:40]keep your body and soul and your mind on[CR][LF]
[00:55]the right track infact you gotta stay on[CR][LF]
[01:20]the real black[CR][LF]
[CR][LF]
Chorus:[CR][LF]
[01:25][05:45]Time is tickin' away[CR][LF]
[01:42][05:55]you've gotta - live your life -[CR][LF]
[02:11][06:24]day by day[CR][LF]
[02:26][06:35]happy or sad, good or bad[CR][LF]
[02:31][06:42]life is too short[CR][LF]
[02:58][07:13]you've gotta - keep your head -[CR][LF]
[03:01][07:19](Repeat)[CR][LF]
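
To show how a player might consume text in this style, here is a Python sketch that turns such lines into a time-ordered list of (seconds, text) pairs. It is my own illustration, not taken from the Lyrics3 definition: it assumes real CR/LF line endings (the "[CR][LF]" above is just notation), assumes two-digit mm:ss stamps, and silently drops lines with no time stamp, such as "Chorus:".

```python
import re

STAMP_RE = re.compile(r"\[(\d{2}):(\d{2})\]")


def parse_lyrics(text):
    """Return (seconds, line) pairs sorted by time; a line with N stamps appears N times."""
    events = []
    for line in text.splitlines():
        position = 0
        times = []
        # Consume every [mm:ss] stamp at the start of the line.
        while True:
            match = STAMP_RE.match(line, position)
            if match is None:
                break
            times.append(int(match.group(1)) * 60 + int(match.group(2)))
            position = match.end()
        for seconds in times:
            events.append((seconds, line[position:]))
    events.sort(key=lambda event: event[0])
    return events


sample = (
    "[01:25][05:45]Time is tickin' away\r\n"
    "Chorus:\r\n"
    "[00:02]Let's talk about time\r\n"
)
for seconds, line in parse_lyrics(sample):
    print(seconds, line)
```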

Perhaps a "LYRICS" (or "TEXT") tag could be defined for just simple text
(ASCII/UTF-8, unmarked), and LYRICS_HTML for lyrics represented in HTML.
And to really synchronize things, SMIL should be considered.
This information can be large, and possibly should be streamed, though
sometimes you want the "whole thing all at once" instead of being streamed.
You'll need to have the ability to stream text, at least for real-time
material (e.g., someone typing in a translation of a real-time interview
to aid the deaf).

Anyway, I hope this helps.  By having some standard values,
encoders can get this data in there in the first place.
Yes, this kind of data can be added later, but it's easiest to get
this information while you've got the music in hand.
