[vorbis-dev] comment field proposal

Monty xiphmont at xiph.org
Fri May 12 14:10:19 PDT 2000



Well, this is a bit stronger than a proposal; this is "what I plan to do unless
people see obvious flaws I missed"...

The text comment header is the second (of three) header packets that begin a Vorbis bitstream.  It is meant for short text comments, not arbitrary metadata; arbitrary metadata will be put in a metadata stream, likely an XML stream type.  We've discussed this at length-- several times :-)

The comment header is a list of eight-bit-clean vectors; the number of vectors is bounded to 2^32-1 and the length of each vector is limited to 2^32-1 bytes.  The vector length is encoded; the vector is not null terminated.  In addition to the vector list, there is a single vector for vendor name (also 8 bit clean, length encoded in 32 bits).  Libvorbis currently sets the vendor string to "Xiphophorus libVorbis I 20000508".

(note: although the vector space in the ogg format is 8 bit clean, libvorbis currently assumes during encoding that the comments submitted for encapsulation are C style strings)
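
To make the "length-encoded, not null terminated" idea concrete, here is a rough sketch of packing and unpacking such a vector list.  The text above does not pin down byte order or the order of the pieces, so the little-endian 32-bit lengths and the vendor / count / vectors ordering used here are assumptions for illustration only, and the function names are made up.

```python
import struct

def pack_comment_header(vendor, comments):
    """Serialize a vendor vector and a list of comment vectors (all bytes).

    Assumed layout (not stated in the proposal): u32 length + vendor,
    u32 vector count, then u32 length + data for each vector.
    """
    out = [struct.pack("<I", len(vendor)), vendor]
    out.append(struct.pack("<I", len(comments)))
    for vec in comments:
        # The length prefix is what makes the vector 8-bit clean:
        # embedded NUL bytes need no escaping or terminator.
        out.append(struct.pack("<I", len(vec)))
        out.append(vec)
    return b"".join(out)

def unpack_comment_header(data):
    """Inverse of pack_comment_header; returns (vendor, comments)."""
    pos = 0

    def read_u32():
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    def read_vector():
        nonlocal pos
        length = read_u32()
        if pos + length > len(data):
            raise ValueError("vector runs past end of data")
        vec = data[pos:pos + length]
        pos += length
        return vec

    vendor = read_vector()
    count = read_u32()
    comments = [read_vector() for _ in range(count)]
    return vendor, comments
```

Because the lengths are carried explicitly, a vector such as b"X=\x00\xff" survives a round trip unchanged, which a C string based interface could not guarantee.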

Libvorbis comments are 'unstructured', so it's time to impose a little
convention before things get out of hand.  Given that the comments are meant
for *simple*, *short* fields (think 'title', 'artist', etc), the structure
should be simple.  I say we pattern this after a simple UNIX style environment
array with common 'variable' names agreed upon ahead of time. 

That is, fields look like:

comment[0]="ARTIST=me";
comment[1]="TITLE=the sound of vorbis";

For the sake of completeness, I'm proposing:

A case-insensitive field name that may consist of ASCII 0x20 through 0x7D, 0x3D ('=') excluded.  ASCII 0x41 through 0x5A inclusive (A-Z) is to be considered equivalent to ASCII 0x61 through 0x7A inclusive (a-z).

The field name is immediately followed by ASCII 0x3D ('='); this equals sign is
used to terminate the field name.

0x3D is followed by 8 bit clean field contents to the end of the field.
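
As a sketch of the three rules above (not libvorbis code; split_field is a hypothetical helper), a field can be split at the first 0x3D, the name checked against the allowed range, and the name folded to one case.  Folding to upper case is an arbitrary choice here, and the proposal does not say whether an empty field name is legal, so this sketch does not reject one.

```python
def split_field(field):
    """Split one comment vector (bytes) into (folded_name, contents)."""
    eq = field.find(b"=")
    if eq < 0:
        raise ValueError("field has no '=' terminating the field name")
    name, contents = field[:eq], field[eq + 1:]
    for byte in name:
        # Names may use 0x20 through 0x7D; 0x3D cannot occur here because
        # the name ends at the first '='.
        if not 0x20 <= byte <= 0x7D:
            raise ValueError("byte 0x%02X not allowed in field name" % byte)
    # bytes.upper() only touches ASCII a-z, which matches the A-Z / a-z
    # equivalence in the proposal.  Contents are left untouched.
    return name.upper(), contents
```

Only the first '=' is significant, so contents may themselves contain '=' or any other byte value.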

Implications: field names should not be 'internationalized'; this is a
concession to simplicity, not an attempt to piss off the majority of the world
that doesn't speak English.  Field *contents*, however, should be
internationalizable... suggestions on the proper encoding for that?

We have the length of the entirety of the field and restrictions on the field
name so that the field name is bounded in a known way.  Thus we also have the
length of the field contents.
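
The same reasoning gives a getenv()-style lookup over the comment array.  This is only an illustrative sketch: get_comment is a made-up name, and returning the first match is an assumption, since nothing above says what to do with repeated field names.

```python
def get_comment(comments, wanted):
    """Return the contents of the first field named `wanted`, else None.

    `comments` is a list of bytes vectors, `wanted` is a bytes field name.
    """
    key = wanted.upper()  # ASCII-only fold: A-Z is equivalent to a-z
    for field in comments:
        name, sep, contents = field.partition(b"=")
        if sep and name.upper() == key:
            # No terminator is needed: the contents length is simply
            # len(field) - len(name) - 1, the 1 being the '=' itself.
            return contents
    return None
```

With the two example fields shown earlier, get_comment(comments, b"title") would return b"the sound of vorbis", and a lookup for a name that is absent would return None.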

Individual 'vendors' may use non-standard field names within reason.  The
proper use of comment fields should be clear through context at this point.
Abuse will be discouraged.

Now all we need is a list of 'conventional' field names.  A stream is not required to use any/all of these field names; they're suggested for interoperability.  The suggestions below are also biased toward contemporary music album usage; analogous use for non-music albums should be easy enough for people to figure out on their own...

TRACK
ALBUM
ARTIST
LABEL
CONTENT

(so there's the seed of a list.  Please submit obvious ones I've forgotten...)

Monty

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/


