[vorbis-dev] Tag format comments from a newbie

Thu Jun 22 08:15:25 PDT 2000

Hello, all:

As a newbie with a high opinion of my opinion, I couldn't let the opportunity pass without putting in my $0.02 (USD)...

First, a summary:
We have two places to put information about the song I'm listening to: the metadata stream and the comment header.

The metadata stream is intended to be variable-length and to (potentially) contain a number of disparate types of details.  Not just song title and artist, but possibly an artist bio, maybe a picture of his/her family, dog...whatever.  (This is said somewhat facetiously, but as an extension of its theoretical purpose, as I understand it.)  The intent is that its contents would relate directly to the music: lyrics, play-along instructions and such.

The comment header, on the other hand, is intended for quick-and-dirty details.  Small, easy-to-read, quick-and-dirty details.

Along with this diversity, we have another complexity to add: the vorbis stream format is _NOT JUST FOR MUSIC FILES_ (am I wrong about that?).  The metadata stream could easily contain closed-captioning information (for the deaf, or for foreign languages).  Certainly this is *not* the kind of information that would belong in the header details.

That done, here's my what-I'd-like-to-see:
I don't want an XML parser added to WinAmp.  It just doesn't belong there.  WinAmp should, however, be able to show me the title of the song I'm looking at.  Especially in my playlist (which tends to run around 200 songs), I want to be able to review the order of appearance without sampling each song.  Together, these two things tell me that the comment header should be able to contain *some* of the song details.  How many of those details?  That's still up for debate.  I personally think the minimum is artist and song title.  I wouldn't mind release track and copyright details, but I don't see much real *need* beyond that.

However!  The format is (near as I can see) intended to be FLEXIBLE.  What gets put in is up to the whim of the creator/editor.  It doesn't have to have anything; if I want more, I can put it in, and if I want less, I can take it out.  (It would be nice to have automation for those things...'course, that would suggest a need to store CDDB-signature-style details somewhere...)

I don't have a problem with these details being duplicated in a metadata XML stream.  Really, if we think about it, the comment header is supposed to be small, so duplicated content should be pretty much a non-issue.

Comments on formatting:
I really like the idea of having multiple possible values for each tag, but the question arises: how can we fit that in when the tag position is implicit?  (That's a space-saving suggestion I wholeheartedly approve of: it saves space and makes application support simpler to code.)  Here's my take on the best way to store tags in the comment header:

Each type of tag has a predefined location in the comment header.  (It might be a good idea to have a 'tag-list-descriptor' that says how much of the tag list is used.  Maybe a 'highest tag index used'; of course, that opens the door to a 'lowest tag index used', but I don't see the discussion reasonably going much beyond that.  So, let's assume the tag-list-descriptor holds two values: lowest-index-used and highest-index-used.)
If a tag is not used, its size is 0.
It is possible to have 'empty' tags surrounded by 'used' tags.
When a tag _is_ used, the size specifier gives the length of the tag contents, i.e. all of the entries for that tag.  So we need some mechanism to separate entries.  For that, I see two options: a double-null-terminated list of null-terminated strings (which seems to go against the grain of the current development intent), or reuse the mechanism that tells us how long each tag is, applying it within the tag to give the length of each value.  If we have a 'two bytes for length, followed by that many bytes' construct (for example), then we'd have two-bytes-for-field-length, two-bytes-for-first-value-length, ~n~ bytes of value, two-bytes-for-next-value-length, ~n~ bytes of value, two-bytes-for-next-value-length, ~n~ bytes of value...

None of this takes custom headers into account.  I would figure that custom headers start at a particular index.  If only custom headers are used, lowest-index-used would be the value of first-custom-header.
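To make the layout concrete, here is a rough Python sketch of it.  It is only one reading of the proposal, and it fills in several details the post leaves open, all of them assumptions: lengths and indices are two-byte little-endian unsigned integers, values are UTF-8 text, the tag indices TITLE/ARTIST/TRACK/COPYRIGHT and FIRST_CUSTOM = 16 are hypothetical, and a header with no tags is written as lowest=1, highest=0 (an empty range).

```python
import struct

# Hypothetical fixed tag indices; the post does not assign any.
TITLE, ARTIST, TRACK, COPYRIGHT = 0, 1, 2, 3
FIRST_CUSTOM = 16  # assumed first index reserved for custom headers


def _u16(n: int) -> bytes:
    """Pack a two-byte little-endian length (byte order is an assumption)."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("value does not fit in two bytes")
    return struct.pack("<H", n)


def encode_tags(tags: dict[int, list[str]]) -> bytes:
    """Write the tag-list-descriptor, then one length-prefixed field per index."""
    used = [i for i, values in tags.items() if values]
    if not used:
        # Assumed convention for "no tags": an empty range (lowest > highest).
        return _u16(1) + _u16(0)
    lo, hi = min(used), max(used)
    out = bytearray(_u16(lo) + _u16(hi))  # lowest-index-used, highest-index-used
    for i in range(lo, hi + 1):
        body = bytearray()
        for value in tags.get(i, []):
            raw = value.encode("utf-8")
            body += _u16(len(raw)) + raw  # per-value length, then the value
        out += _u16(len(body)) + body     # unused tag -> field size 0
    return bytes(out)


def decode_tags(data: bytes) -> dict[int, list[str]]:
    """Inverse of encode_tags; unused (size-0) tags are omitted from the result."""
    lo, hi = struct.unpack_from("<HH", data, 0)
    pos = 4
    tags: dict[int, list[str]] = {}
    for i in range(lo, hi + 1):
        (field_len,) = struct.unpack_from("<H", data, pos)
        pos += 2
        end = pos + field_len
        if end > len(data):
            raise ValueError("truncated header")
        values = []
        while pos < end:
            (n,) = struct.unpack_from("<H", data, pos)
            pos += 2
            if pos + n > end:
                raise ValueError("value overruns its tag field")
            values.append(data[pos:pos + n].decode("utf-8"))
            pos += n
        if values:
            tags[i] = values
    return tags
```

Multiple values per tag fall out naturally: `encode_tags({ARTIST: ["A", "B"]})` stores both names in a single field.  A player that only wants the title can also skip every other field by jumping over its size, without looking at the values inside.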

Comments on XML:
The company I work for is big on XML and content storage / searchability.  Looking a bit to the future, I can easily see us getting involved in the video industry: being able to search through a collection of movies to find the one in which Elvis Presley says "hey, baby", for example.  (And, of course, the user can then pay a nickel to download and play the clip <smiley description="sardonic"/>)

With that background, I am _all_ for embedding a stream that carries the text and position-in-media tags.  And since XML is intended for data interchange, it makes sense to me to use it as the native format for that stream.

One last rambling thought before I close this up: I would actually like to see multiple metadata streams as an option.  If I'm streaming a song to play while I work, I'm not interested in seeing the words, chords and drum signatures as I write my code (or send e-mail to some list I just found).  If the streams are "multiplexed", it seems to me that a server could be written to strip the unwanted metadata streams from the output, reducing my bandwidth hogging.

Speaking of which, I'm done.

If you have even gotten to this point in the missive, thank you very much for your patience.

Earl
TheGleep at bigfoot.com

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/