[vorbis-dev] clarifications on comments spec

Tue Jul 1 17:00:55 PDT 2003

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

First, sorry that I'm slow on the responses -- my DSL provider doesn't love
me.  ;-)

On Monday 30 June 2003 13:17, Ralph Giles wrote:

> This is meant to identify the 'encoder' so it's a little ambiguous when
> you're rewriting the file. In as much as this is metadata describing
> the compressed content itself, I'd leave it alone if you're just
> editing the tags.

No problems here -- I just wanted to make sure that I wasn't supposed to set
this.

> Well, specification of the packet structure is *entirely* up to the
> codec, so I wouldn't use Ogg::Comment. However, we do by convention use
> this structure in our designs for speex, vorbis and theora. (I'm not
> sure about Ogg FLAC) so maybe Xiph::Comment?

Ok, I was forgetting that Ogg can contain non-Xiph formats.

> There is one hitch in that the various codecs have different preambles
> before the common decode. Vorbis starts with 0x03,'vorbis'; Theora
> begins 0x81,'theora'; Speex has no preamble at all and begins the
> packet directly with the vendor length. You'll have to handle this
> variation somehow.

This is actually no problem for my implementation since the "comment" in my
case doesn't refer to a Vorbis header, but rather to the data that happens
to be contained in the second Vorbis header.  I leave it up to my file
representation to handle extracting that and then stuffing it back into the
Vorbis and Ogg wrappers.

I haven't written implementations for these other formats yet, but anticipate
doing so and am trying to get the design close to right on the first go-round
(famous last words).
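
As a rough illustration of that split, here is a minimal sketch that strips
the per-codec preamble and then parses the shared comment body.  The
preambles are the ones Ralph lists above; the body layout (32-bit
little-endian lengths before the vendor string and before each comment) is
taken from the Vorbis comment spec.  The names strip_preamble and
parse_comment_body are made up for this example and are not part of any
library.

```python
import struct

# Preambles as described in the quoted text; Speex has none.
PREAMBLES = {
    "vorbis": b"\x03vorbis",
    "theora": b"\x81theora",
    "speex": b"",
}


def strip_preamble(packet: bytes, codec: str) -> bytes:
    """Return the comment body with the codec-specific preamble removed."""
    preamble = PREAMBLES[codec]
    if not packet.startswith(preamble):
        raise ValueError("packet does not start with the %s preamble" % codec)
    return packet[len(preamble):]


def parse_comment_body(body: bytes):
    """Parse the common comment structure into (vendor, [comments]).

    Any trailing bytes (such as the Vorbis framing bit) are ignored.
    """
    pos = 0

    def read_string() -> str:
        nonlocal pos
        (length,) = struct.unpack_from("<I", body, pos)
        pos += 4
        if pos + length > len(body):
            raise ValueError("truncated comment data")
        raw = body[pos:pos + length]
        pos += length
        return raw.decode("utf-8")

    vendor = read_string()
    (count,) = struct.unpack_from("<I", body, pos)
    pos += 4
    comments = [read_string() for _ in range(count)]
    return vendor, comments
```

Writing goes the other way around: render the body once, then let the file
representation prepend whichever preamble the codec wants.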

> > *) Presuming there's no scheme for "padding" it would be nice if some
> > convention could be adopted.  This makes "tagging" much faster since
> > in most
> > cases it won't require rewriting the entire file.  This could be as
> > simple as
> > a standard comment field with an obvious name -- but I think I've come
> > up
> > with a better solution; more on that later.
>
> It may help to think of ogg as a bitstream rather than a file format.
> That's really the point of view from which it was designed. This is
> actually true of mp3 as well, but the folks who designed the tagging
> system didn't appreciate that.

Well, ID3v1 was an accident; "design" is a laughable word in that context.  On
the other hand ID3v2 is stream oriented, but very much overly complicated --
it takes a few thousand lines of code to do a minimal implementation of its
40-page spec.  However, even with their concept of a "stream", its designers
recognized that much of the time said stream would be a file, and that such a
file would have to be rewritten on every edit without some provision for
"padding".  (It's also a prepended format.)

> Aye, you have to buffer a bit. Look at icecast2 if you're curious.
> That's why we specify a page flush after the last header packet; you
> can just watch the granulepos and know when you've got the headers
> without parsing pages.

Ok, I think I understand things a bit better, so here's a test to see if I get
it.  I can actually find out the size of the Vorbis comment header using the
current scheme, since it will be the first packet that starts on the second
Ogg page.  If it's fully contained in that page, I can get the length of the
comment header packet by summing the lacing values from the first one up to
and including the first lacing value less than 255.  (And since a comment is
often less than 255 bytes, the size of the Vorbis comment header will often
simply be equal to the first lacing value.)  (Whew...)  How'd I do?  :-)
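
The lacing-value arithmetic can be sketched as follows.  This assumes the
standard Ogg page header layout (a 27-byte fixed header whose last byte is
the segment count, followed by the segment table) and a page on which the
packet starts fresh rather than being continued from an earlier page.  The
function name first_packet_length is only for illustration.

```python
def first_packet_length(page: bytes):
    """Return (length, complete) for the first packet on an Ogg page.

    `complete` is False when every lacing value is 255, meaning the
    packet runs on into the next page.
    """
    if len(page) < 27 or page[:4] != b"OggS":
        raise ValueError("not an Ogg page")
    n_segments = page[26]               # byte 26: number of segments
    lacing = page[27:27 + n_segments]   # the segment (lacing) table
    if len(lacing) != n_segments:
        raise ValueError("truncated segment table")

    total = 0
    for value in lacing:
        total += value
        if value < 255:                 # a value below 255 ends the packet
            return total, True
    return total, False
```

Note that a packet whose size is an exact multiple of 255 ends with a
lacing value of 0, which the loop above handles like any other value below
255.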

> > *) Rerendering the full Ogg page(s) seems to be a requirement of the
> > current scheme.  This isn't particularly difficult, but could be
> > simplified.
>
> Yes, but yours is really the only application where that makes any
> sense. The page mechanism or something like it is required to limit the
> overhead, so you have to be able to handle all of Ogg to deal with the
> actual data. As you say, it's not particularly difficult, and libogg is
> available if you want some help. :-)

And using libogg would feel like giving in at this point.  ;-)  I think things
are starting to fit together for me though.  I'm going to go back this
weekend and try to make my implementation more enlightened...

> s/page/packet/ here. It usually takes a while to grok the two levels.
>
> The spec doesn't say either way about extra data in the header packet,
> but presumably a good decoder would handle that. The reason we won't
> put it on a page by itself is that it limits the length to 64k. Is that
> enough for everyone? That will also be a significant fraction of a
> low-bitrate file if you always use the full length for padding.

Given that Vorbis comments are text-only, I would assume that a limit of 64k
really isn't a strange or unusual constraint -- I mean, printed, that's
several full pages of text.  Certainly the "spirit" of the comment spec is
that it's not meant for huge amounts of data (thinking of the comment that
it's supposed to be equivalent to what you might jot on a CD-R).
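
For reference, the "64k" figure falls out of the page format: a page carries
at most 255 segments of at most 255 bytes each.  A small worked calculation,
using nothing beyond that arithmetic (the constant names are only for
illustration):

```python
MAX_SEGMENTS = 255   # the segment count is a single byte
MAX_LACING = 255     # each lacing value is a single byte

# Largest possible page body.
max_page_body = MAX_SEGMENTS * MAX_LACING

# Largest packet that both starts and ends on one page: the final
# lacing value must be below 255 to terminate the packet.
max_whole_packet = (MAX_SEGMENTS - 1) * MAX_LACING + (MAX_LACING - 1)

print(max_page_body)     # 65025
print(max_whole_packet)  # 65024
```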

As for padding, no, I wasn't assuming that the maximum page length would
always be used.  The way that most ID3v2 implementations work is that the
first time you write a tag, you add ~2 KB of "padding".  You keep working
within that allocation (tag size + 2 KB) until you write a tag that no longer
fits, and then when rewriting the file you add another 2 KB.
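
That policy is simple enough to sketch.  The 2048-byte figure is the rough
"~2 KB" from the paragraph above, and the function name plan_write is
hypothetical; this only shows the decision, not the file I/O.

```python
PADDING = 2048  # roughly the 2 KB of slack described above


def plan_write(tag_size: int, allocated: int):
    """Decide how much space the tag area should occupy.

    tag_size  -- size in bytes of the tag about to be written
    allocated -- size of the existing tag area (0 if there is none)

    Returns (new_allocated, rewrite_needed).
    """
    if tag_size <= allocated:
        # The tag fits in the existing area: overwrite in place and
        # leave the rest of the file untouched.
        return allocated, False
    # Otherwise the whole file must be rewritten, so leave fresh
    # padding behind to avoid doing this again on the next edit.
    return tag_size + PADDING, True
```

For example, a first write of a 500-byte tag reserves 2548 bytes, later
edits up to 2548 bytes happen in place, and a 3000-byte tag forces a
rewrite with a new 5048-byte area.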

> We have considered ideas of this sort in the past, particularly when we
> wrote our own example tag editor. So you're not the first to have to
> deal with this stuff. Generally our conclusion has been that it's not
> worth adjusting the spec for the convenience of only that application.

Yes, I realize that changing a spec is an annoyance -- that's why I was trying
to think of a way that wouldn't break existing implementations, and really
could be backwards compatible.  But of course once Ogg takes over the world
you must realize that you'll inherit all 436 taggers on SourceForge. (Yes, I
made that number up; I'm probably underestimating.)  :-)   When you get into
things like batch tagging, moving around the contents of 20-2000 files at
once takes a *long* time as compared to just inserting a few hundred bytes at
the beginning...

I've been trying to hack out an example of what I talked about and can't find
a variation that ogg123 seems to like.  (And again, because my DSL is out I
can't just grab the sources -- will do so at work tomorrow when I send this
mail.)

The variations that I tried were:
- adding 1024 bytes to the end of the first packet in the second Ogg page,
  after the comment and
- adding a new packet to the Ogg header (via the lacing values) and just
  appending 1024 bytes to the end of the page

Neither seemed to work, but it's entirely possible that I botched something
with the hex editor.

Again, thanks for the patience folks -- I realize that these are mostly
"dummy" questions in this context.

Cheers,

- -Scott

- --
New Orleans food is as delicious as the less criminal forms of sin.
 --Mark Twain

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQE/AiC3Qu0ByfY5QTkRApDmAJ9CuEhMM2u9pplaYwtfkGR9hyg3XQCfUFoF
YolxlgsJE/3Np3SNAQ6uVqg=
=6Ofj
-----END PGP SIGNATURE-----
