[vorbis-dev] File Format Documentation

Michael Smith msmith at labyrinth.net.au
Mon Dec 11 05:08:28 PST 2000

At 04:27 AM 12/11/00 -0600, you wrote:
>I'm trying to fully determine the file format for Ogg Vorbis audio files
>as produced by oggenc and I'm having a hell of a time. :)
>I've sorted through all the online documentation I can find. I've
>downloaded the latest CVS source repositories, sifted through the
>documentation there, and tried sifting through the source as well.

Well, the detailed documentation for the audio packets isn't done yet
(actually, I don't think formal documentation of this stuff has been
started). For what you want to do, however, there should be enough
documentation available (along with a few small hints from us or from the

>I should state that I'm not a C guru (I haven't used C in a few years) but
>I'm more of a Java guy (been doing this for four years now).
>Anyway, I'm trying to create a Java package (library) for use with Ogg
>Vorbis files, whereby I can read general information (format, bit rates,
>total time, comments, etc.) and write out updated comment information. In
>the future, I would also like to provide the ability to fully encode and
>decode streams. The reason I'd like to do this in Java is because that's
>what I know best and it is fairly platform independent.

I'd suggest that you'd find it much easier, faster, and generally better to
write an interface (using JNI) to the native libvorbis. Although this
doesn't give you binary portability, that wouldn't matter for the purposes
you describe below. As for source portability - libvorbis is significantly
more portable than java (there are already people using libvorbis on
platforms for which no java runtime exists). Since you know C (if not
well), building the interface should be fairly straightforward (the
libvorbis API maps _very_ cleanly to an OO language like java). 

However, you might want (I have no idea why, but anyway...) to write this
entirely in java. So, I'll describe what you need to do (maybe this'll be
helpful to others, too):

You say you want to find out the format, bitrates, track lengths, and
comments. The bitrate and track length parts are closely linked, and
difficult. I'll describe that below. The basic format and the comment
header are straightforward.

Your second message said that you've got the ogg framing figured out, since
that's the part that actually IS fully documented. As that describes, a
physical ogg stream consists of one or more logical streams. Assuming a
single logical vorbis stream (for the moment - there are some things you
need to handle for chained streams for the length calculation), you have
the following.

Firstly, there are 3 header packets. The first of these is the main header.
This is short. It has information like the number of channels, sample rate,
the average bitrate (this is NOT the calculated bitrate - there's no
guarantee that it matches the actual file, it's just what the encoder was
attempting to get). There's also some more info relevant to the decoder
(but not externally useful), like the block sizes. See
_vorbis_unpack_info() in info.c for details on the contents of the packet
(it's very straightforward code, but if you have difficulty let me know,
I'll describe it in more detail).
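
As a rough illustration of reading the first header packet, here is a
minimal Java sketch. The field layout is an assumption taken from the
published Vorbis I specification (a packet type byte of 1, the string
"vorbis", a 32 bit version, an 8 bit channel count, a 32 bit sample rate,
three 32 bit bitrate fields, and one byte holding the two block size
exponents); check it against _vorbis_unpack_info() for the library version
you are targeting. The class name VorbisIdHeader is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical holder for the fields of the first (main) header packet.
final class VorbisIdHeader {
    final long version;
    final int channels;
    final long sampleRate;
    final int bitrateNominal; // what the encoder aimed for, not a measured value
    final int blockSize0;
    final int blockSize1;

    private VorbisIdHeader(long version, int channels, long sampleRate,
                           int bitrateNominal, int blockSize0, int blockSize1) {
        this.version = version;
        this.channels = channels;
        this.sampleRate = sampleRate;
        this.bitrateNominal = bitrateNominal;
        this.blockSize0 = blockSize0;
        this.blockSize1 = blockSize1;
    }

    // Assumes the 30-byte layout of the published Vorbis I specification.
    static VorbisIdHeader parse(byte[] packet) {
        if (packet.length < 30) {
            throw new IllegalArgumentException("packet too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(packet).order(ByteOrder.LITTLE_ENDIAN);
        int type = buf.get();
        byte[] magic = new byte[6];
        buf.get(magic);
        if (type != 1 || !"vorbis".equals(new String(magic, StandardCharsets.US_ASCII))) {
            throw new IllegalArgumentException("not a vorbis main header");
        }
        long version = buf.getInt() & 0xFFFFFFFFL;    // unsigned 32 bit
        int channels = buf.get() & 0xFF;              // unsigned 8 bit
        long sampleRate = buf.getInt() & 0xFFFFFFFFL; // unsigned 32 bit
        buf.getInt();                                 // maximum bitrate (skipped)
        int nominal = buf.getInt();                   // nominal bitrate (signed)
        buf.getInt();                                 // minimum bitrate (skipped)
        int sizes = buf.get() & 0xFF;                 // two 4 bit exponents
        return new VorbisIdHeader(version, channels, sampleRate, nominal,
                1 << (sizes & 0x0F), 1 << (sizes >>> 4));
    }
}
```

The masks with 0xFF and 0xFFFFFFFFL are there because Java has no unsigned
integer types, so unsigned values are widened into the next larger type.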

The third packet is the codebooks - that's not of interest to you, only to
the decoder internals.

The second packet is the comment header. This is what you need to
understand to read and write the comments. See _vorbis_unpack_comment()
(info.c, again) for any details I omit here. Note that (this is also true
for the main header, i.e. the 1st packet) each integer value is stored little endian
(and byte aligned in the headers). It's also unsigned unless otherwise

The comment header starts with a 32 bit quantity giving the length of the
vendor string. This is immediately followed by the vendor string (currently
"Xiphophorus libVorbis I 20001031"). The current library deals with this
(and returns it to calling programs) as a 0-terminated (C-style) string,
but I don't think it's actually stored with the terminator. So you do need
to use the length. This is followed by a 32bit integer giving the number of
comments. Each of these comments is a 32 bit length, followed by the string
(again, these are dealt with by libvorbis as being 0-terminated, but they
aren't in the actual file). So for instance, you might have (integer
lengths specified in brackets so I can do this in ascii...)

(32)Xiphophorus libVorbis I 20001031(2)(13)First comment(6)Second
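
The layout above can be sketched as a small Java reader. This is only an
illustration, with some assumptions: the class name VorbisCommentReader is
hypothetical, the strings are decoded as UTF-8 (the text above does not
name an encoding), and the buffer is expected to be positioned at the
vendor length. In the published Vorbis I specification each header packet
also begins with a packet type byte and the string "vorbis", which the
caller would need to skip first if present.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the body of the comment header packet.
final class VorbisCommentReader {
    final String vendor;
    final List<String> comments;

    private VorbisCommentReader(String vendor, List<String> comments) {
        this.vendor = vendor;
        this.comments = comments;
    }

    // 'in' must be positioned at the 32 bit vendor string length.
    static VorbisCommentReader parse(ByteBuffer in) {
        ByteBuffer buf = in.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        String vendor = readString(buf);
        long count = buf.getInt() & 0xFFFFFFFFL; // unsigned comment count
        List<String> comments = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            comments.add(readString(buf));
        }
        return new VorbisCommentReader(vendor, comments);
    }

    // Reads a 32 bit little endian length followed by that many bytes.
    // There is no 0 terminator in the stored data, so the length is required.
    private static String readString(ByteBuffer buf) {
        long length = buf.getInt() & 0xFFFFFFFFL;
        if (length > buf.remaining()) {
            throw new IllegalArgumentException("string length exceeds packet size");
        }
        byte[] bytes = new byte[(int) length];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8); // encoding is an assumption
    }
}
```

The length check matters in practice: a corrupt length field would
otherwise trigger a very large allocation before the read fails.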

That's pretty simple. If you want to change anything here (like
vorbiscomment does), just rewrite this packet - nothing else at the vorbis
layer changes. Then, you'll need to fix up the ogg layer. In the worst
case, this means you need to rewrite every page from the one where the
comment header is to the end of the file (of course, the comment header
packet might span multiple pages). It's simplest to just deal with the
worst case, as vorbiscomment does, rather than checking for the simpler
(and faster) cases and doing those differently. The simpler case is to just
rewrite the page where the comment header is (possibly making this page
bigger, if this doesn't take it over the ~64k limit), and write that out,
followed by the rest of the raw stream, which you then don't have to
change. Since you say you understand the ogg layer, I
won't go into detail unless you ask specifically (this is getting long
enough as it is ;) Be careful of chained streams when doing this
(vorbiscomment is probably broken there).
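
For the vorbis-layer half of that job, here is a minimal Java sketch that
builds the body of a new comment packet in the layout described earlier.
It does not touch the ogg layer at all: splitting the packet into pages,
renumbering them and recomputing checksums is left out. The class name
VorbisCommentWriter is hypothetical, UTF-8 is an assumed encoding, and any
packet prefix or trailing framing bit required by the library version in
use would have to be added by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical writer for the body of the comment header packet.
final class VorbisCommentWriter {
    // Produces: vendor length, vendor, comment count, then (length, text) pairs.
    static byte[] encode(String vendor, List<String> comments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, vendor);
        writeInt(out, comments.size());
        for (String comment : comments) {
            writeString(out, comment);
        }
        return out.toByteArray();
    }

    // Writes a 32 bit value least significant byte first (little endian).
    private static void writeInt(ByteArrayOutputStream out, int value) {
        out.write(value & 0xFF);
        out.write((value >>> 8) & 0xFF);
        out.write((value >>> 16) & 0xFF);
        out.write((value >>> 24) & 0xFF);
    }

    // Writes the byte length followed by the bytes, with no 0 terminator.
    private static void writeString(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8); // encoding is an assumption
        writeInt(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }
}
```

Note that the length written is the number of encoded bytes, not the
number of Java characters; the two differ for non-ASCII text.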

So, that deals with the header packets. Assuming you've got that handled,
we now want to find out track lengths, and average bitrates. The sample
rate was found from the first header packet, so the track length is the
total samples (per channel) divided by the sample rate. The average bitrate
can then be easily found from this and the file size. The hard part here is
finding the total samples - this can't be simply stored in the headers
because vorbis is a streaming format.
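
The arithmetic in that paragraph, written out as a small Java sketch (the
class and method names are hypothetical, and the total sample count is
assumed to be already known):

```java
// Hypothetical helpers for the length and average bitrate calculation.
final class TrackStats {
    // Track length in seconds: total samples (per channel) / sample rate.
    static double durationSeconds(long totalSamples, long sampleRate) {
        if (sampleRate <= 0) {
            throw new IllegalArgumentException("sample rate must be positive");
        }
        return (double) totalSamples / sampleRate;
    }

    // Average bitrate in bits per second: file size in bits / track length.
    static double averageBitrate(long fileSizeBytes, double durationSeconds) {
        if (durationSeconds <= 0) {
            throw new IllegalArgumentException("duration must be positive");
        }
        return fileSizeBytes * 8.0 / durationSeconds;
    }
}
```

For example, 441000 samples at 44100 Hz is 10 seconds, and a 160000 byte
file of that length averages 128000 bits per second. Using the whole file
size counts the headers and ogg framing as well as the audio packets.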

I haven't done anything which has needed to do this directly, so some of
the details might be wrong here, but it should be roughly right. Ok. You
need to seek (seeking is required. If you can't seek, you can't calculate
track lengths or average bitrates. Be prepared to deal with that) to the
end of the logical stream. 

(an aside...) Vorbisfile does this by seeking to the end of the file. If
the final page has the same stream serial number as the first, then you
have the simple case - not a chained stream. If not, it bisects the file
until it finds appropriate start points for each logical stream. Again,
you have to deal with the hard case too.

It then captures the final page of the logical stream (all of this is at
the ogg layer, which is sufficiently documented). The granulepos value from
this page (the ogg docs may call it something else - it was renamed a while
back. Possibly 'frame') gives you the number of samples in the file. This
is pretty straightforward, but is made hard by the possibility of
unseekable streams, or chained bitstreams. I imagine the lack of unsigned
integers in java would also make it a pain, since granulepos is 64 bit. 
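
A minimal Java sketch of the simple (unchained, seekable) case: given the
last chunk of the file as a byte array, scan backwards for the "OggS"
capture pattern and read the fields of that page header. The offsets used
(granulepos at byte 6, stream serial number at byte 14, 27 bytes of fixed
header) come from the ogg framing documentation. The class name OggTail is
hypothetical, and this sketch does not verify the page checksum, which a
robust reader should do since "OggS" can also occur by chance in the data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helpers for inspecting the last ogg page in a buffer.
final class OggTail {
    private static final int FIXED_HEADER_SIZE = 27;

    // Returns the offset of the last "OggS" that leaves room for a full
    // fixed-size page header, or -1 if there is none.
    static int lastPageOffset(byte[] data) {
        for (int i = data.length - FIXED_HEADER_SIZE; i >= 0; i--) {
            if (data[i] == 'O' && data[i + 1] == 'g'
                    && data[i + 2] == 'g' && data[i + 3] == 'S') {
                return i;
            }
        }
        return -1;
    }

    // The 64 bit little endian position field of the page at pageOffset.
    static long granulePosition(byte[] data, int pageOffset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN)
                .getLong(pageOffset + 6);
    }

    // The 32 bit stream serial number of the page at pageOffset.
    static int serialNumber(byte[] data, int pageOffset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN)
                .getInt(pageOffset + 14);
    }
}
```

Comparing serialNumber() for this page with the serial number of the first
page in the file tells you whether you are in the simple case. On the
64 bit question: a Java long holds the same bits, and on Java 8 or later
Long.toUnsignedString() and Long.compareUnsigned() can interpret them as
unsigned where that matters.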

I think that pretty much covers it, but I've skimmed over some large
chunks, so feel free to ask more detailed questions.

