[Flac-dev] Ogg encapsulation
Arc Riley
arc at Xiph.org
Fri Jul 30 12:39:00 PDT 2004
On Fri, Jul 30, 2004 at 10:49:56AM -0700, Josh Coalson wrote:
> it's good you brought this up, I want to finalize the Ogg FLAC
> bitstream mapping and add it to the docs. currently the way it
> is done in flac 1.1.0 is not ideal and probably should change.
And it should change soon: the time for end users mixing and matching
arbitrary codecs with each other (e.g., Theora + FLAC) is here. FLAC
bitstream "upgrade" utilities can be written fairly easily, too.
> lacking granulepos? not sure what you mean, flac 1.1.0 writes
> a granulepos of 0 for metadata packets and the correct granulepos
> for audio packets.
My bad, then. I have the most recent version but was using test
material that must have been encoded with a previous version...
> in flac 1.1.0, what ends up in page 0 is not clearly defined,
> which is not good. I don't remember the ogg documentation
> giving any recommendations as to what should be in the first
> page and I didn't really investigate it.
This is a good point. I'm going to sit down and write up a draft doc
for "Ogg codec design", especially how to map "standalone codecs", then
pass it around and get feedback on it, especially from Monty and Ralph.
> the way *CVS* libOggFLAC currently writes is as follows:
>
> - the 4 byte 'fLaC' header is put in the first packet
> - each FLAC metadata block is put in its own packet
> - each FLAC audio frame is put in its own packet
>
> granulepos is 0 for the fLaC packet and metadata packets.
> (I recall somewhere someone suggesting using a granulepos
> of -1 for non-audio header packets?)
No, -1 has a special meaning for granulepos: it means "no complete
Packet is available in this Page". 0 is the correct granulepos for
header pages; you have this right.
> when the fLaC packet and metadata packets are written, the
> library currently calls ogg_stream_flush() on each packet
> so that each fLaC/metadata packet is in its own page.
Packet 0 must always be flushed - you want decoders/demuxers to be able
to quickly grab a small Page from each stream and pass that page around
to the various codecs asking "is this yours?". Libogg2 flushes Packet0
automatically, no choice in the matter.
But you also want Packet 0 to carry some minimal information, such as
the codec version and granulerate, so that (de)muxers can handle
granulepos -> time mapping without having to decode any further than
Page 0.
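To make the granulepos -> time mapping concrete, here is a minimal
Python sketch. It assumes, as in Vorbis, that granulepos counts PCM
samples and that the granulerate is simply the sample rate. The function
name is made up for illustration and is not part of libogg.

```python
def granule_to_seconds(granulepos: int, granulerate: int) -> float:
    """Map a granulepos to a time in seconds.

    Assumption: granulepos counts PCM samples and granulerate is the
    sample rate in Hz (the convention Vorbis uses).
    """
    if granulerate <= 0:
        raise ValueError("granulerate must be positive")
    if granulepos < 0:
        # -1 means no packet finishes on the page, so there is no position.
        raise ValueError("page carries no usable granulepos")
    return granulepos / granulerate


# One second and half a second of audio at 44.1 kHz.
one_second = granule_to_seconds(44100, 44100)
half_second = granule_to_seconds(22050, 44100)
```

A muxer that only has Page 0 of each stream could do this without ever
touching the codec's decoder, which is the point of putting the rate up
front.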
You must also flush at the end of your header packets so that tools such
as Icecast can easily cache the header pages, nothing else, and send
them immediately preceding "current" live data in the stream. The "end
of header" flush is something you have to do manually.
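The two flushes can be pictured with a toy paginator. This is only a
model of the rule described above, not libogg: it tracks packet labels
and sizes, ignores packets that span pages, and the 4096-byte nominal
page size and all names are assumptions for the example.

```python
def paginate(header_packets, audio_packets, nominal=4096):
    """Group (label, size) packets into pages, returned as lists of labels.

    Flushes after packet 0 and again after the last header packet;
    audio pages are closed once they reach the nominal size.
    """
    pages, current, size = [], [], 0

    def flush():
        nonlocal current, size
        if current:
            pages.append(current)
            current, size = [], 0

    last_header = len(header_packets) - 1
    for i, (label, nbytes) in enumerate(header_packets):
        current.append(label)
        size += nbytes
        if i == 0 or i == last_header:
            flush()  # packet 0 flush, then the manual end-of-header flush

    for label, nbytes in audio_packets:
        current.append(label)
        size += nbytes
        if size >= nominal:
            flush()
    flush()
    return pages


headers = [("ident", 30), ("comment", 200), ("setup", 3000)]
audio = [("a0", 3000), ("a1", 3000), ("a2", 100)]
pages = paginate(headers, audio)
# Everything before the first audio page is what a server would cache.
header_pages = pages[:2]
```

Because the headers end on a page boundary, a streaming server can keep
`header_pages` around and prepend them to whatever audio page is current.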
> but because some metadata packets can exceed the Ogg nominal
> page size of 4K, and even the max Ogg page size (~64K?), some
> metadata packets may not fit in a single page.
This is not a problem. As long as the base data for the codec is in
Page 0, the rest of the headers can span multiple pages if needed.
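For reference, this is how a large packet gets spread over pages. Ogg
splits a packet into segments of up to 255 bytes and a page holds at
most 255 segments, so a page body tops out at 255 * 255 = 65025 bytes.
The Python sketch below just does that arithmetic; the function names
are mine and it assumes the packet starts on a fresh page.

```python
MAX_SEGMENTS_PER_PAGE = 255


def lacing_values(packet_len: int) -> list:
    """Segment sizes for one packet: runs of 255, then a value below 255."""
    full, rest = divmod(packet_len, 255)
    # The final value (possibly 0) is what marks the end of the packet.
    return [255] * full + [rest]


def pages_needed(packet_len: int) -> int:
    """Pages touched by one packet that starts on a fresh page."""
    segments = len(lacing_values(packet_len))
    return -(-segments // MAX_SEGMENTS_PER_PAGE)  # ceiling division


small = pages_needed(42)       # fLaC + STREAMINFO sized data
large = pages_needed(100000)   # e.g. a big metadata block
```

A 42-byte packet fits in one page with room to spare, while a 100000
byte metadata packet is continued onto a second page.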
> what is your recommendation here? is it that the fLaC and
> first STREAMINFO (which holds sample rate, #samples,
> resolution, etc) packets be flushed together so that they
> are in the same page, page 0? the total size of both packets
> is 42 bytes so this seems to be no problem.
No, Page #0 should contain only one Packet. My recommendation is that
this packet contain, at minimum, a version, samplerate, #channels,
and samplesize (resolution? 4/8/16/24/32 bit). Block and frame size
constraints should also go here. You may also want to combine your
"Registered application ID" packet with the first header packet (perhaps
a field right after the version).
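As a rough illustration of what such a first packet could look like,
here is a Python sketch. The layout is HYPOTHETICAL: the field order,
widths and the 'fLaC' magic at the front are assumptions made up for
this example, not a proposed or official Ogg FLAC mapping.

```python
import struct

# HYPOTHETICAL layout, little-endian, no padding:
#   4 bytes magic, 1 byte major version, 1 byte minor version,
#   4 bytes sample rate, 1 byte channels, 1 byte bits per sample
IDENT_LAYOUT = "<4sBBIBB"


def pack_ident(major, minor, rate, channels, bits):
    """Build the hypothetical first header packet."""
    return struct.pack(IDENT_LAYOUT, b"fLaC", major, minor, rate, channels, bits)


def unpack_ident(packet):
    """Parse the hypothetical first header packet into a dict."""
    if len(packet) != struct.calcsize(IDENT_LAYOUT):
        raise ValueError("unexpected packet size")
    magic, major, minor, rate, channels, bits = struct.unpack(IDENT_LAYOUT, packet)
    if magic != b"fLaC":
        raise ValueError("not a FLAC identification packet")
    return {"major": major, "minor": minor, "rate": rate,
            "channels": channels, "bits": bits}


packet = pack_ident(1, 0, 44100, 2, 16)
info = unpack_ident(packet)
```

Whatever the final layout is, the idea is that a demuxer can read these
few bytes from Page 0 and know the rate and format without going further.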
I do not recommend putting in fields for total samples in stream, since
this goes against Ogg framing. The MD5 signature (while useful) is a
little redundant given that Ogg provides a CRC for each page, and this
too goes against Ogg framing conventions. Having these fields be
optional is OK, especially since they exist in FLAC, but requiring
either makes live streaming and one-pass encoding impossible.
The "Vorbis Comment" section should be in a packet by itself. The
seekpoint stuff is redundant and should not be used in Ogg encoding;
this data can easily be regenerated from the Page granulepos values when
transferring from OggFLAC -> FLAC.
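Here is a sketch of how seek points could be rebuilt from the pages
alone. It walks the standard Ogg page headers (27 fixed bytes, then the
segment table) and records (granulepos, byte offset) pairs. It does not
verify CRCs or resync after damage, and make_page() is only a helper
for the example that leaves the CRC field at zero, so its output is not
a valid stream for real decoders.

```python
import struct


def iter_pages(data: bytes):
    """Yield (byte_offset, granulepos, page_length) for each page in data."""
    pos = 0
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError(f"lost sync at offset {pos}")
        (granulepos,) = struct.unpack_from("<q", data, pos + 6)
        nsegs = data[pos + 26]
        body_len = sum(data[pos + 27:pos + 27 + nsegs])
        page_len = 27 + nsegs + body_len
        yield pos, granulepos, page_len
        pos += page_len


def seek_table(data: bytes):
    """(granulepos, offset) for pages that finish audio packets.

    Pages with granulepos 0 (headers) or -1 (no packet ends here) are skipped.
    """
    return [(g, off) for off, g, _ in iter_pages(data) if g > 0]


def make_page(granulepos: int, body: bytes) -> bytes:
    """Example-only page builder: one packet per page, CRC left at 0."""
    full, rest = divmod(len(body), 255)
    lacing = bytes([255] * full + [rest])
    header = struct.pack("<4sBBqIIIB", b"OggS", 0, 0, granulepos,
                         1, 0, 0, len(lacing))
    return header + lacing + body


stream = (make_page(0, b"hdr")
          + make_page(4096, b"a" * 100)
          + make_page(8192, b"b" * 300))
table = seek_table(stream)
```

A converter going to native FLAC could feed a table like this into
whatever seek point format it needs.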
I'm not sure if cuepoints are especially helpful - it doesn't hurt to
include them (especially if you want to be able to transfer FLAC <-> OggFLAC
without data loss). I think cuepoints, if necessary, make more sense
in a generic metadata codec, such as has been suggested for the
"kitchen sink metadata codec", so that apps which support CDDA stuff
could use it with any codec vs. having codec-specific support.
> but adding in the codec version to the first page.... in
> FLAC the codec version is in the vendor tag of the vorbis
> comments.
When a codec looks at a stream, it needs to know from page 0 if it can
support a given stream. If someone is using a FLAC 1.x decoder and the
stream is marked as 2.x, the codec needs to know to reject it.
> the vorbis comments are in their own metadata
> block/packet, but there is no requirement for the vorbis
> comments to follow the STREAMINFO immediately. there may
> be other metadata in between. now, I can enforce it in
> libOggFLAC that it follows STREAMINFO immediately, but then
> the question is, how can you guarantee that the whole vorbis
> comment packet will also fit in page 0, given that there is
> not much restriction on the size of comments? is it enough
> that the first part of the vorbis comment packet that contains
> the vendor string is in page 0?
Comments don't belong in Page 0. They are not useful to a decoder or
muxer trying to figure out how to use a codec or displaying metadata
about it. A good example for this is how Vorbis works:
Packet 0: Identification Header (always flushed to Page 0)
    32 bits: vorbis_version
     8 bits: audio_channels
    32 bits: audio_sample_rate
    32 bits: bitrate_maximum
    32 bits: bitrate_nominal
    32 bits: bitrate_minimum
     4 bits: blocksize_0
     4 bits: blocksize_1
     1 bit:  framing_flag
Packet 1: Comment Header
Packet 2: Codec Setup Header
    floors, residues, codebooks, etc. used for decoding
A flush follows Packet 2, so the headers end on a page boundary, but
Packets 1 and 2 may appear on the same page or on many pages, and may be
continued between pages.
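For anyone who wants to poke at this, a small Python sketch that parses
the Vorbis identification header follows. Note that in the actual
Vorbis I packet the fields listed above are preceded by a one-byte
packet type (0x01) and the six bytes "vorbis", and the two blocksize
fields are stored as exponents (blocksize = 2^n). The function name is
mine.

```python
import struct


def parse_vorbis_ident(packet: bytes) -> dict:
    """Parse a Vorbis I identification header packet (30 bytes)."""
    if len(packet) < 30 or packet[:7] != b"\x01vorbis":
        raise ValueError("not a Vorbis identification header")
    (version, channels, rate, br_max, br_nom, br_min,
     blocks, framing) = struct.unpack_from("<IBIiiiBB", packet, 7)
    if not framing & 1:
        raise ValueError("framing flag not set")
    return {
        "vorbis_version": version,
        "audio_channels": channels,
        "audio_sample_rate": rate,
        "bitrate_maximum": br_max,
        "bitrate_nominal": br_nom,
        "bitrate_minimum": br_min,
        # Low nibble is blocksize_0, high nibble is blocksize_1.
        "blocksize_0": 1 << (blocks & 0x0F),
        "blocksize_1": 1 << (blocks >> 4),
    }


# A typical stereo 44.1 kHz header with 256/2048 sample blocks.
sample = b"\x01vorbis" + struct.pack("<IBIiiiBB", 0, 2, 44100,
                                     0, 128000, 0, 0xB8, 1)
ident = parse_vorbis_ident(sample)
```

Everything a muxer needs (version, channels, rate) sits in those first
30 bytes, which is why the packet gets a page to itself.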
The purpose of Packet 2 is so this information doesn't have to be
repeated on every data packet. FLAC appears to repeat this information
on every frame, and as such, better compression may be possible simply
by moving this data to a header packet. Or, at least, doing so would
offset the additional overhead we get from the Ogg page headers.
> audio packets are written out with ogg_page_out() with no
> attempt to manipulate the page boundaries. but the first
> audio packet will always start a page because all the metadata
> is flushed out to pages before audio data is written. is this
> also OK?
This is perfect. End of headers needs to flush, everything else should
go normally. This behavior is identical to that of Vorbis.
> your help is appreciated. I wouldn't worry too much about
> backward compatibility with old-and-previously-unwieldy Ogg
> FLAC because 1) not many people (anyone?) are using it yet since
> it has had no seeking support until recently in CVS; 2) it is
> trivial to decode an old stream with an old decoder and
> re-encode it with a newer encoder that complies to an official
> Ogg FLAC bitstream mapping.
The latter is a very good point, something I keep forgetting - FLAC is
lossless so transcoding FLAC -> FLAC is a lossless operation.
However, this is a good example of why a version field in Page 0 is
needed :-) Older apps will choke on the new format and, without the
version field, they won't know why. Also, new apps will have to detect
a four-byte Page 0 as being the "old way" if they want to support it.
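A sketch of the kind of check a new app would need, in Python. The
"legacy" test follows from the above (first packet is exactly the four
bytes 'fLaC'). The new-style branch is HYPOTHETICAL: it assumes the
first packet starts with 'fLaC' followed by a one-byte major version,
which is only an illustration and not a defined format.

```python
def classify_first_packet(packet: bytes, supported_major: int = 1) -> str:
    """Decide what to do with the first packet of a possible FLAC stream."""
    if packet == b"fLaC":
        # Old way: a bare four-byte marker and nothing else.
        return "legacy"
    if len(packet) > 4 and packet[:4] == b"fLaC":
        # HYPOTHETICAL: byte 4 is assumed to hold the major version.
        major = packet[4]
        if major == supported_major:
            return "supported"
        return "unsupported-version"
    return "not-flac"


old_stream = classify_first_packet(b"fLaC")
new_stream = classify_first_packet(b"fLaC" + bytes([1, 0]))
future_stream = classify_first_packet(b"fLaC" + bytes([2, 0]))
other_codec = classify_first_packet(b"\x01vorbis")
```

With a version byte present, an old decoder handed a 2.x stream can say
why it is refusing it instead of just choking.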
> I haven't been following Ogg2 (are there any docs for it?) so
> I don't know what that entails.
Sorry, docs haven't been written yet. I believe that only Monty and I
are familiar with it, but docs are going to be written "real soon now".
I was going to write a "dummy's guide to migrating to libogg2" but
figure it'll be easier to just do a lot of that work myself. The API is
very similar to libogg1, but at the same time, "everything has changed".
All buffers are "owned" by the library now, which is responsible for
memory management, and while some of the functions retain their same
names their arguments are of different types.
They are, however, similar enough so that libtheora supports both with
only a few #ifdef LIBOGG2's here and there.
libogg2's advantage is speed and memory consumption. libogg1 repeatedly
copies memory between buffers and does other really inefficient things
like that. Monty wrote libogg2 originally as part of Tremor, since lower
memory usage was needed, and wrote it such that data goes from the
bitpacker to the sync buffer while never being copied or moved in
memory. OggFile, the "Ogg System Library", will use libogg2 and will
probably be distributed with it, and all "next-generation" apps which
use Ogg are likely to use libogg2. In other words, migrating FLAC is a
pretty high priority as far as getting it ready to be used with other
Ogg codecs.