[Flac-dev] Ogg encapsulation

Fri Jul 30 12:39:00 PDT 2004

On Fri, Jul 30, 2004 at 10:49:56AM -0700, Josh Coalson wrote:
> it's good you brought this up, I want to finalize the Ogg FLAC
> bitstream mapping and add it to the docs.  currently the way it
> is done in flac 1.1.0 is not ideal and probably should change.

And it should change soon: the time for end users mixing and matching
arbitrary codecs with each other (i.e., Theora + FLAC) is here.  FLAC
bitstream "upgrade" utilities can be written fairly easily, too.

> lacking granulepos?  not sure what you mean, flac 1.1.0 writes
> a granulepos of 0 for metadata packets and the correct granulepos
> for audio packets.

My bad, then.  I have the most recent version but was using test 
material that must have been encoded with a previous version...

> in flac 1.1.0, what ends up in page 0 is not clearly defined,
> which is not good.  I don't remember the ogg documentation
> giving any recommendations as to what should be in the first
> page and I didn't really investigate it.

This is a good point.  I'm going to sit down and write up a draft doc
for "Ogg codec design", especially how to map "standalone codecs".  I'll
pass it around and get feedback on it, especially from Monty and Ralph.

> the way *CVS* libOggFLAC currently writes is as follows:
> 
> - the 4 byte 'fLaC' header is put in the first packet
> - each FLAC metadata block is put in its own packet
> - each FLAC audio frame is put in its own packet
> 
> granulepos is 0 for the fLaC packet and metadata packets.
> (I recall somewhere someone suggesting using a granulepos
> of -1 for non-audio header packets?)

No, -1 has a special meaning for granulepos: it means "no complete
Packet is available in this Page".  0 is the correct granulepos for
header pages; you have this right.
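
To make the two special values concrete, here is a minimal sketch that
reads the granulepos out of a raw Ogg page header and labels it.  The
27-byte layout is the standard Ogg page header; the function names are
made up for illustration and are not part of libogg.

```python
import struct

# Fixed part of an Ogg page header (27 bytes, little-endian):
# capture pattern, stream structure version, header type flags,
# granule position (signed 64-bit), serial number, page sequence
# number, CRC checksum, number of segments.
OGG_PAGE_HEADER = struct.Struct("<4sBBqIIIB")


def read_granulepos(page: bytes) -> int:
    """Return the granule position stored in a raw Ogg page."""
    fields = OGG_PAGE_HEADER.unpack_from(page)
    if fields[0] != b"OggS":
        raise ValueError("not an Ogg page")
    return fields[3]


def describe_granulepos(granulepos: int) -> str:
    """Label a granulepos the way this message describes it."""
    if granulepos == -1:
        return "no packet completes on this page"
    if granulepos == 0:
        return "header page"
    return "data up to granule %d" % granulepos
```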

> when the fLaC packet and metadata packets are written, the
> library currently calls ogg_stream_flush() on each packet
> so that each fLaC/metadata packet is in its own page.

Packet 0 must always be flushed - you want decoders/demuxers to be able 
to quickly grab a small Page from each stream and pass that page around 
to the various codecs asking "is this yours?".  Libogg2 flushes Packet0 
automatically, no choice in the matter.  

But you also want Page 0 to carry some minimal information, such as the
codec's version and granulerate, so that (de)muxers can handle the
granulepos -> time mapping without having to decode any further than
Page 0.
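
As a rough illustration of why the granulerate belongs in Page 0, the
mapping a (de)muxer needs can be as small as the sketch below.  It
assumes the granulepos counts samples and the granulerate is the sample
rate in Hz, which is my assumption for an audio codec like FLAC; the
function name is hypothetical.

```python
from typing import Optional


def granule_to_seconds(granulepos: int, granule_rate: int) -> Optional[float]:
    """Map a granulepos to a time in seconds.

    Assumes a linear mapping: granulepos counts samples and
    granule_rate is samples per second.  Returns None for -1,
    which carries no position.
    """
    if granule_rate <= 0:
        raise ValueError("granule_rate must be positive")
    if granulepos < 0:
        return None
    return granulepos / granule_rate
```

With that, a muxer can interleave pages from several streams by time
without ever handing the audio packets to a decoder.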

You must also flush at the end of your header packets so that tools such
as Icecast can easily cache the header pages, and nothing else, and send
them immediately preceding "current" live data in the stream.  The "end
of header" flush is something you have to do manually.
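
The flushing rules above can be sketched with a toy page layout model.
This is not libogg: it ignores packets that span pages and the real
segment table, and the function name and 4096-byte nominal page size
are only there for illustration.

```python
def layout_pages(header_sizes, audio_sizes, nominal_page_size=4096):
    """Assign packets to pages in a simplified Ogg-like model.

    Returns a list of pages; each page is a list of (kind, index)
    tuples.  Packets are never split across pages in this toy model.
    """
    pages = []
    current = []
    current_bytes = 0

    def flush():
        nonlocal current, current_bytes
        if current:
            pages.append(current)
            current = []
            current_bytes = 0

    last_header = len(header_sizes) - 1
    for i, size in enumerate(header_sizes):
        current.append(("header", i))
        current_bytes += size
        # Packet 0 gets its own page; the last header packet forces a
        # flush so the first audio packet starts a fresh page.
        if i == 0 or i == last_header or current_bytes >= nominal_page_size:
            flush()

    for i, size in enumerate(audio_sizes):
        current.append(("audio", i))
        current_bytes += size
        if current_bytes >= nominal_page_size:
            flush()

    flush()
    return pages
```

For example, `layout_pages([38, 200, 300], [3000, 3000, 3000])` puts
header 0 alone on the first page, headers 1 and 2 together on the
second, and the audio packets on the pages after that, so a streaming
server could cache just the first two pages.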

> but because some metadata packets can exceed the Ogg nominal
> page size of 4K, and even the max Ogg page size (~64K?), some
> metadata packets may not fit in a single page.

This is not a problem, as long as the base data for the codec is in 
Page0 the rest of the headers can span multiple pages if needed.

> what is your recommendation here?  is it that the fLaC and
> first STREAMINFO (which holds sample rate, #samples,
> resolution, etc) packets be flushed together so that they
> are in the same page, page 0?  the total size of both packets
> is 42 bytes so this seems to be no problem.

No, Page #0 should contain only one Packet.  My recommendation is that
this packet contain, at minimum, a version, samplerate, #channels,
and samplesize (resolution? 4/8/16/24/32 bit).  Block and frame size
constraints should also be included here.  You may also want to combine
your "Registered application ID" packet with the first header packet
(perhaps as a field right after the version).

I do not recommend putting in fields for total samples in stream, since
this goes against Ogg framing, and the MD5 signature (while useful) is a
little redundant given that Ogg provides a CRC for each page, and this
too goes against Ogg framing conventions.  Having these fields be
optional is OK, especially since they exist in FLAC, but requiring
either makes live streaming and one-pass encoding impossible.

The "Vorbis Comment" section should be in a packet by itself.  The
seekpoint stuff is redundant and should not be used in Ogg encoding;
this data can easily be regenerated from the Page granulepos values
when converting OggFLAC -> FLAC.

I'm not sure if cuepoints are especially helpful - it doesn't hurt to
include them (especially if you want to be able to convert FLAC <->
OggFLAC without data loss).  I think cuepoints, if necessary, make more
sense in a generic metadata codec, such as has been suggested for the
"kitchen sink metadata codec", so that apps which support cdda stuff
could use it with any codec instead of needing codec-specific support.

> but adding in the codec version to the first page.... in
> FLAC the codec version is in the vendor tag of the vorbis
> comments.  

When a codec looks at a stream, it needs to know from page 0 if it can 
support a given stream.  If someone is using a FLAC 1.x decoder and the 
stream is marked as 2.x, the codec needs to know to reject it.
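
A minimal sketch of that check follows.  The 8-byte layout (major,
minor, samplerate, channels, bits per sample) is purely hypothetical,
invented here to show the idea; it is not a proposed or existing Ogg
FLAC mapping, and all names are made up.

```python
import struct

# HYPOTHETICAL first-packet layout, big-endian, for illustration only:
# major version (u8), minor version (u8), sample rate (u32),
# channels (u8), bits per sample (u8).
HYPOTHETICAL_ID_HEADER = struct.Struct(">BBIBB")

SUPPORTED_MAJOR = 1  # this pretend decoder only understands 1.x streams


def parse_id_packet(packet: bytes) -> dict:
    """Parse the hypothetical first packet, rejecting unknown versions."""
    if len(packet) < HYPOTHETICAL_ID_HEADER.size:
        raise ValueError("first packet too short to carry a version")
    major, minor, rate, channels, bits = HYPOTHETICAL_ID_HEADER.unpack_from(packet)
    if major != SUPPORTED_MAJOR:
        # The decoder can say exactly why it refuses the stream.
        raise ValueError("unsupported stream version %d.%d" % (major, minor))
    return {
        "version": (major, minor),
        "sample_rate": rate,
        "channels": channels,
        "bits_per_sample": bits,
    }
```

The point is only that the decision is made from the first packet
alone, before any comment or setup packets have been read.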

> the vorbis comments are in their own metadata
> block/packet, but there is no requirement for the vorbis
> comments to follow the STREAMINFO immediately.  there may
> be other metadata in between.  now, I can enforce it in
> libOggFLAC that it follows STREAMINFO immediately, but then
> the question is, how can you guarantee that the whole vorbis
> comment packet will also fit in page 0, given that there is
> not much restriction on the size of comments?  is it enough
> that the first part of the vorbis comment packet that contains
> the vendor string is in page 0?

Comments don't belong in Page 0.  They are not useful to a decoder or 
muxer trying to figure out how to use a codec or displaying metadata 
about it.  A good example for this is how Vorbis works:

Packet 0: Identification Header (always flushed to Page 0)
 32 bits: vorbis_version
  8 bits: audio_channels
 32 bits: audio_sample_rate
 32 bits: bitrate_maximum
 32 bits: bitrate_nominal
 32 bits: bitrate_minimum
  4 bits: blocksize_0
  4 bits: blocksize_1
  1 bit:  framing_flag

Packet 1: Comment Header

Packet 2: Codec Setup Header
 floors, residues, codebooks, etc. used for decoding

The page is flushed after Packet 2, but Packets 1 and 2 may appear on
the same page or on many pages, and may be continued between pages.
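
For reference, reading the Vorbis identification header takes only a few
lines.  This is a sketch, not libvorbis code.  It relies on two details
from the Vorbis I specification that are not in the field list above:
the packet starts with a type byte (1) and the string "vorbis", and the
two blocksize fields are stored as exponents of two.  The function name
is made up.

```python
import struct

# Packet type (1), "vorbis", then the identification fields listed
# above, little-endian.  The two 4-bit blocksizes share one byte and
# the framing flag is the low bit of the final byte.  30 bytes total.
VORBIS_ID_HEADER = struct.Struct("<B6sIBIiiiBB")


def parse_vorbis_id_header(packet: bytes) -> dict:
    """Decode a Vorbis identification header packet (Packet 0)."""
    (ptype, magic, version, channels, rate,
     br_max, br_nom, br_min, blocksizes, framing) = VORBIS_ID_HEADER.unpack_from(packet)
    if ptype != 1 or magic != b"vorbis":
        raise ValueError("not a Vorbis identification header")
    if not framing & 1:
        raise ValueError("framing flag not set")
    return {
        "vorbis_version": version,
        "audio_channels": channels,
        "audio_sample_rate": rate,
        "bitrate_maximum": br_max,
        "bitrate_nominal": br_nom,
        "bitrate_minimum": br_min,
        # Stored as exponents: low nibble first, then high nibble.
        "blocksize_0": 1 << (blocksizes & 0x0F),
        "blocksize_1": 1 << (blocksizes >> 4),
    }
```

Everything a muxer needs (channels, rate) sits in this one small packet,
which is why it can live alone on Page 0.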

The purpose of Packet 2 is so this information doesn't have to be
repeated on every data packet.  FLAC appears to repeat this information
on every frame, and as such, better compression may be possible simply
by moving this data to a header packet.  Or, at least, doing so would
offset the additional overhead we get from the Ogg page headers.

> audio packets are written out with ogg_page_out() with no
> attempt to manipulate the page boundaries.  but the first
> audio packet will always start a page because all the metadata
> is flushed out to pages before audio data is written.  is this
> also OK?

This is perfect.  End of headers needs to flush, everything else should 
go normally.  This behavior is identical to that of Vorbis.

> your help is appreciated.  I wouldn't worry too much about
> backward compatibility with old-and-previously-unwieldy Ogg
> FLAC because 1) not many people (anyone?) are using it yet since
> it has had no seeking support until recently in CVS; 2) it is
> trivial to decode an old stream with an old decoder and
> re-encode it with a newer encoder that complies to an official
> Ogg FLAC bitstream mapping.

The latter is a very good point, something I keep forgetting - FLAC is 
lossless so transcoding FLAC -> FLAC is a lossless operation.

However, this is a good example of why a version field in Page 0 is 
needed :-)  Older apps will choke on the new format and, without the 
version field, they won't know why.  Also, new apps will have to detect 
a four-byte Page 0 as being the "old way" if they want to support it.
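
Going only by the CVS layout described earlier in this thread (first
packet is the bare 4-byte 'fLaC' marker), the "old way" check could be
as simple as this sketch.  The function name is hypothetical.

```python
LEGACY_FIRST_PACKET = b"fLaC"  # the 4-byte marker described above


def is_legacy_first_packet(packet: bytes) -> bool:
    """True if the first packet is just the bare 4-byte 'fLaC' marker."""
    return packet == LEGACY_FIRST_PACKET
```

Anything longer would then be handed to the new-style header parser,
which can look at the version field.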

> I haven't been following Ogg2 (are there any docs for it?) so
> I don't know what that entails.

Sorry, docs haven't been written yet.  I believe that only Monty and I
are familiar with it, but docs are going to be written "real soon now".
I was going to write a "dummy's guide to migrating to libogg2" but
figure it'll be easier to just do a lot of that work myself.  The API is
very similar to libogg1, but at the same time, "everything has changed".
All buffers are "owned" by the library now, which is responsible for
memory management, and while some of the functions retain the same
names, their arguments are of different types.

They are, however, similar enough so that libtheora supports both with 
only a few #ifdef LIBOGG2's here and there.

libogg2's advantage is speed and memory consumption.  libogg1 repeatedly
copies memory between buffers and does other really inefficient things
like that.  Monty wrote libogg2 originally as part of Tremor, since
lower memory usage was needed, and wrote it such that data goes from the
bitpacker to the sync buffer while never being copied or moved in
memory.  OggFile, the "Ogg System Library", will use libogg2 and will
probably be distributed with it, and all "next-generation" apps which
use Ogg are likely to use libogg2.  In other words, migrating FLAC is a
pretty high priority as far as getting it ready to be used with other
Ogg codecs.