[vorbis-dev] Sample-granularity file length and editing whitepaper

Monty xiphmont at xiph.org
Sun Oct 29 13:48:38 PST 2000



A few folks have asked for specifics, at the spec level, on how sample
granularity editing works in Vorbis.  So, here's a whitepaper as the
beginnings of a real document.  This doc is also in CVS on branch_beta3.

Note that vorbisfile is not actually tested with the beginning
sample-offset spec for editing.  I'll be testing/fixing any bugs in
vorbisfile on that front now, and keeping around test samples for
folks to use with their own code.

****************************************************

Topic:

Sample granularity editing of a Vorbis file; inferred arbitrary sample
length starting offsets / PCM stream lengths

Overview:

Vorbis, like mp3, is a frame-based* audio compression where audio is
broken up into discrete short time segments.  These segments are
'atomic'; that is, one must recover the entire short time segment from
the frame/packet.  There's no way to recover only a part of the PCM
time segment from part of the coded packet without expanding the
entire packet and then discarding a portion of the resulting PCM audio.

* In mp3, the data segment representing a given time period is called
  a 'frame'; the roughly equivalent Vorbis construct is a 'packet'.

Thus, when we edit a Vorbis stream, the finest physical editing
granularity is on these packet boundaries.  (The mp3 case is actually
somewhat more complicated than just snipping on a frame boundary,
because time data can be spread backward or forward over frames.  In
Vorbis, packets are all stand-alone.)  At the physical packet level,
Vorbis is therefore still limited to streams that contain an integral
number of packets.

However, Vorbis streams may still exactly represent and be edited to a
PCM stream of arbitrary length and starting offset without padding the
beginning or end of the decoded stream or requiring that the desired
edit points be packet aligned.  Vorbis makes use of Ogg stream
framing, and this framing provides time-stamping data, called a
'granule position'; our starting offset and finished stream length may
be inferred from correct usage of the granule position data.

Time stamping mechanism:

Vorbis packets are bundled into Ogg pages (note that pages do not
necessarily contain integral numbers of packets, but that isn't
important in this discussion.  More about Ogg framing can be found in
ogg/doc/framing.html).  Each page that contains a packet boundary is
stamped with the absolute sample-granularity offset of the data, that
is, 'complete samples-to-date' up to the last completed packet of that
page.  (The same mechanism is used for, e.g., video, where the number
represents complete 2-D frames, and so on.)

(It's possible but rare for a packet to span more than two pages such
that page[s] in the middle have no packet boundary; these pages have
a granule position of '-1'.)

This granule position mechanism in Ogg is used by Vorbis to indicate
when the PCM data intended to be represented in a Vorbis segment
begins a number of samples into the data represented by the first
packet[s] and/or ends before the physical PCM data represented in the
last packet[s].

File length a non-integral number of frames:

A file to be encoded in Vorbis will probably not encode into an
integral number of packets; such a file is encoded with the last
packet containing 'extra'* samples. These samples are not padding; they
will be discarded in decode. 

*(For best results, the encoder should use extra samples that preserve
the character of the last frame.  Simply setting them to zero will
introduce a 'cliff' that's hard to encode, resulting in spread-frame
noise.  Libvorbis extrapolates the last frame past the end of data to
produce the extra samples.  Even simply duplicating the last value is
better than clamping the signal to zero).
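
To make the 'extra samples' idea concrete, here is a minimal Python
sketch of the simplest acceptable strategy mentioned above:
duplicating the last value.  It assumes fixed-size, non-overlapping
blocks, which is a simplification (real Vorbis packets overlap and
libvorbis extrapolates rather than duplicates); the function name and
block_size parameter are hypothetical.

```python
def pad_with_last_value(pcm, block_size):
    """Pad pcm up to a multiple of block_size by repeating the last sample.

    Returns (padded_samples, original_length).  The original length is
    what the final granule position would record, so that a decoder
    can discard the extra samples again.
    """
    original_length = len(pcm)
    remainder = original_length % block_size
    if remainder == 0:
        # Already an integral number of blocks (or empty): nothing to add.
        return list(pcm), original_length
    extra = block_size - remainder
    return list(pcm) + [pcm[-1]] * extra, original_length


padded, original = pad_with_last_value([1, 2, 3, 4, 5], 4)
print(padded, original)
```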

The encoder indicates to the decoder that the file is actually shorter
than all of the samples ('original' + 'extra') by setting the granule
position in the last page to a short value, that is, the last
timestamp is the original length of the file discarding extra samples.
The decoder will see that the number of samples it has decoded in the
last page is too many; it is 'original' + 'extra', where the
granulepos says that through the last packet we only have 'original'
number of samples.  The decoder then ignores the 'extra' samples.
This behavior is to occur only when the end-of-stream bit is set in
the page (indicating last page of the logical stream).
 
Note that it is not legal for the granule position of the last page to
indicate that there are more samples in the file than actually exist;
however, implementations should handle such an illegal file gracefully
in the interests of robust programming.
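
The end-of-stream rule can be sketched in Python roughly as follows.
This is only an illustration under assumed inputs; the function and
parameter names are hypothetical and do not correspond to the
vorbisfile API.  Keeping every decoded sample when the final granule
position claims too many is one possible 'graceful' policy, not one
required by the text above.

```python
def samples_to_keep_on_last_page(prev_granulepos, decoded_on_page,
                                 page_granulepos, eos):
    """How many of the samples decoded from a page should be returned.

    prev_granulepos: granule position of the previous stamped page
    decoded_on_page: samples actually decoded from this page
    page_granulepos: granule position stamped on this page
    eos:             True if the end-of-stream bit is set on this page
    """
    if not eos:
        # Trimming at the end only applies to the last page.
        return decoded_on_page
    expected = page_granulepos - prev_granulepos
    if expected >= decoded_on_page:
        # Exact match, or an illegal 'too long' value: tolerate it by
        # returning what we actually have (an assumed policy).
        return decoded_on_page
    # Short value: drop the 'extra' samples at the end.
    return max(expected, 0)


print(samples_to_keep_on_last_page(4096, 1024, 4500, True))
```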

Beginning point not on integral packet boundary:

It is possible that we will want the PCM data represented by a Vorbis
stream to begin at a position later than where the decoded PCM data
really begins after an integral packet boundary, a situation analogous
to the above description where the PCM data does not end at an
integral packet boundary.  The easiest example is taking a clip out of
a larger Vorbis stream, and choosing a beginning point of the clip
that is not on a packet boundary; we need to ignore a few samples to
get the desired beginning point.

The process of marking the desired beginning point is similar to
marking an arbitrary ending point; if the encoder wishes sample zero
to be some location past the actual beginning of data, it sets a short
granule position on the first audio page* that has a granule position
value greater than zero**.  The decoder sees that on the first page
that will return data from the overlap/add queue, we have more samples
than the granule position accounts for, and discards the 'surplus'
from the beginning of the queue.

*  The first pages of a Vorbis logical bitstream are headers.

** It is possible that the first audio page[s] contain only the first
   packet, from which no data returns immediately due to the
   overlap/add nature of Vorbis packets.  If the first page[s] contain
   only the first audio packet, the granule position will only be -1
   or 0.  The page on which the second packet completes will be the
   page with the 'short' granule position.
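
A minimal Python sketch of the beginning-of-stream case, assuming the
decoder already knows how many samples it has decoded through the
first page that returns data.  The names are hypothetical and this is
not vorbisfile's actual logic.

```python
def leading_samples_to_discard(decoded_through_page, granulepos):
    """Number of samples to drop from the front of the decoded audio.

    decoded_through_page: samples decoded up to and including the
                          first page that returns data
    granulepos:           granule position stamped on that page
    """
    if granulepos < 0:
        # -1 means no packet ends on this page; nothing to decide yet.
        return 0
    surplus = decoded_through_page - granulepos
    # A short value leaves a positive surplus to discard.  A value at
    # or beyond what was decoded means nothing is dropped.
    return max(surplus, 0)


print(leading_samples_to_discard(1024, 900))
```

With 1024 samples decoded and a granule position of 900, the first 124
samples are the surplus, and sample zero of the clip is the 125th
decoded sample.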

Note that short granule values (indicating less than the amount of
data actually returned) are not legal in the Vorbis spec outside of
indicating beginning and ending sample positions.  However, decoders
should, at minimum, tolerate inadvertent short values elsewhere in the
stream (just as they should tolerate out-of-order/non-increasing
granulepos values, although this too is illegal).

Beginning point at arbitrary positive timestamp (no 'zero' sample):

It's also possible that the granule position of the first page of an
audio stream is a 'long value', that is, a value larger than the
amount of PCM audio decoded.  This implies only that we are starting
playback at some point into the logical stream, a potentially common
occurrence in streaming applications where the decoder may be
connecting into a live stream.  The decoder should not treat the long
value specially.
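
One way to picture the short and long cases on the first audio page
together is the following Python sketch.  It returns the absolute
position of the first sample handed to the caller and the number of
leading samples dropped; the function name and the tuple it returns
are hypothetical conventions for this example only.

```python
def first_sample_position(decoded_through_page, granulepos):
    """Return (absolute position of first returned sample, samples dropped).

    decoded_through_page: samples decoded through the first page that
                          returns data
    granulepos:           granule position stamped on that page
    """
    if granulepos >= decoded_through_page:
        # Exact or long value: playback simply starts some way into
        # the logical stream and nothing is discarded.
        return granulepos - decoded_through_page, 0
    # Short value: sample zero lies inside the decoded data.
    return 0, decoded_through_page - granulepos


print(first_sample_position(1024, 45124))
print(first_sample_position(1024, 900))
```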

A long value elsewhere in the stream would normally occur only when a
page is lost or out of sequence, as indicated by the page's sequence
number.  A long value under any other situation is not legal; however,
a decoder should tolerate both possibilities.
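
A tolerant decoder might classify mid-stream granule positions along
the lines of the Python sketch below.  The labels, the function name
and the argument layout are all hypothetical; the sketch also ignores
wraparound of the page sequence number.  It only names the cases and
leaves the recovery policy to the caller.

```python
def classify_granulepos(prev_granulepos, granulepos, decoded_on_page,
                        prev_seq, seq):
    """Label the granule position of a page in the middle of a stream."""
    if granulepos == -1:
        return "no-packet-end"      # no packet ends on this page
    advance = granulepos - prev_granulepos
    if advance == decoded_on_page:
        return "ok"
    if advance <= 0:
        return "non-increasing"     # illegal, but should be tolerated
    if advance < decoded_on_page:
        return "short"              # illegal mid-stream, tolerated
    if seq != prev_seq + 1:
        return "long-after-gap"     # page[s] lost or out of sequence
    return "long-illegal"           # long value with no sequence gap


print(classify_granulepos(4096, 8192, 1024, 7, 10))
```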
