[ogg-dev] OggPCM2 : chunked vs interleaved data
Sampo Syreeni
decoy at iki.fi
Tue Nov 15 15:25:44 PST 2005
On 2005-11-16, Jean-Marc Valin wrote:
> Otherwise, what do you feel should be changed?
One obvious thing that seems to be lacking is the granulepos mapping. As
suggested in the Ogg documentation, for audio a simple sampling-frame
number ought to suffice, but I think the convention should still be
spelled out.
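To make the convention concrete, here's a minimal C sketch against
libogg of what I have in mind; the helper and the running frame count
are mine, not from any draft:

    #include <ogg/ogg.h>

    /* Sketch: a packet's granulepos is the absolute count of PCM
     * sampling frames encoded up to and including its last frame. */
    static ogg_int64_t frames_so_far = 0;

    static void stamp_granulepos(ogg_packet *op, long payload_bytes,
                                 int channels, int bytes_per_sample)
    {
        frames_so_far += payload_bytes / (channels * bytes_per_sample);
        op->granulepos = frames_so_far;
    }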
Secondly, I'd like to see the channel map fleshed out in more detail.
(Beware of the pet peeve...) IMO the mapping should cover at least the
channel assignments possible in WAVE files, the most common Ambisonic
ones, and perhaps some added channel interpretations like "surround"
which are commonly used but lacking in most file formats. (For example,
THX does not treat surround as a directional source, so the correct
semantics cannot be captured e.g. by WAVE files. Surprisingly, neither
can they capture the fact that some pair of channels is Dolby Surround
encoded, as opposed to some form of vanilla stereo.)
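For concreteness, a hypothetical enumeration along these lines; every
identifier below is invented for illustration and nothing here is in
the current draft:

    /* Hypothetical channel assignments covering WAVE-style speaker
     * positions, a non-directional surround, and B-format components. */
    typedef enum {
        OGGPCM_CH_FRONT_LEFT = 0,
        OGGPCM_CH_FRONT_RIGHT,
        OGGPCM_CH_FRONT_CENTER,
        OGGPCM_CH_LFE,
        OGGPCM_CH_BACK_LEFT,
        OGGPCM_CH_BACK_RIGHT,
        OGGPCM_CH_SURROUND,      /* non-directional, THX-style */
        OGGPCM_CH_AMBISONIC_W,   /* ambisonic B-format */
        OGGPCM_CH_AMBISONIC_X,
        OGGPCM_CH_AMBISONIC_Y,
        OGGPCM_CH_AMBISONIC_Z
    } oggpcm_channel_type;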
(As a further idea prompted by ambisonic compatibility encodings, I'd
also like to explore the possibility of multiple tagging. For example,
Dolby Surround, Circle Surround, Logic 7 and ambisonic BHJ are all
designed to be stereo compatible so that a legacy decoder can play them
as-is. But if they are tagged as something besides normal stereo, such a
decoder will probably just ignore them. So, there's a case to be made
for overlapping, preferential tags, one telling the decoder that the
data *can* be played as stereo, another one telling that it *should* be
interpreted as, say, BHJ, and so on. Object-minded folks can think of
this as a kind of type inheritance. But of course this is more food for
thought than a must-have feature, since nobody else is doing anything
of the sort at the moment.)
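If anyone wants to toy with the idea, here's roughly what I mean, with
all names invented: the header carries an ordered list of
interpretations, most specific first, and a decoder walks it until it
finds one it supports:

    /* Hypothetical: ordered list of interpretations, most specific
     * first, e.g. { BHJ, STEREO }.  A legacy decoder that only knows
     * STEREO falls through to the compatibility interpretation. */
    typedef struct {
        int n_tags;
        int tags[4];
    } oggpcm_tag_list;

    static int pick_interpretation(const oggpcm_tag_list *tl,
                                   int (*supported)(int tag))
    {
        int i;
        for (i = 0; i < tl->n_tags; i++)
            if (supported(tl->tags[i]))
                return tl->tags[i];
        return -1; /* nothing usable */
    }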
> Anyone wants to speak in support of chunked PCM?
Actually I'd like to add a general point against it. The chunked vs.
interleaved question is an instance of the more general problem of
efficiently linearizing a multidimensional structure. We want to do this
so that typical access patterns (and in particular locality of access)
translate gracefully and efficiently. Thus we group primarily by time
(interleaving) when locality is by time (accessing a sample at a given
sampling time raises the odds that a sample at a nearby time will soon
be accessed), and primarily by channel (chunking) when locality is by
channel (accessing a channel makes it probable that the same channel
will be accessed again); we also try to preserve the rough order of
access.
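In code, the two linearizations of an n_frames x n_chan sample matrix
are just two index formulas (a sketch, nothing spec-specific):

    #include <stddef.h>

    /* Interleaved: grouped primarily by time.  A temporally ascending
     * scan reads the buffer sequentially. */
    static size_t idx_interleaved(size_t t, size_t c, size_t n_chan)
    {
        return t * n_chan + c;
    }

    /* Chunked: grouped primarily by channel.  The same scan strides
     * through the buffer, touching n_chan widely separated regions. */
    static size_t idx_chunked(size_t t, size_t c, size_t n_frames)
    {
        return c * n_frames + t;
    }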
Ogg is primarily a streaming delivery application, so we usually access
Ogg data by ascending time. Ogg does not support nonlinear space
allocation or in-place modification, so editors, which are probably the
most important application in need of independently accessible
channels, will not be using it as an intermediate format in any case.
We're also
talking about multichannel audio delivery where the different channels
are best thought of as part of a single multidimensional signal, not a
library-in-a-file type collection of independent signals, so it can be
argued that the individual channels do not really make sense in
isolation. In this case access won't merely be localised in time, but in
fact the natural access pattern for recorders, transmitters, players and
even some filters is a dense, temporally ascending scan over some
interleaved channel ordering.
If we think of Ogg as a line format, all this translates into lower
packetization latency and memory requirements (buffer per multichannel
stream vs. buffer per channel) for interleaved data; if we think of Ogg
as a file format it translates into fewer seeks and less framing
overhead while streaming from disk. In most cases a chunked layout has
no countervailing benefits. Even interfaces which work with separate
channels aren't such a good reason to offer a chunking option, because
they were probably designed with some other application (like
interactive gaming or offloading processing load onto a peripheral) in
mind, or might simply be badly engineered (just about anything from MS).
Furthermore, if we really encounter an application which would benefit
from grouping by channel (say, language variants of the same
soundtrack), that can already be accomplished via multiple logical
streams. In fact the multiplexing machinery is there for this precise
purpose: the packet structure is a deliberate tradeoff between the
temporal order always present in streaming files and the conflicting
interest in limiting latency, error propagation and buffer consumption,
brought on by parallelism, correlations and indivisibilities over
dimensions other than time. If the channels are so independent of each
other or so internally cohesive that chunking is justified, then they
ought to be independent enough for standalone use and for placement in
separate logical streams, or even separate files. Whatever
interdependencies they might have ought to be exposed to the consumer
via OggSkeleton or external metadata in any case. Thus whatever we want
to accomplish by chunking is probably better accomplished by the broader
Ogg framework, or by some mechanism besides Ogg altogether.
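With libogg this is already straightforward; a rough sketch (serial
numbers arbitrary) giving each soundtrack variant its own logical
stream:

    #include <ogg/ogg.h>

    ogg_stream_state english, french;

    void setup_variants(void)
    {
        /* Each language variant becomes its own logical stream,
         * distinguished by serial number, multiplexed into one
         * physical Ogg stream. */
        ogg_stream_init(&english, 1);
        ogg_stream_init(&french, 2);
        /* Packets then go in per stream via ogg_stream_packetin(),
         * and pages come out via ogg_stream_pageout(). */
    }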
The only valid reason to chunk the data I can think of is bitrate
peeling: chunking means that entire chunks/packets can be skipped to
drop channels. But this clearly isn't the best way to go about peeling
because, as I said, audio channels tend to be tightly coupled. We don't
go from stereo to mono by cleaving off the right or left channel, but by
summing, and if we simply drop a surround channel, we'll also break any
multichannel panning law. Thus if we want to enable peeling, we have to
use things akin to mid/side coding (like the UHJ hierarchy) or joint
progressive coding over the entire set of channels (e.g. Vorbis's
progressive vector quantization), and only then reorder and chunk the
data. As a result this sort of machinery will always be
encoding-dependent, and it shouldn't be specified at a higher level of
generality where it could end up being used for the wrong sort of
encoding (e.g. vanilla 5.1) and would impose its overheads (e.g.
latency) indiscriminately.
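To illustrate the mid/side point with a minimal sketch (plain C, not an
OggPCM mechanism): dropping the side signal still leaves a proper mono
downmix, which cleaving off the raw right channel would not:

    /* Sketch: peelable stereo via mid/side.  Drop s, and m alone
     * still plays as the correct mono downmix. */
    static void ms_encode(float l, float r, float *m, float *s)
    {
        *m = 0.5f * (l + r);  /* mono-compatible mid */
        *s = 0.5f * (l - r);  /* side detail, safe to peel */
    }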
Not surprisingly this is how it's already done in Ogg: at least Vorbis
specifies that peeling is to be carried out by a codec-specific peeler
operating within packets. The considerations which yielded this decision
apply directly to an intermediate level abstraction like OggPCM (below
Ogg multiplexing but also above a specific PCM coding like 16-bit big
endian B-format), so I think incorporating a chunking option here would
really represent a case of reinventing the wheel, square.
(Newbie intro: I'm a 27-year-old Finnish math/CS student and coder, with
a long term personal interest in both audio processing and external
memory algorithms, yet without an open source implementation background.
I joined the list after OggPCM was mentioned on sursound, so it's also
safe to assume I'm an ambisonic bigot.)
--
Sampo Syreeni, aka decoy - mailto:decoy at iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2