[vorbis-dev] AMBISONIC critique

Gregory Maxwell greg at linuxpower.cx
Tue Aug 15 03:42:25 PDT 2000



On Tue, 15 Aug 2000, Thomas Marshall Eubanks wrote:

[snip]
> The ear/head system mostly relies on phase differences for localization (i.e., your
> head is a phase interferometer, which is much more efficient than an
> intensity interferometer, of which more later).

It also uses intensity localization, which is why conventional stereo can
create a convincing soundstage.

[snip]
> 20 to 200 Hz        - no localization
> 200 Hz to 8-10 kHz  - phase interferometry
> 8 kHz to 20 kHz     - intensity interferometry

Intensity is also used in the 0.2 kHz - 10 kHz band.

[snip]
> Now let's look briefly at conventional practice. Your ears are located in a plane,
> more or less, so most sound systems concern themselves
> with a 2-D representation. In general,
> most musical performances, from Vivaldi to Pearl Jam, plus plays, speeches, etc.,
> come from the front, so stereo speakers are not at right angles, but moved towards the front.
> In a real performance in a real place, there will be reflected sound from
> above and the sides, etc.

Just because most sound systems are limited doesn't mean that artists
and users want to be limited.

[snip]
> Surround sound in theaters is intended for two purposes :
> 1.) So that the speech is better localized at the screen, even if you don't sit in the
> center of the auditorium (i.e., the "sweet spot" is expanded.)
> 2.) To have the occasional sound come from "behind" (like the creak of a door in a
> thriller).
> 
> In a home stereo, #1 is not thought to be so important, but number 2 (for
> reflected sounds) is. Reflected sounds have lower SNR, so the "surround"
> part of the sound does not take as many bits.

The 'sweet spot' of standard stereo is very small, and you lose the 'center'
with a very small turn of the head, which is why home systems are paying
increasing attention to #1 (they now have a center channel).

[snip] 
> at 200 Hz < f < 10 kHz, you need W, X and Y to obtain the same functionality
> which you get from right and left stereo. The location of the W speaker is
> problematic at these frequencies (where you place it DOES count). The phase
> matters a lot here, as here is where we use phase to localize.
>
> at f > 10 kHz, it is not clear how you are to implement "joint stereo" and save bits
> accounting for the ear's intensity interferometry. The location of the W speaker
> is still problematic, but not as badly.

W speaker?!? Okay. I understand your problem with ambisonics. *Full STOP*
 
You wouldn't have a 'W speaker' in a proper ambisonic system. Ambisonics'
main advantage is that it describes a speaker-independent sound field!

I.e., the same W,X,Y,Z B-format signal can be decoded to a set of 4
speakers, 8 speakers, or a set of headphones and still preserve most of
the intended 'soundscape'.
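
For example, decoding horizontal B-format to a regular rig is just a
per-speaker weighted sum. A minimal sketch in C (hypothetical function,
not code from any existing decoder), assuming the FuMa convention where
W is recorded with a 1/sqrt(2) gain; a real decoder would also apply
psychoacoustic shelf filters:

    #include <math.h>
    #include <stddef.h>

    /* Decode first-order horizontal B-format (W,X,Y) to a square of
     * four speakers at azimuths 45, 135, 225, 315 degrees.  sqrt(2)
     * undoes the FuMa 1/sqrt(2) gain on W; 0.5 is a simple
     * normalization for four speakers. */
    void decode_square(const float *W, const float *X, const float *Y,
                       float *out[4], size_t nsamples)
    {
        const double az[4] = { M_PI/4, 3*M_PI/4, 5*M_PI/4, 7*M_PI/4 };
        size_t n;
        int i;
        for (n = 0; n < nsamples; n++)
            for (i = 0; i < 4; i++)
                out[i][n] = 0.5f * (float)(sqrt(2.0) * W[n]
                                           + cos(az[i]) * X[n]
                                           + sin(az[i]) * Y[n]);
    }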

[snip]
> 3.) In NO case will you have the localization ability of the 5 channel or even 4 channel
> Dolby scheme ( for a similar total bit rate).

Dolby is useless for more than bouncing effects around a room.

> 4.) The channels (W, X, Y and Z) are NOT loudspeakers located at a point,
> but a particular sound distribution over space. How to get these from real
> loudspeakers in general is very unclear to me. A particular type of
> microphone or loudspeaker  is being assumed (from the above web page) :

No sound format should assume the placement of *your* speakers. However,
it's not economical to transmit every instrument in a separate channel
with XYZ placement information (and that would fail to capture the
instrument's own spatial effects); ambisonics is a good compromise.

[snip]
> This reliance on a particular transceiver is BAD. If you don't have these, what
> sort of sweet spot will you have (you can always make things work at a point)?

You've got it all wrong. :)

> In summary, I simply do not think that the Ambisonic scheme is particularly
> efficient, nor does it scale. It seems way too mathematically rote to me,
> not tuned to the actual physics of the situation.

I think you need to actually read the ambisonics paper before commenting
on the system. Keep in mind that a lot of the older ambisonics work
predates digital processing, so speaker placement was limited by what
they could achieve with simple analog dematrixing. Powerful microprocessors
now allow us to use virtually any speaker placement scheme that provides
sufficient 'fill'.

To those working on ambisonics for Vorbis:
It would be most useful (IMHO) to first produce two apps.

1) An ambisonic mixer that takes wav files and a text file describing
4d-placement and room characteristics, and produces a B-format signal
(perhaps second order too, for experimentation); the panning math is
sketched after this list.
2) An app that takes the output from above and makes a two-channel
HRTF-modeled output for headphone use (see the binaural sketch below).
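
For (1), the core panning math is standard first-order ambisonics. A
minimal sketch (hypothetical function name, FuMa convention again) that
pans one mono source; room reflections would just be extra delayed and
filtered copies panned the same way:

    #include <math.h>
    #include <stddef.h>

    /* Pan a mono source into B-format (FuMa: W carries 1/sqrt(2)).
     * theta = azimuth in radians (counterclockwise from front),
     * phi   = elevation in radians. */
    void pan_bformat(const float *src, size_t nsamples,
                     double theta, double phi,
                     float *W, float *X, float *Y, float *Z)
    {
        const float gw = (float)(1.0 / sqrt(2.0));
        const float gx = (float)(cos(theta) * cos(phi));
        const float gy = (float)(sin(theta) * cos(phi));
        const float gz = (float)sin(phi);
        size_t n;
        for (n = 0; n < nsamples; n++) {
            W[n] = gw * src[n];
            X[n] = gx * src[n];
            Y[n] = gy * src[n];
            Z[n] = gz * src[n];
        }
    }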

We really don't have a way to achieve (cheaply) sample-correlated
multi-channel (>2, perhaps >4) output from the computer, so headphones
will be the easiest test case.
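
For (2), one workable approach (an assumption on my part, not the only
way to do it) is 'virtual speakers': decode the B-format to a few
virtual speaker feeds as above, then convolve each feed with left/right
head-related impulse responses measured for that speaker's direction,
and sum. A direct-convolution sketch, assuming HRIR data is available
from some measurement set:

    #include <stddef.h>

    /* Render 4 decoded virtual-speaker feeds to stereo.  hrir_l[i]
     * and hrir_r[i] are the left/right-ear impulse responses (hlen
     * taps each) for virtual speaker i's direction.  Direct
     * convolution; a real implementation would use FFT convolution. */
    void binauralize(float *feeds[4], size_t nsamples,
                     float *hrir_l[4], float *hrir_r[4], size_t hlen,
                     float *left, float *right)
    {
        size_t n, k;
        int i;
        for (n = 0; n < nsamples; n++)
            left[n] = right[n] = 0.0f;
        for (i = 0; i < 4; i++)
            for (n = 0; n < nsamples; n++)
                for (k = 0; k < hlen && k <= n; k++) {
                    left[n]  += feeds[i][n - k] * hrir_l[i][k];
                    right[n] += feeds[i][n - k] * hrir_r[i][k];
                }
    }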

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/


