[vorbis-dev] M/S encoding ?

Mon Aug 21 13:04:05 PDT 2000

> Date: Mon, 21 Aug 2000 13:52:48 +0200
> From: David Balazic <david.balazic at uni-mb.si>
> 
> Hi!
> 
> I'm sending this message to both the LAME and Vorbis developers, since
> my concerns apply to both (in the case of Vorbis, they probably also apply
> to the X, Y and Z parts of the more advanced ambisonic modes).
> 
> Recently I did some thinking about M/S encoding and wondered whether the same
> psychoacoustic model is applied to the S (and M) channels as to normal
> L and R channels. My concern is that the mixing of the channels might
> produce different masking.
> 
> Consider this example:
> L channel has a signal at 5 kHz with amplitude 10,
> R channel has a signal at 5.2 kHz with amplitude 5
> 
> In L/R mode there is no masking and both signals are present in the
> encoded material.
> 
> But in M/S mode the signals are
> (assuming the formulas are M = (L+R)/sqrt(2) and S = (L-R)/sqrt(2),
> which I saw in a message on the LAME list):
> M: 5 kHz with A = 7.07 + 5.2 kHz with A = 3.54
> S: 5 kHz with A = 7.07 + 5.2 kHz with A = -3.54
> 
> Now if these channels are processed the same way as L/R, then the louder
> signal (5 kHz) might mask the quieter one (5.2 kHz) in both the M and S
> channels (provided there is bitrate pressure), thereby eliminating
> the 5.2 kHz signal.
> 
> In short: M/S mode might treat parts of the music as masked that would
> not be masked in regular stereo mode.
> 
> This seems wrong to me, although it is possible that the brain would do
> this masking itself.
> 
> So what do the experts here think?
> 
> Forgive me if I posted nonsense :-)
> 
> David Balazic
> 
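
The M/S arithmetic in the example can be checked with a short Python sketch. It treats each sinusoid as one signed amplitude per frequency (the transform is linear, so each component can be handled separately, and a negative amplitude just means inverted phase). The helper names `to_mid_side` and `to_left_right` are hypothetical. With the quoted formulas, the 5.2 kHz component comes out at 5/sqrt(2), about 3.54, in both M and S with opposite signs, and the inverse transform recovers the original L and R exactly.

```python
import math

# Mid/side transform as quoted in the message:
#   M = (L + R) / sqrt(2),  S = (L - R) / sqrt(2)
SQRT2 = math.sqrt(2.0)


def to_mid_side(left, right):
    """Map per-frequency L/R amplitudes to M/S amplitudes.

    `left` and `right` are dicts {frequency_hz: amplitude}; a frequency
    missing from one channel has amplitude 0 in that channel.
    """
    freqs = sorted(set(left) | set(right))
    mid = {f: (left.get(f, 0.0) + right.get(f, 0.0)) / SQRT2 for f in freqs}
    side = {f: (left.get(f, 0.0) - right.get(f, 0.0)) / SQRT2 for f in freqs}
    return mid, side


def to_left_right(mid, side):
    """Inverse transform: L = (M + S) / sqrt(2), R = (M - S) / sqrt(2)."""
    freqs = sorted(set(mid) | set(side))
    left = {f: (mid.get(f, 0.0) + side.get(f, 0.0)) / SQRT2 for f in freqs}
    right = {f: (mid.get(f, 0.0) - side.get(f, 0.0)) / SQRT2 for f in freqs}
    return left, right


# The example: 5 kHz at amplitude 10 in L, 5.2 kHz at amplitude 5 in R.
left = {5000: 10.0}
right = {5200: 5.0}
mid, side = to_mid_side(left, right)
for f in sorted(mid):
    print(f"{f} Hz: M = {mid[f]:+.2f}, S = {side[f]:+.2f}")
```

Both tones end up in both M and S, which is exactly why a masking model applied to M and S separately can see the 5 kHz tone masking the 5.2 kHz one.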

Hi David,

That makes perfect sense, and in such a situation, mid/side
stereo should be turned off.  Ideally, you need to look
at each 'band', and if the maskings are very different
between left and right channels, do not use mid/side stereo
to encode that band.  If the maskings are the same in the
L and R channels, then any quantization noise introduced when
encoding the side channel will be spread to both L and R
channels during decoding, where it will be masked equally
well, since both channels have the same masking.
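
A minimal Python sketch of the two points above: noise added to S during encoding reappears in both decoded channels with magnitude noise/sqrt(2) and opposite signs, and a per-band mode decision compares the L and R masking thresholds. The function names, the use of per-band thresholds in dB as input, and the 3 dB tolerance are all illustrative assumptions; this is not the criterion from the Johnston and Ferreira paper or from any particular encoder.

```python
import math

SQRT2 = math.sqrt(2.0)


def decode_ms(mid, side):
    """Inverse M/S transform for one sample: returns (left, right)."""
    return (mid + side) / SQRT2, (mid - side) / SQRT2


# Quantization noise added to the side channel during encoding.
m, s, noise = 0.8, 0.1, 0.005  # arbitrary example values
l_clean, r_clean = decode_ms(m, s)
l_noisy, r_noisy = decode_ms(m, s + noise)
# The error lands in both channels: +noise/sqrt(2) in L, -noise/sqrt(2) in R.
err_left, err_right = l_noisy - l_clean, r_noisy - r_clean


def choose_band_modes(thr_left_db, thr_right_db, max_diff_db=3.0):
    """Choose 'MS' or 'LR' independently for each band.

    thr_left_db, thr_right_db: per-band masking thresholds in dB, assumed to
    come from some psychoacoustic model. max_diff_db is an illustrative
    tolerance, not a value taken from any encoder.
    """
    if len(thr_left_db) != len(thr_right_db):
        raise ValueError("L and R must have the same number of bands")
    return ["MS" if abs(tl - tr) <= max_diff_db else "LR"
            for tl, tr in zip(thr_left_db, thr_right_db)]


print(err_left, err_right)
print(choose_band_modes([40, 42, 55], [41, 30, 54]))
```

Here `choose_band_modes([40, 42, 55], [41, 30, 54])` returns `['MS', 'LR', 'MS']`: only the middle band, whose maskings differ by 12 dB, falls back to L/R.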

(these ideas come from:
Johnston and Ferreira, Sum-Difference Stereo Transform Coding,
Proc. IEEE ICASSP (1992) p 569-571.)

MP3 does not allow mid/side stereo to be turned on and off on a
band-by-band basis (AAC does), so in LAME we look at some kind of
average of the differences between the L and R maskings.  If this
average is too large, mid/side stereo is disabled for that frame.
This works much better than the algorithm suggested in the ISO MP3
spec, but you still run into trouble: what if 90% of the bands can
handle mid/side encoding and 10% can't?  LAME has to make a decision
in these cases, and it can make the wrong one.  This is why, at
160 kbps and above, the default is to use regular stereo.
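
The frame-level trade-off can be sketched in Python. This is not LAME's code: the plain mean, the 3 dB limit, the 20-band frame, and the function names are hypothetical. The example shows the 90%/10% case: the average difference stays small, so the whole frame is coded as M/S even though two bands would fail a per-band test.

```python
def band_masking_diffs(thr_left_db, thr_right_db):
    """Absolute L/R masking-threshold difference (dB) for each band."""
    return [abs(tl - tr) for tl, tr in zip(thr_left_db, thr_right_db)]


def choose_frame_mode(thr_left_db, thr_right_db, max_avg_diff_db=3.0):
    """Frame-level M/S switch: one decision for all bands in the frame.

    Averages the per-band L/R masking differences and falls back to L/R
    stereo when the average exceeds max_avg_diff_db. Both the plain mean and
    the 3 dB limit are illustrative assumptions, not LAME's actual rule.
    Returns (mode, average_difference).
    """
    diffs = band_masking_diffs(thr_left_db, thr_right_db)
    if not diffs:
        raise ValueError("need at least one band")
    avg = sum(diffs) / len(diffs)
    return ("MS" if avg <= max_avg_diff_db else "LR"), avg


# Hypothetical frame of 20 bands: 18 with nearly equal maskings (0.5 dB apart)
# and 2 (10%) where the maskings differ by 20 dB.
left = [40.0] * 20
right = [40.5] * 18 + [60.0] * 2

mode, avg = choose_frame_mode(left, right)
bad_bands = [i for i, d in enumerate(band_masking_diffs(left, right)) if d > 3.0]
print(mode, round(avg, 2), bad_bands)  # M/S chosen despite bands 18 and 19
```

Lowering the limit below the 2.45 dB average would flip this frame to L/R, at the cost of giving up M/S for the 18 bands that could have used it; any single per-frame threshold trades one kind of error for the other.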

Mark

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/