[Flac-dev] Detecting lossy encodes

Fri Jan 7 18:00:56 PST 2011

On Jan 7, 2011, at 16:27, Declan Kelly wrote:
> It might be easy to fingerprint which MP3 encoder was used (and at what
> settings) for uncompressed source audio, but I'd be really impressed if
> anyone could analyse an MP3 or other lossy file that had been transcoded
> more than once.

You may be right, but I actually suspect that it will be just as
difficult to detect even a single generation of MP3 encoding.
Lossy coding involves variable quantization in the frequency
domain, which makes it rather difficult to define a precise criterion
for detection.

> Due to the frequency response of vinyl as a medium, bass has to be cut
> when recording, and boosted at playback. The RIAA standardised the vinyl
> frequency response curves back in the 1950s or so - before that there
> were competing systems using variations on the same frequency curve.
> As stereo vinyl is cut at 45 degrees for mono compatibility (a form of
> mid side encoding) the difference signal translates into vertical stylus
> movement. So the needle will jump out of the groove if there is too much
> separation at lower frequencies.

If you're on OSX, then you can grab my free AudioUnit that implements
RIAA decoding.  It's called AURIAA, and is available at
http://www.sounds.wa.com/audiounits.html

> As the human hearing can't really tell direction with lower frequencies,
> it's not as essential. This same shortcut is why most movie "surround
> sound" systems have only one sub bass channel.

In this case, you have been misled by a common misconception in the  
consumer audio industry.

In actuality, the human hearing system is quite capable of telling
direction with lower frequencies.  The cues it relies on are simply
not produced by standard studio mixing.

At low frequencies, the human ear+brain system uses time delays to  
detect direction.  This is because low frequencies travel around  
obstructions like the head without significant volume losses.  Thus,  
it would be impossible to use volume to find the directional source.   
The brain then detects the leading edge of sounds, and compares the  
phase or time delay between left and right.  When the direction is  
straight ahead, the time delay is zero.  As sound sources move away
from directly ahead, the time delay increases up to a maximum
determined by the speed of sound and the distance between your ears
(plus a little extra, because the path from one ear to the other is
not through your head but around it, which is slightly longer).
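
To put rough numbers on that, here is a minimal Python sketch.  It is
not from any particular library; it assumes the textbook spherical-head
approximation usually credited to Woodworth, a head radius of 8.75 cm,
and a speed of sound of 343 m/s.  All names and constants are
illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C (assumed)
HEAD_RADIUS = 0.0875    # m, an assumed typical adult head radius


def interaural_time_delay(azimuth_deg, head_radius=HEAD_RADIUS,
                          c=SPEED_OF_SOUND):
    """Approximate left/right arrival-time difference in seconds.

    azimuth_deg is the angle of a distant source away from straight
    ahead, from 0 (dead ahead) to 90 (directly to one side).
    """
    theta = math.radians(azimuth_deg)
    # head_radius * sin(theta): extra straight-line distance to the far ear.
    # head_radius * theta: extra arc the sound travels around the head.
    return (head_radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        itd_us = interaural_time_delay(az) * 1e6
        print(f"{az:3d} degrees -> {itd_us:6.1f} microseconds")
```

The delay is zero straight ahead and tops out somewhat above half a
millisecond for a source directly to one side, under these assumed
values.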

However, as the frequency gets higher and higher, it becomes too  
difficult for the brain to analyze the phase differences, because a  
high frequency waveform will repeat several times during the time  
delay, and it becomes impossible to compare phase when you don't know  
which cycle matches.  Fortunately, high frequencies are very  
directional - more directional than low frequencies - and thus the  
volume is attenuated when high frequency sounds bend around an  
obstruction (like the head or anything else).  So, the brain uses  
volume differences, not time/phase differences, to determine  
directionality for high frequencies.
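
The cycle-matching problem can be sketched in a few lines of Python.
This is only an illustration: the 650 microsecond maximum delay is an
assumed round figure, and treating anything past half a cycle as
ambiguous is a simplifying assumption, not a model of what the brain
actually does.

```python
MAX_DELAY_S = 650e-6  # assumed illustrative maximum interaural delay


def cycles_during_delay(frequency_hz, delay_s=MAX_DELAY_S):
    """How many cycles of the waveform fit inside the interaural delay."""
    return frequency_hz * delay_s


def phase_is_ambiguous(frequency_hz, delay_s=MAX_DELAY_S):
    """True once the delay spans more than half a cycle.

    Past half a cycle, a lead at one ear looks the same as a lag at
    the other, so phase alone no longer says which side came first.
    """
    return cycles_during_delay(frequency_hz, delay_s) > 0.5


if __name__ == "__main__":
    for freq in (50, 200, 500, 1000, 5000):
        print(f"{freq:5d} Hz: {cycles_during_delay(freq):5.2f} cycles, "
              f"ambiguous={phase_is_ambiguous(freq)}")
```

At bass frequencies the delay is a small fraction of one cycle, so the
comparison is unambiguous.  At several kHz the waveform repeats
multiple times within the delay.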

What's true is that we humans are actually directionally deaf in the  
midrange, not at lower frequencies.  The time/phase technique is most  
effective at the lowest frequencies, and becomes less effective as  
the frequency gets higher.  The volume/amplitude technique is more  
effective at the highest frequencies, and becomes less effective as  
the frequency gets lower and less directional.  In the middle,  
neither technique is effective.  I seem to recall that this frequency  
is around 500 Hz, well below the 2 kHz to 5 kHz range where our ears  
are most sensitive.

The reason most consumer electronics experts get this wrong is
the standard technique used in studio recording.  Most
music is recorded as multiple channels, e.g., 16, that are each
monophonic.  These channels are played back through a mixing console,
and a simple pan pot is used to artificially place them in a
location.  Because the pan pot only affects the volume, not the phase
difference or time delay, a studio recording is going
to have no directionality at low frequencies.
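
As a rough illustration of why a pan pot carries no timing
information, here is a minimal Python sketch.  It assumes a
constant-power (sine/cosine) pan law, which is only one of several laws
a console might use, and the function names are made up for this
example.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a pan position.

    position runs from -1.0 (hard left) through 0.0 (centre) to
    +1.0 (hard right).  Assumed constant-power pan law.
    """
    angle = (position + 1.0) * math.pi / 4.0  # maps to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def pan_sample(sample, position):
    """Place one mono sample in the stereo field.

    Both outputs are the same input sample scaled by a gain.  Nothing
    is delayed, so left and right always stay exactly in phase.
    """
    left_gain, right_gain = pan_gains(position)
    return sample * left_gain, sample * right_gain


if __name__ == "__main__":
    for pos in (-1.0, -0.5, 0.0, 0.5, 1.0):
        left, right = pan_gains(pos)
        print(f"pan {pos:+.1f}: left {left:.3f}, right {right:.3f}")
```

Whatever the pan position, the only thing that differs between the two
channels is level.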

But not all recordings are made in a soundproof studio.  A simple  
binaural recording will have plenty of time delay and phase  
information, just like the real world, and the human hearing system  
will easily be able to detect direction of low frequency sounds.   
That is, unless your audiophile salesman has convinced you that you  
only need one subwoofer, and thus your playback system is compromised.

Another factor that is showing up in digital production is the  
ability to create a 3D mixer, instead of a simple pan pot.  CoreAudio  
and other digital systems allow a monophonic sound to be placed at  
any position in a virtual 3D sound world, and the DSP will calculate  
the appropriate time delay and amplitude loss based upon the relative  
positions of the sound source and the virtual listener.  OSX and  
CoreAudio can even extend this virtual system to include knowledge of  
the actual placement of speakers attached to your computer, with the  
DSP automatically calculating the correct time delay and volume for  
each speaker in your system (whether 5.1, 7.1, 10.2, or more).  With  
such a system for creating sounds, you certainly are not limited by  
sound studio production limitations of previous decades, and, more  
importantly, there will be plenty of directional cues in the low  
frequencies.
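
For comparison with a pan pot, here is a minimal Python sketch of the
kind of calculation such a 3D mixer might perform.  It is not
CoreAudio's actual algorithm or API.  The inverse-distance gain, the
343 m/s speed of sound, the ear positions, and all names are
assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def delay_and_gain(source, receiver, reference_distance=1.0):
    """Return (delay_seconds, gain) from a source to one receiver.

    Positions are (x, y, z) tuples in metres.  Gain follows an assumed
    inverse-distance law and is clamped to 1.0 inside the reference
    distance.
    """
    distance = math.dist(source, receiver)
    delay_s = distance / SPEED_OF_SOUND
    gain = reference_distance / max(distance, reference_distance)
    return delay_s, gain


if __name__ == "__main__":
    # Hypothetical listener: ears 18 cm apart on the x axis.
    left_ear = (-0.09, 0.0, 0.0)
    right_ear = (0.09, 0.0, 0.0)
    source = (2.0, 2.0, 0.0)  # ahead and to the right

    for name, ear in (("left", left_ear), ("right", right_ear)):
        delay_s, gain = delay_and_gain(source, ear)
        print(f"{name:5s}: delay {delay_s * 1000:.3f} ms, gain {gain:.3f}")
```

Unlike the pan pot, each output gets its own delay as well as its own
gain, so the timing cue survives even for low frequency content.  The
same function could be applied per speaker instead of per ear.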

By the way, the reason surround sound has only one sub bass channel
is that it takes very little bandwidth to add one more channel.
In actuality, each of the 5 main channels is full range and carries
its own sub bass.  The .1 channel is just a way to add more oomph
without taking the full bandwidth of 6 channels.  Many surround
mixers will place directional sub bass in the 5 channels, but this  
will only be heard on the best surround systems with more than one  
subwoofer, or at least with large speakers that can reproduce enough  
sub bass to be heard.  The .1 channel has nothing to do with human  
directional perception, and everything to do with taking advantage of  
something that is available at a low bandwidth cost.

Brian Willoughby
Sound Consulting