[Flac-dev] Detecting lossy encodes
Brian Willoughby
brianw at sounds.wa.com
Fri Jan 7 18:00:56 PST 2011
On Jan 7, 2011, at 16:27, Declan Kelly wrote:
> It might be easy to fingerprint which MP3 encoder was used (and at what
> settings) for uncompressed source audio, but I'd be really impressed if
> anyone could analyse an MP3 or other lossy file that had been transcoded
> more than once.
You may be right, but I actually suspect that it will be just as
difficult to detect even just a single generation of MP3 encoding.
Because lossy coding involves variable quantization in the frequency
domain, it is rather difficult to define a precise criterion for
detection.
> Due to the frequency response of vinyl as a medium, bass has to be cut
> when recording, and boosted at playback. The RIAA standardised the vinyl
> frequency response curves back in the 1950s or so - before that there
> were competing systems using variations on the same frequency curve.
> As stereo vinyl is cut at 45 degrees for mono compatibility (a form of
> mid side encoding) the difference signal translates into vertical stylus
> movement. So the needle will jump out of the groove if there is too much
> separation at lower frequencies.
If you're on OSX, then you can grab my free AudioUnit that implements
RIAA decoding. It's called AURIAA, and is available at
http://www.sounds.wa.com/audiounits.html
> As the human hearing can't really tell direction with lower frequencies,
> it's not as essential. This same shortcut is why most movie "surround
> sound" systems have only one sub bass channel.
In this case, you have been misled by a common misconception in the
consumer audio industry.
In actuality, the human hearing system is quite capable of telling
direction at lower frequencies; that ability is merely incompatible
with the way studio mixing is usually done.
At low frequencies, the human ear+brain system uses time delays to
detect direction. This is because low frequencies travel around
obstructions like the head without significant volume losses. Thus,
it would be impossible to use volume to find the directional source.
The brain then detects the leading edge of sounds, and compares the
phase or time delay between left and right. When the direction is
straight ahead, the time delay is zero. As sound sources move away
from directly ahead, the time delay increases up to a maximum
determined by the speed of sound and the distance between your ears
(plus a little extra, because the path taken from one ear to the
other is not through your head but around it, and that's a slightly
longer time delay).
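To put rough numbers on that delay, here is a small Python sketch. It
uses Woodworth's spherical-head approximation, which is one common
textbook model and not necessarily what the brain or any particular
product does. The head radius and speed of sound are assumed typical
values, and the function name is made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed (air at roughly room temperature)
HEAD_RADIUS = 0.0875     # m, assumed typical adult value

def interaural_delay(azimuth_deg):
    """Approximate left/right arrival-time difference in seconds.

    Woodworth's spherical-head formula: (r / c) * (theta + sin(theta)),
    for a distant source at 0..90 degrees away from straight ahead.
    The theta term accounts for the arc the sound travels around the head.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for angle in (0, 30, 60, 90):
    delay_us = interaural_delay(angle) * 1e6
    print(f"{angle:2d} degrees: {delay_us:6.1f} microseconds")
```

With these assumed values the delay is zero straight ahead and grows
to roughly two thirds of a millisecond with the source directly to
one side, which is more than the straight-line figure of ear spacing
divided by the speed of sound.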
However, as the frequency gets higher and higher, it becomes too
difficult for the brain to analyze the phase differences, because a
high frequency waveform will repeat several times during the time
delay, and it becomes impossible to compare phase when you don't know
which cycle matches. Fortunately, high frequencies are very
directional - more directional than low frequencies - and thus the
volume is attenuated when high frequency sounds bend around an
obstruction (like the head or anything else). So, the brain uses
volume differences, not time/phase differences, to determine
directionality for high frequencies.
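The "which cycle matches" problem can be illustrated by counting how
many waveform periods fit inside the largest left/right delay. This
is only a sketch: the 0.66 ms maximum delay is an assumed round
figure, and the function name is hypothetical.

```python
MAX_DELAY = 0.00066  # seconds, assumed maximum interaural delay

def cycles_within_delay(frequency_hz, delay_s=MAX_DELAY):
    """Number of waveform periods that fit inside the given delay."""
    return frequency_hz * delay_s

for freq in (100, 500, 1500, 5000, 10000):
    print(f"{freq:5d} Hz: {cycles_within_delay(freq):5.2f} cycles")
```

At low frequencies only a small fraction of one cycle fits inside the
delay, so the phase comparison is unambiguous. Once several cycles
fit, the same phase difference could belong to any of them.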
What's true is that we humans are actually directionally deaf in the
midrange, not at lower frequencies. The time/phase technique is most
effective at the lowest frequencies, and becomes less effective as
the frequency gets higher. The volume/amplitude technique is more
effective at the highest frequencies, and becomes less effective as
the frequency gets lower and less directional. In the middle,
neither technique is effective. I seem to recall that this frequency
is around 500 Hz, well below the 2 kHz to 5 kHz range where our ears
are most sensitive.
The reason why most consumer electronics experts get this wrong is
because of the standard techniques used in studio recording. Most
music is recorded as multiple channels, e.g., 16, that are each
monophonic. These channels are played back through a mixing console,
and a simple pan pot is used to artificially place them in a
location. Because the pan pot only affects the volume, not the phase
difference or time delay, this means that a studio recording is going
to have no directionality at low frequencies.
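As a sketch of why a pan pot carries no timing cue, here is a
constant-power pan law in Python. This is one common pan law, assumed
here for illustration; real consoles differ in the exact curve, and
the function names are made up.

```python
import math

def pan_gains(position):
    """Constant-power pan law.

    position runs from -1.0 (hard left) to +1.0 (hard right).
    Returns (left_gain, right_gain); the squares always sum to 1.
    """
    angle = (position + 1.0) * math.pi / 4.0   # maps to 0 .. pi/2
    return math.cos(angle), math.sin(angle)

def pan(mono, position):
    """Scale one mono signal into a left/right pair. No delay is applied."""
    left_gain, right_gain = pan_gains(position)
    left = [left_gain * sample for sample in mono]
    right = [right_gain * sample for sample in mono]
    return left, right

mono = [0.0, 1.0, 0.5, -0.5, 0.0]   # short click-like test signal
left, right = pan(mono, 0.5)        # panned halfway to the right

# Both channels peak at the same sample index: only the level differs.
print(left.index(max(left)), right.index(max(right)))
```

Whatever the pan position, the two outputs are scaled copies of the
same samples, so the time delay between left and right is always
zero.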
But not all recordings are made in a soundproof studio. A simple
binaural recording will have plenty of time delay and phase
information, just like the real world, and the human hearing system
will easily be able to detect direction of low frequency sounds.
That is, unless your audiophile salesman has convinced you that you
only need one subwoofer, and thus your playback system is compromised.
Another factor that is showing up in digital production is the
ability to create a 3D mixer, instead of a simple pan pot. CoreAudio
and other digital systems allow a monophonic sound to be placed at
any position in a virtual 3D sound world, and the DSP will calculate
the appropriate time delay and amplitude loss based upon the relative
positions of the sound source and the virtual listener. OSX and
CoreAudio can even extend this virtual system to include knowledge of
the actual placement of speakers attached to your computer, with the
DSP automatically calculating the correct time delay and volume for
each speaker in your system (whether 5.1, 7.1, 10.2, or more). With
such a system for creating sounds, you certainly are not limited by
sound studio production limitations of previous decades, and, more
importantly, there will be plenty of directional cues in the low
frequencies.
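The kind of calculation such a 3D mixer performs can be sketched in a
few lines of Python. This is not CoreAudio's API or its actual
algorithm, just the basic geometry under simple assumptions: delay
proportional to distance, and inverse-distance amplitude loss. The
names, positions, and speed of sound are all assumed for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def delay_and_gain(source, receiver, reference_distance=1.0):
    """Return (delay in seconds, linear gain) for one source/receiver pair.

    Gain follows a simple inverse-distance law and is clamped to 1.0
    inside the reference distance.
    """
    distance = math.dist(source, receiver)
    delay = distance / SPEED_OF_SOUND
    gain = reference_distance / max(distance, reference_distance)
    return delay, gain

# Hypothetical virtual listener with ears 0.175 m apart, source off to the right.
left_ear = (-0.0875, 0.0, 0.0)
right_ear = (0.0875, 0.0, 0.0)
source = (2.0, 1.0, 0.0)

for name, ear in (("left", left_ear), ("right", right_ear)):
    delay, gain = delay_and_gain(source, ear)
    print(f"{name:5s}: delay {delay * 1000:.3f} ms, gain {gain:.3f}")
```

Unlike a pan pot, each output gets its own delay as well as its own
level, so the low-frequency timing cue survives. The same function
could be applied per speaker instead of per ear.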
By the way, the reason surround sound has only one sub bass channel
is that it takes very little bandwidth to add one more channel.
In actuality, each of the 5 main channels carries full sub bass in
its own discrete signal. The .1 channel is just a way to add more oomph
without taking the full bandwidth of 6 channels. Many surround
mixers will place directional sub bass in the 5 channels, but this
will only be heard on the best surround systems with more than one
subwoofer, or at least with large speakers that can reproduce enough
sub bass to be heard. The .1 channel has nothing to do with human
directional perception, and everything to do with taking advantage of
something that is available at a low bandwidth cost.
Brian Willoughby
Sound Consulting