Stereo Perception and Sound Localization


In real life, we perceive the position of a sound source using a number of auditory cues. The level difference and the time difference between the sound arriving at our left and right ears, as well as the masking effect of our head, have already been introduced in our low frequency sound localization test. These cues help us localize a sound in the horizontal plane and are of primary importance in stereophonic recordings.

The fourth cue is a little harder to address and relates to the pinnae, the anatomical term for our outer ears. The shape of the outer ear imprints slightly different frequency responses on sounds depending on their angle of arrival, which allows us to distinguish front from back and above from below.

The problem with this last cue, and the reason it is seldom used in stereophonic recordings, is that these frequency changes are complex to model and very specific to each individual's pinnae. Still, some audio tests and recordings make use of pinna filtering. Take a look at our LEDR Test or listen to our binaural recordings (soon to be published on this site).

This page illustrates these four localization cues through various stereophonic examples generated from the same original recording. To best grasp the principles presented here, listen through (good) headphones rather than loudspeakers.

The Test File: A Monaural Recording

Our test signal consists of a recording of someone knocking on wooden doors. This section starts with a monaural version of the original recording.

Although the file contains a single channel, your amplifier sends the signal to both of your speakers simultaneously. This works fine thanks to a stereo phenomenon known as the phantom center: when both speakers are fed the same signal, they create the impression of a third speaker located midway between them.

Unfortunately, this phantom center shifts toward the nearer speaker as you move closer to one side. This is precisely why the film industry has long relied on a dedicated center speaker channel to keep the sound anchored.

Level Difference

The easiest way to place a sound in the stereo field is to use the so-called panorama control, or panning. Panning distributes the sound unequally between the two speakers, decreasing the level in one channel while increasing it in the other. At either extreme setting, the sound is present in a single channel only and is reproduced by a single speaker. At any other setting, the sound results from a mix of the two speakers.
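As a minimal sketch of level-only panning, the Python below splits a mono signal into two channels. The page does not say which pan law its example used; the constant-power (sine/cosine) law here is an assumption, and the function names `pan_gains` and `pan` are hypothetical.

```python
import math


def pan_gains(position: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    Assumption: a constant-power (sine/cosine) pan law. Real consoles
    use various laws; this is one common choice, not the only one.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    # Map the pan position onto a quarter circle, 0 to pi/2 radians.
    angle = (position + 1.0) * math.pi / 4.0
    # cos^2 + sin^2 == 1, so the total power stays constant.
    return math.cos(angle), math.sin(angle)


def pan(mono: list[float], position: float) -> tuple[list[float], list[float]]:
    """Place a mono signal in the stereo field by level only (no delay)."""
    g_left, g_right = pan_gains(position)
    left = [g_left * s for s in mono]
    right = [g_right * s for s in mono]
    return left, right
```

With this law, a centered source sits about 3 dB below full level in each channel (`pan_gains(0.0)` gives roughly 0.707 for both), which keeps the perceived loudness roughly constant as the source moves across the stereo field. At `position=-1.0`, the right gain is zero and the sound comes from the left speaker only.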

In real life, sound pressure does not decrease much over the short distance between your ears. Even when a sound comes from one side, both ears receive signals of comparable amplitude. In that sense, panning as applied in most studio recordings is not realistic. But it is simple to implement, and it works well!

Timing Difference

Instead, our brain relies mostly on inter-aural time differences, especially at low frequencies, to infer the direction a sound comes from. This section recreates a stereophonic image by delaying one channel with respect to the other. This time no panning is used, so the levels of both channels are exactly the same. The only effect we play with is a 10 millisecond delay, inserted either into the right channel (which shifts the perceived sound to the left) or into the left channel (which shifts it to the right). The distance between our ears (about 0.2 m) divided by the speed of sound (340 m/s) gives a maximum inter-aural delay of about 0.6 ms. Our 10 ms delay is therefore roughly seventeen times longer than this physical limit, which produces a wider stereophonic effect.
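The arithmetic and the delay effect can be sketched in Python. This is a hypothetical illustration, assuming a whole-sample delay and plain lists of samples; `max_itd_seconds`, `delay_channel` and `stereo_by_delay` are names made up for this example, and the straight-line model (ear distance divided by the speed of sound) is the simplified one used in the text.

```python
EAR_DISTANCE_M = 0.2        # distance between the ears, as given in the text
SPEED_OF_SOUND_M_S = 340.0  # speed of sound, as given in the text


def max_itd_seconds(ear_distance_m: float = EAR_DISTANCE_M,
                    speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Largest inter-aural time difference in the simple straight-line model."""
    return ear_distance_m / speed_m_s


def delay_channel(signal: list[float], delay_s: float, sample_rate: int) -> list[float]:
    """Delay a channel by a whole number of samples, keeping its length."""
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    n = round(delay_s * sample_rate)
    # Prepend n samples of silence, then trim back to the original length.
    return ([0.0] * n + list(signal))[: len(signal)]


def stereo_by_delay(mono: list[float], delay_s: float, sample_rate: int,
                    delay_right: bool = True) -> tuple[list[float], list[float]]:
    """Build (left, right) with equal levels; only one channel is delayed.

    Delaying the right channel shifts the image to the left, and vice versa.
    """
    delayed = delay_channel(mono, delay_s, sample_rate)
    if delay_right:
        return list(mono), delayed
    return delayed, list(mono)
```

`max_itd_seconds()` returns about 0.000588 s (0.59 ms). At a 44.1 kHz sample rate (an assumed value), the 10 ms delay used on this page corresponds to 441 samples, far beyond the roughly 26 samples of a real inter-aural delay.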

Yet the pan pot on the audio engineer's console controls only the level balance between channels and introduces no delay at all...

Timing Difference and Low-Pass Filtering

Building on the last example, we continue to reverse-engineer our auditory system by adding low-pass filtering to the delayed channel. This filter mimics the shadowing effect of our head: sound reaching the farther of the two ears not only arrives slightly later, but is also partly masked by the head, which attenuates the higher frequencies.

The filter used in our example is a first-order (-6 dB/octave) low-pass filter with an 800 Hz cutoff frequency.
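A first-order low-pass can be sketched as a one-pole IIR filter. The page does not state which filter design was used, so the coefficient formula below is an assumption: its -3 dB point lands close to the requested cutoff when the cutoff is well below the sample rate, and its slope tends toward -6 dB/octave above the cutoff. `one_pole_lowpass` and `far_ear_channel` are hypothetical names.

```python
import math


def one_pole_lowpass(signal: list[float], cutoff_hz: float, sample_rate: float) -> list[float]:
    """First-order (-6 dB/octave) low-pass filter in one-pole IIR form.

    y[n] = y[n-1] + a * (x[n] - y[n-1]),  a = 1 - exp(-2*pi*fc/fs)
    Assumption: this coefficient formula is one common design; the page
    does not say which first-order filter its example used.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    y = 0.0
    for x in signal:
        y += a * (x - y)  # move the output a fraction 'a' toward the input
        out.append(y)
    return out


def far_ear_channel(mono: list[float], delay_s: float, sample_rate: float,
                    cutoff_hz: float = 800.0) -> list[float]:
    """Hypothetical 'far ear' channel: delayed, then low-passed (head shadow)."""
    n = round(delay_s * sample_rate)
    # Whole-sample delay: prepend silence, keep the original length.
    delayed = ([0.0] * n + list(mono))[: len(mono)]
    return one_pole_lowpass(delayed, cutoff_hz, sample_rate)
```

A constant input passes through unchanged once the filter settles (unity gain at 0 Hz), while a signal alternating at the Nyquist frequency is attenuated to roughly 6% of its amplitude at a 44.1 kHz sample rate.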

The Real Thing

Let's now listen to the original signal, recorded with microphones placed directly in someone's ears in order to capture the sound exactly as shaped by his pinnae. As you will hear, the stereophonic imaging resembles that of the previous example, only more realistic!

The downside of this immersive and truly surprising result is that it works only through headphones. Because the recording already embeds the pinna filtering of the original listener, it must bypass yours; otherwise, your own pinnae would filter the sound a second time. In other words, the recording must be delivered directly into your ears, which requires headphones.

This explains why the audio engineer needs only a pan pot on the console: of all the effects described here, panning offers the best compatibility with existing sound reproduction systems, from mono speakers to complex multichannel installations and headphones.
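One way to see the compatibility problem of delay-based stereo (an illustration added here, not taken from the original tests) is to fold the two channels down to mono. When one channel is a copy of the other delayed by τ, the mono sum (L + R)/2 has the magnitude response |cos(πfτ)|, a comb filter with nulls at f = (2k + 1)/(2τ). Level-only panning just scales the same signal in both channels, so its mono sum has no such notches. The Python sketch below uses hypothetical function names.

```python
import math


def mono_sum_gain_db(freq_hz: float, delay_s: float) -> float:
    """Gain (dB) of the mono fold-down (L + R) / 2 when R is L delayed by delay_s.

    The sum has the response (1 + exp(-j*2*pi*f*tau)) / 2, whose magnitude
    is |cos(pi * f * tau)|: a comb filter.
    """
    g = abs(math.cos(math.pi * freq_hz * delay_s))
    return 20.0 * math.log10(g) if g > 0.0 else float("-inf")


def null_frequencies(delay_s: float, max_hz: float) -> list[float]:
    """Frequencies below max_hz where the mono sum cancels: f = (2k + 1) / (2 * tau)."""
    if delay_s <= 0.0:
        raise ValueError("delay must be positive")
    nulls = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay_s)
        if f >= max_hz:
            return nulls
        nulls.append(f)
        k += 1
```

For the 10 ms delay used on this page, the first nulls fall at 50, 150 and 250 Hz, spaced 100 Hz apart: a comb filter that colors the timbre whenever such a track is heard in mono.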
© 2007-2014
