top of page

Colorless Binaural Externalization Processing Demonstration

Abstract

Abstract

In both entertainment and professional applications, conventionally produced stereo or multi-channel audio content is frequently delivered over headphones or earbuds. Use cases involving object-based binaural audio rendering include recently developed immersive multi-channel audio distribution formats, along with the accelerating deployment of virtual or augmented reality applications and head-mounted displays. The appreciation of these listening experiences by end users may be compromised by an unnatural perception of the localization of frontal audio objects: commonly heard near or inside the listener’s head even when their specified position is distant. This artifact may persist despite the provision of perceptual cues that have been known to partially mitigate it, including artificial acoustic reflections or reverberation, head-tracking, individualized HRTF processing, or reinforcing visual information. In this paper, we review previously reported methods for binaural audio externalization processing, and generalize a recently proposed approach to address object-based audio rendering.

 

In the following demonstration, examples are presented to demonstrate the differences between sample audio rendered with traditional stereo panning, binaural processing, and the proposed externalization algorithm.

This research was presented at:

AES Fall 2023 on Oct. 27, 2023  [slides]

ASA Meeting #184 on May 9, 2023

ADC 2022 on Nov. 14-16, 2022

AES Fall 2022 on Oct. 26, 2022  [paper]

AES Fall 2021 on Oct. 28, 2022.

 
Demonstration Begin

Please use headphones for the following demonstration.

Each section will simulate a single virtual source rotating around the head of the listener using different processing methods.
NOTE: The intended trajectory of the source (shown below) remains consistent for all examples, though perceptually it may vary according to the processing method used. It begins at 90° to your right and travels clockwise 1.5 times around your head in the horizontal plane (initially toward the back).

Stereo

Traditional Stereo Panning

Perceptual Representation of Stereo Panning

 

Demonstration:

  • Close your eyes and press play. Keep your head still, facing straight ahead.

  • Does the source sound like it travels around your head or back and forth through your head as shown in the animation to the left?

Explanation:

  • Stereo audio is the most ubiquitous audio format found in the world today.

  • However, without additional processing, traditional stereo playback has limited spatial positioning capabilities. It only allows for a single degree of directional freedom (left/right), ignoring the other two degrees (front/back and up/down).

  • Because of this, if a sound source were to be rotated around the listener's head for stereo playback over headphones, it would simply sound like it was traveling side to side through the head.

00:00 / 00:25
Binaural

Binaural Processing

Demonstration:

  • Close your eyes and press play. Keep your head still, facing straight ahead.

  • Does the source sound like it travels around your head? Does the source feel like it remains in the horizontal plane or does it elevate above your head as it passes in front as shown in the animation to the right?

Explanation:

  • To overcome these shortcomings of traditional stereo panning, applying binaural processing to an audio file can help to account for the other degrees of directional freedom.

  • Binaural processing can be achieved by applying filters and delays that simulate the physical characteristics of a human head and ears.

  • Binaural Processing:

    • The cartilage around the ear canal (pinna) and the geometry of a human head combine to act as a filter that can help humans to distinguish directional information (i.e. front vs back, up vs down, left vs right).

    • This filter can be measured and approximated by a Head-Related Transfer Function (HRTF).

    • The spectrum of an HRTF changes based on where a sound source is located.

  • Binaural Processing improves spatialization over stereo panning but still has some shortcomings.

  • One specific issue is that as the source passes in front of the listener, the perceived trajectory often moves up and over the head rather than remaining in the horizontal plane as intended.​

00:00 / 00:25

Perceptual Representation of Binaural Rotation

 

Spectrum of an HRTF (Neumann KU 100) as a source rotates clockwise around the listener.

 

(Thanks to IRCAM Spat)

Externalization

Externalization Processing

 

Demonstration:

  • Close your eyes and press play. Keep your head still, facing straight ahead.

  • Does the source sound like it travels around your head? Does the source feel like it remains in the horizontal plane (as shown in the animation to the left) or does it elevate above your head as it passes in front similar to the previous section?

Explanation:

  • A common method to approximate spatialization and/or distance from the listener is introducing artificial reverberation.

  • Reverberation, however, also introduces timbral coloration of the sound, ultimately changing its characteristics (e.g. a singer in a bedroom vs a concert hall).

  • The proposed processing method of this demonstration seeks to exploit the perceptual triggers of externalization produced by reverberation, while minimizing spectral coloration artifacts.

  • This is achieved by processing the source signal through a 2-channel quasi-all-pass filter that has the effect of adding a brief diffuse reverberation-like decay tail.

  • A detailed explanation of this processing is available in this paper.

Perceptual Representation of Externalized Binaural Rotation

 
00:00 / 00:25
Comparisons

Appendix - Comparisons

Stationary Source

Left -45°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Left -90°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Left -135°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Left -30°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Front 0°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Right +30°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Rotating Source

Clockwise

Traditional Stereo

(No Processing)

00:00 / 00:25

Binaural Processing

00:00 / 00:25

Externalized

00:00 / 00:25

Stationary Source

Left -150°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Behind  180°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Right +150°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Right +45°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Right +90°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Stationary Source

Right +135°

Traditional Stereo

(No Processing)

00:00 / 00:05

Binaural Processing

00:00 / 00:05

Externalized

00:00 / 00:05

Thanks to Alexey Lukin, Kurt Werner, and Evan Allen for previous work presented at the AES Show Fall 2021.
Thanks to Alexey Lukin and Roth Michaels for early investigations towards extending the prototype implementation and its evaluation in Max.

Future Work

Future Work

  • Psychophysical investigations of the perception of concurrent free-field and diffuse-field components of a sound event.

  • Producing binaural recordings that leverage familiar 2-channel stereo production techniques (such as stereo flanger or chorus effects), or apply externalization processing to selected tracks or stems in a stereo mix (e.g. a vocal track).

  • Virtual meeting and augmented or virtual reality experiences, combining the proposed externalization processing scheme with head tracking and HRTF individualization.

©2024 Christopher Landschoot

©2024 Virtuel Works

bottom of page