VR/AR Art and Immersive Experiences

Unit 5 – Spatial Audio for Immersive Environments

Spatial audio creates immersive soundscapes that mimic real-world sound perception in virtual environments. It enables listeners to localize sound sources in 3D space, enhancing presence and realism in VR and AR experiences. This technology uses binaural rendering and head-related transfer functions (HRTFs) to simulate how sound reaches our ears.

The physics of sound in 3D space, psychoacoustics, and human perception all play crucial roles in spatial audio. Understanding how sound waves propagate, interact with surfaces, and are perceived by the auditory system is essential for creating convincing virtual soundscapes. A range of technologies and techniques, including binaural audio, Ambisonics, and object-based audio, are used to capture, process, and render spatial sound.

Key Concepts in Spatial Audio

  • Spatial audio creates an immersive soundscape that mimics real-world sound perception
  • Enables listeners to localize sound sources in three-dimensional space (azimuth, elevation, and distance)
  • Enhances presence and realism in virtual and augmented reality experiences
    • Provides a sense of being physically present in the virtual environment
    • Improves user engagement and emotional connection to the content
  • Utilizes binaural rendering techniques to simulate how sound reaches both ears
    • Accounts for interaural time differences (ITDs) and interaural level differences (ILDs)
    • Incorporates head-related transfer functions (HRTFs) to model sound interaction with the listener's head and ears
  • Supports dynamic sound localization based on the listener's head movements and position
  • Includes both direct sound and reflections from surfaces in the virtual environment
  • Enables realistic occlusion and obstruction effects when sound is blocked by virtual objects

Physics of Sound in 3D Space

  • Sound propagates as pressure waves through a medium (typically air)
  • Sound waves have frequency, amplitude, and phase properties that determine pitch, loudness, and timing
  • In 3D space, sound waves emanate from a source and travel in all directions
  • Sound intensity decreases with distance from the source following the inverse square law ($\text{intensity} \propto \frac{1}{\text{distance}^2}$)
  • Sound waves interact with surfaces in the environment, resulting in reflections, reverberation, and absorption
    • Reflections occur when sound waves bounce off hard surfaces and create echoes
    • Reverberation is the persistence of sound in a space due to multiple reflections
    • Absorption occurs when sound energy is absorbed by soft materials and dissipates
  • Sound propagation is affected by environmental factors such as temperature, humidity, and air density
  • Doppler effect occurs when there is relative motion between the sound source and the listener
    • Perceived pitch is higher while the source approaches the listener and lower as it moves away
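The inverse square law and Doppler shift above can be sketched numerically. This is a minimal illustration, not a full acoustic model: it assumes a stationary listener, a speed of sound of 343 m/s, and source motion directly along the line to the listener.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 deg C (assumed constant)

def intensity_ratio(d1, d2):
    """Relative intensity when moving from distance d1 to d2 (inverse square law)."""
    return (d1 / d2) ** 2

def doppler_frequency(f_source, source_speed, approaching=True):
    """Perceived frequency for a moving source and a stationary listener."""
    sign = -1.0 if approaching else 1.0
    return f_source * SPEED_OF_SOUND / (SPEED_OF_SOUND + sign * source_speed)

# Doubling the distance quarters the intensity (a drop of about 6 dB).
quarter = intensity_ratio(1.0, 2.0)
# A 440 Hz source is heard sharp while approaching, flat while receding.
sharp = doppler_frequency(440.0, 34.3, approaching=True)
flat = doppler_frequency(440.0, 34.3, approaching=False)
```

Game engines apply the same two relationships per frame, using the source-listener vector to decide distance attenuation and the radial velocity component for the Doppler shift.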

Psychoacoustics and Human Perception

  • Psychoacoustics studies the relationship between physical sound stimuli and the subjective perception of sound
  • Human auditory system is sensitive to a wide range of frequencies (20 Hz to 20 kHz) and sound pressure levels
  • Localization of sound sources relies on binaural cues processed by the brain
    • Interaural time differences (ITDs) result from the difference in arrival times of sound at each ear
    • Interaural level differences (ILDs) occur due to the shadowing effect of the head
  • Spectral cues, caused by the filtering effects of the outer ear (pinna), aid in vertical localization
  • Head-related transfer functions (HRTFs) describe how sound is modified by the listener's head, torso, and ears
    • HRTFs are unique to each individual and can be measured or synthesized
  • Auditory masking occurs when one sound makes another sound difficult or impossible to perceive
    • Frequency masking happens when a louder sound masks a quieter sound of similar frequency
    • Temporal masking occurs when a sound is masked by a preceding (forward masking) or following (backward masking) sound
  • Precedence effect (law of the first wavefront) helps localize sound in reverberant environments
    • The first arriving sound dominates the perceived location, while later reflections are suppressed
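The ITD cue described above can be approximated with the classic Woodworth spherical-head model. This is a sketch under simplifying assumptions: the head is a rigid sphere, and the 8.75 cm radius is a commonly quoted average rather than a measured value.

```python
import math

HEAD_RADIUS = 0.0875    # m, average adult head radius (assumption)
SPEED_OF_SOUND = 343.0  # m/s

def itd_woodworth(azimuth_deg):
    """Interaural time difference in seconds for a distant source
    at the given azimuth, via the Woodworth spherical-head formula:
    ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))
```

A source directly ahead (0°) gives zero ITD; a source at 90° gives roughly 0.65 ms, close to the commonly cited maximum interaural delay for an adult head.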

Spatial Audio Technologies and Techniques

  • Binaural audio reproduces spatial sound over headphones by simulating the acoustic signals at each ear
    • Utilizes HRTF-based filtering to create a realistic 3D soundscape
    • Requires headphones for accurate playback and localization
  • Ambisonics is a full-sphere surround sound technique that captures and reproduces spatial sound fields
    • Uses a spherical harmonic decomposition to represent sound in terms of directional components
    • Higher-order Ambisonics (HOA) provides increased spatial resolution and immersion
  • Wave field synthesis (WFS) recreates a desired sound field using an array of loudspeakers
    • Based on the Huygens-Fresnel principle of wave propagation
    • Enables accurate localization and natural sound reproduction over a large listening area
  • Vector base amplitude panning (VBAP) is a method for positioning virtual sound sources using loudspeaker pairs or triplets
    • Calculates gain factors for each loudspeaker to create a perceived source direction
  • Object-based audio represents sound as individual objects with metadata (position, size, directivity)
    • Allows for dynamic rendering and personalization of the soundscape based on the listener's position and orientation
  • Head-tracked binaural audio adapts the sound rendering in real-time based on the listener's head movements
    • Enhances localization accuracy and immersion by maintaining a stable soundscape relative to the listener's head
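As a sketch of how VBAP computes its gain factors, here is the two-dimensional (pairwise) case: the source direction is expressed as a weighted sum of the two loudspeaker direction vectors, and the resulting gains are normalized for constant power. The function name and degree-based interface are illustrative choices, not a standard API.

```python
import math

def vbap_pair(source_az, spk1_az, spk2_az):
    """2D VBAP gains for a virtual source between two loudspeakers.
    All azimuths in degrees, measured in the horizontal plane."""
    def unit(az):
        a = math.radians(az)
        return (math.cos(a), math.sin(a))

    p = unit(source_az)
    l1, l2 = unit(spk1_az), unit(spk2_az)
    # Solve p = g1*l1 + g2*l2 by inverting the 2x2 loudspeaker matrix.
    det = l1[0] * l2[1] - l1[1] * l2[0]
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    # Normalize so g1^2 + g2^2 = 1 (constant perceived power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A centered source between speakers at ±30° gets equal gains; a source aligned with one speaker sends all its energy there. The 3D version works the same way with loudspeaker triplets and a 3x3 matrix.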

Recording and Capturing Spatial Audio

  • Binaural recording uses a dummy head with microphones placed in the ear canals to capture spatial audio
    • Directly captures the acoustic signals that would reach a listener's ears
    • Provides a realistic and immersive listening experience when played back over headphones
  • Ambisonic recording employs a special microphone array (Ambisonic microphone) to capture the full-sphere sound field
    • Typically uses four or more capsules arranged in a tetrahedral or higher-order configuration
    • Records the sound field as first-order spherical harmonic (B-format) components (W, X, Y, Z)
  • Spatial microphone arrays, such as the Eigenmike or Soundfield microphone, capture spatial audio with high resolution
    • Consist of multiple microphone capsules arranged in a specific geometry
    • Enable the capture of higher-order Ambisonic signals or directional audio components
  • Binaural synthesis can be used to create spatial audio from mono or stereo recordings
    • Involves convolving the audio signals with HRTF filters to simulate spatial cues
    • Requires knowledge of the sound source positions and the listener's HRTF
  • Spatial audio can also be captured using virtual microphones within a simulated acoustic environment
    • Allows for the creation of spatial audio content in fully virtual spaces
    • Enables control over the acoustic properties and sound propagation in the virtual environment
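Binaural synthesis by convolution can be sketched as follows. Real systems convolve the source signal with measured head-related impulse responses (HRIRs) selected for the source direction; the toy impulse responses below are placeholders that crudely mimic an ITD (a few samples of delay) and an ILD (reduced level), purely for illustration.

```python
import numpy as np

def binaural_synthesize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair,
    returning a (2, N) binaural stereo signal."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

# Toy HRIRs (placeholders, NOT measured data): the right ear receives the
# sound 3 samples later and at half amplitude, crudely mimicking the ITD
# and ILD of a source on the listener's left.
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.0, 0.5])

mono = np.array([1.0, 0.0])
binaural = binaural_synthesize(mono, hrir_l, hrir_r)
```

In practice the HRIR pair is interpolated from a measured database as the source (or the listener's head) moves, and the convolution runs block by block in real time.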

Processing and Rendering Spatial Sound

  • HRTF-based rendering applies individualized or generic HRTF filters to audio signals to create binaural output
    • Simulates the acoustic transformations that occur as sound reaches the listener's ears
    • Can be implemented in the time domain (convolution) or frequency domain (multiplication)
  • Ambisonics decoding converts the Ambisonic signals into loudspeaker feeds for playback
    • Utilizes a decoding matrix that maps the Ambisonic components to the loudspeaker positions
    • Higher-order Ambisonics decoding provides improved spatial resolution and localization accuracy
  • Binaural rendering can be optimized using head-tracking data to update the HRTF filters in real-time
    • Ensures that the spatial audio remains stable and correctly localized relative to the listener's head movements
  • Reverberation and acoustic simulation add realistic room acoustics to the spatial audio rendering
    • Can be achieved using convolution with measured or simulated room impulse responses
    • Geometric acoustic modeling techniques (ray tracing, image-source method) can simulate sound propagation in virtual spaces
  • Spatial audio encoding and compression techniques reduce the bandwidth and storage requirements for spatial audio content
    • Ambisonics can be efficiently encoded using spherical harmonic domain compression
    • Binaural audio can be compressed using perceptual coding techniques that exploit spatial and temporal masking
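The time-domain/frequency-domain equivalence mentioned above (convolution versus spectral multiplication) can be shown in a few lines. For long HRIR or room-impulse-response filters, the FFT route is substantially cheaper than direct convolution; this sketch processes the whole signal at once rather than in real-time blocks.

```python
import numpy as np

def fft_convolve(signal, impulse_response):
    """Filter a signal by multiplying spectra: zero-pad both inputs to the
    full convolution length, multiply in the frequency domain, and invert.
    Mathematically identical to time-domain convolution."""
    n = len(signal) + len(impulse_response) - 1
    return np.fft.irfft(
        np.fft.rfft(signal, n) * np.fft.rfft(impulse_response, n), n
    )
```

Real-time renderers use the same idea block-wise (overlap-add or overlap-save, often partitioned) so that long reverb tails can be applied with low latency.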

Integration with VR/AR Platforms

  • Spatial audio is a crucial component of immersive VR and AR experiences
  • VR platforms (Unity, Unreal Engine) provide built-in tools and plugins for spatial audio integration
    • Support for binaural rendering, Ambisonics, and object-based audio
    • Allow for real-time spatialization and head-tracking synchronization
  • AR platforms (ARKit, ARCore) enable spatial audio in augmented reality applications
    • Utilize the device's microphone and motion sensors for real-time audio processing and head-tracking
    • Can anchor virtual sound sources to real-world objects or locations
  • Web-based spatial audio is possible through the Web Audio API and WebXR specifications
    • Enables browser-based VR and AR experiences with immersive spatial audio
    • Provides JavaScript APIs for spatial sound rendering, Ambisonics, and binaural processing
  • Spatial audio can be synchronized with visual elements and haptic feedback for a multi-sensory experience
    • Enhances the sense of presence and immersion in VR/AR environments
    • Requires careful alignment and timing between audio, visual, and haptic cues
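Head-tracking synchronization with an Ambisonic sound field can be sketched as a rotation of the first-order components: the field is counter-rotated by the listener's head yaw so that sources stay anchored to the world rather than turning with the head. The encoding convention below (traditional B-format with W weighted by 1/sqrt(2)) is one of several in use, and this handles yaw only, not pitch or roll.

```python
import math

def encode_fo(sample, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample into first-order B-format (W, X, Y, Z),
    using the traditional 1/sqrt(2) weighting on W."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z

def rotate_yaw(bformat, head_yaw_deg):
    """Counter-rotate the sound field by the listener's head yaw so that
    encoded sources remain fixed in world coordinates."""
    w, x, y, z = bformat
    a = math.radians(-head_yaw_deg)
    return (w,
            x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)

# A source at 90 deg (listener's left); the listener turns 90 deg toward it,
# so after counter-rotation the source sits dead ahead (all energy in X).
rotated = rotate_yaw(encode_fo(1.0, 90.0), 90.0)
```

VR runtimes apply this rotation every frame from the headset's orientation quaternion before decoding the field to binaural or loudspeaker feeds.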

Creative Applications and Case Studies

  • Spatial audio enhances storytelling and narrative experiences in VR/AR
    • Directs the user's attention and guides them through the story
    • Creates a sense of space and atmosphere that complements the visual elements
  • Immersive audio can heighten emotional impact and engagement in virtual experiences
    • Enables realistic and emotionally resonant soundscapes (natural environments, concerts, film scenes)
    • Enhances the sense of scale and grandeur in virtual spaces (museums, architectural visualizations)
  • Spatial audio improves the realism and effectiveness of VR/AR training and simulation applications
    • Provides realistic sound cues for situational awareness and decision-making (flight simulators, emergency response training)
    • Enhances the transfer of skills from virtual to real-world scenarios
  • Spatial audio can create unique and interactive musical experiences in VR/AR
    • Allows for immersive and spatially-aware musical performances and compositions
    • Enables interactive sound installations and audio-visual art experiences
  • Case studies demonstrate the impact of spatial audio in various domains:
    • "Notes on Blindness" VR experience uses binaural audio to convey the sensory world of a blind person
    • "The Encounter" by Complicite delivers live binaural audio over headphones to create an immersive, localized theatrical storytelling experience
    • The "Runnin' (Lose It All)" 360° music video by Naughty Boy featuring Beyoncé pairs spatial audio with immersive underwater visuals


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.