Immersive Audio Glossary
- Jonatan Janson
- May 7
Updated: May 14
Here is a glossary of key immersive audio terms, gathered in one place to make them easier to look up and, ultimately, to learn.
To search for a term, use Command/Control + F.
1D Sound - A single audio channel (mono) with no spatial differentiation. All sound occupies the same point in the soundfield, varying only over time. There is no left/right or front/back information.
2D Sound - Audio distributed across a horizontal plane, typically left and right channels. Provides width and some depth cues but no elevation. Stereo, LCR, and other horizontal‑only formats fall into this category.
3D Sound - Audio that includes height in addition to width and depth. Sounds can be positioned above, below, behind, and around the listener. Achieved through height speakers, object‑based rendering, or binaural rendering with HRTFs. Examples include 5.1.2, 7.1.4, and 9.1.6 speaker layouts, as well as Ambisonics.
Surround Sound - Any multi‑channel format that places speakers around the listener on a horizontal plane. Includes quad, 5.1, 7.1, and similar layouts. Surround is 2D, not 3D.
3D Audio - Audio that supports full‑sphere localisation: left/right, front/back, and above/below. Achieved through height channels, object‑based rendering, Ambisonics, or binaural HRTFs.
Spatial Audio - In modern usage, spatial audio refers specifically to 3D audio — systems capable of positioning sound anywhere in space. Includes object‑based formats, Ambisonics, and binaural rendering. Does not include traditional surround.
Immersive Audio - A broad experiential term describing audio that envelops the listener. Immersive mixes may be 2D (surround) or 3D (Atmos, Ambisonics). It refers to the feeling of being inside the soundfield, not the technical method.
Audio Representation - A conceptual framework describing how audio is structured, encoded, and interpreted by playback systems. The three major representations are channel‑based, object‑based, and scene‑based.
Channel-Based Audio (CBA) - An audio representation where each signal is permanently assigned to a specific speaker channel (e.g., L, R, C, Ls, Rs). Optimal playback requires the same speaker layout used during mixing. Up‑mixing and down‑mixing can approximate other layouts but are not native to the format.
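In channel‑based audio, a signal's position in the file is the only thing tying it to a speaker. As a minimal sketch, the code below splits an interleaved 5.1 stream into per‑speaker feeds. It assumes the common SMPTE/WAV channel order L, R, C, LFE, Ls, Rs; other tools may order channels differently, so treat the `ORDER_5_1` table as an assumption.

```python
# Assumed channel order for a 5.1 channel-based stream (SMPTE/WAV style).
ORDER_5_1 = ("L", "R", "C", "LFE", "Ls", "Rs")

def deinterleave(samples, order=ORDER_5_1):
    """Split interleaved frames into one list of samples per speaker label."""
    n = len(order)
    if len(samples) % n != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel k holds every n-th sample, starting at offset k.
    return {label: samples[k::n] for k, label in enumerate(order)}

# Two frames of 5.1 audio: each frame holds one sample per speaker.
frames = [0.1, 0.2, 0.3, 0.0, 0.4, 0.5,
          0.6, 0.7, 0.8, 0.0, 0.9, 1.0]
feeds = deinterleave(frames)
print(feeds["C"])  # [0.3, 0.8]
```

Nothing in the data says where the Rs feed should go if the playback room has no Rs speaker, which is why other layouts need up‑ or down‑mixing.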
Object-Based Audio (OBA) - An audio representation built from discrete “objects” — audio signals paired with metadata describing their position, movement, and level. Objects are speaker‑agnostic, allowing mixes to adapt to different playback systems. Used in Dolby Atmos, MPEG‑H, and game engines.
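To show what "speaker‑agnostic" means in practice, here is a minimal sketch of an object (signal name plus position and level metadata) being rendered to two different speaker layouts. The `AudioObject` class and `render` function are hypothetical, and the renderer uses simple pairwise constant‑power panning on a horizontal ring; real Dolby Atmos or MPEG‑H renderers are far more sophisticated and also handle height.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    """Hypothetical audio object: a name plus position and level metadata."""
    name: str
    azimuth_deg: float   # 0 = front, positive = toward the listener's left
    gain: float = 1.0

def render(obj, speaker_azimuths):
    """Pan one object onto a horizontal ring of speakers using pairwise
    constant-power panning. Returns one gain per speaker, in input order."""
    if len(speaker_azimuths) < 2:
        raise ValueError("need at least two speakers")
    az = obj.azimuth_deg % 360
    # Sort speaker indices by azimuth so neighbours on the ring are adjacent.
    ring = sorted(range(len(speaker_azimuths)),
                  key=lambda i: speaker_azimuths[i] % 360)
    gains = [0.0] * len(speaker_azimuths)
    for k, a in enumerate(ring):
        b = ring[(k + 1) % len(ring)]
        start = speaker_azimuths[a] % 360
        span = (speaker_azimuths[b] % 360 - start) % 360 or 360
        offset = (az - start) % 360
        if offset <= span:  # the object lies between speakers a and b
            frac = offset / span
            gains[a] = obj.gain * math.cos(frac * math.pi / 2)
            gains[b] = obj.gain * math.sin(frac * math.pi / 2)
            break
    return gains

vox = AudioObject("vox", azimuth_deg=0.0)
# The same object and metadata, rendered to two different layouts.
print([round(g, 3) for g in render(vox, [30, -30])])                # [0.707, 0.707]
print([round(g, 3) for g in render(vox, [30, -30, 0, 110, -110])])  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

In stereo the front‑centre object becomes a phantom image split equally between L and R; on a 5.0 layout the same metadata sends it straight to the centre speaker. The mix itself never changed, only the renderer's target.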
Scene-Based Audio (SBA) - An audio representation that encodes the entire soundfield as a mathematical scene rather than discrete channels or objects. Ambisonics is the primary example. Like OBA, SBA is speaker‑agnostic and can be rendered to many playback configurations.
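As a minimal sketch of how a scene‑based representation stores direction, the code below encodes a mono sample into first‑order Ambisonics. It assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalisation) and takes azimuth as counter‑clockwise from the front and elevation as positive upward; other conventions such as FuMa order and scale the channels differently.

```python
import math

def encode_foa(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample as first-order Ambisonics in ACN order
    (W, Y, Z, X) with SN3D normalisation (the AmbiX convention)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)  # left/right figure-of-eight
    z = sample * math.sin(el)                 # up/down figure-of-eight
    x = sample * math.cos(az) * math.cos(el)  # front/back figure-of-eight
    return (w, y, z, x)

# A source straight ahead and one directly to the left, both at ear level.
front = encode_foa(1.0, 0.0, 0.0)
left = encode_foa(1.0, 90.0, 0.0)
print([round(c, 3) for c in front])  # [1.0, 0.0, 0.0, 1.0]
print([round(c, 3) for c in left])   # [1.0, 1.0, 0.0, 0.0]
```

The four signals describe the scene without naming any speakers; several sources are combined by summing their encoded channels, and a decoder maps the result to whatever speaker layout (or binaural renderer) is available.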
Binaural - A two‑channel headphone‑focused delivery format that uses HRTFs to simulate 3D spatial cues. Can be captured with dummy‑head microphones or generated by renderers from CBA, OBA, or SBA content. Not an audio representation — it is a rendering/delivery method.
Transaural - A loudspeaker‑based technique that delivers binaural cues over speakers by cancelling crosstalk, the sound from each speaker that also reaches the opposite ear. Requires precise speaker placement, careful processing, and a listener in a narrow sweet spot.
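At a single frequency, crosstalk cancellation amounts to inverting a 2×2 matrix: the acoustic paths map speaker feeds to ear signals, so the inverse maps the wanted ear signals back to speaker feeds. The sketch below assumes made‑up transfer values purely for illustration. Real systems solve this for every frequency bin, use measured or modelled paths, and add regularisation, because the inverse can demand very large speaker gains where the matrix is close to singular.

```python
import cmath

def crosstalk_cancel(h, ear_left, ear_right):
    """Return (left, right) speaker feeds that produce the wanted ear signals
    at one frequency. h[i][j] is the complex transfer from speaker j to ear i,
    with index 0 = left and 1 = right."""
    a, b = h[0]
    c, d = h[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is (nearly) singular")
    # Multiply the wanted ear signals by the inverse 2x2 matrix,
    # (1 / det) * [[d, -b], [-c, a]].
    spk_left = (d * ear_left - b * ear_right) / det
    spk_right = (-c * ear_left + a * ear_right) / det
    return spk_left, spk_right

# Illustrative (made-up) paths: direct paths have unit gain; each crosstalk
# path is about 6 dB quieter and phase-shifted by its extra travel time.
cross = 0.5 * cmath.exp(-1j * 0.6)
H = [[1.0, cross],
     [cross, 1.0]]

# Goal: signal at the left ear only, silence at the right ear.
spk_l, spk_r = crosstalk_cancel(H, 1.0, 0.0)

# Check: propagate the feeds back through the acoustic paths.
ear_l = H[0][0] * spk_l + H[0][1] * spk_r
ear_r = H[1][0] * spk_l + H[1][1] * spk_r
print(round(abs(ear_l), 6), round(abs(ear_r), 6))  # 1.0 0.0
```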
Head‑Related Transfer Function (HRTF) - A direction‑dependent filter, usually measured or modelled, describing how sound is shaped by the listener's anatomy (head, torso, pinnae) before reaching the eardrum. Its time‑domain form is the head‑related impulse response (HRIR). Used to create directional cues in binaural and spatial audio rendering.
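Binaural rendering typically filters a mono source through a pair of head‑related impulse responses (HRIRs, the time‑domain form of an HRTF), one for each ear. The sketch below does this with plain convolution. The two three‑tap "HRIRs" are made‑up placeholders that only model an interaural delay and level difference; measured HRIRs are typically hundreds of samples long and also shape the spectrum.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution; output length = len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono source through a left/right HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Placeholder HRIRs for a source on the listener's left: the right ear hears
# it two samples later (time difference) and at half the level.
HRIR_LEFT = [1.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.5]

left, right = render_binaural([1.0, 0.5], HRIR_LEFT, HRIR_RIGHT)
print(left)   # [1.0, 0.5, 0.0, 0.0]
print(right)  # [0.0, 0.0, 0.5, 0.25]
```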
Metadata - Supplementary information describing how audio should be interpreted — such as object positions, automation, loudness targets, or scene parameters. Essential for OBA and SBA workflows.
Format - A defined framework specifying how audio is authored, encoded, and reproduced. Examples: Dolby Atmos, MPEG‑H, Auro‑3D, Ambisonics.
Container - The file wrapper that holds audio data and metadata, such as ADM BWF (a Broadcast Wave Format file carrying Audio Definition Model metadata).
Codec - The algorithm that encodes audio data, usually compressing it as well. Examples range from uncompressed PCM to compressed codecs such as AC‑4.
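To illustrate encoding without compression, the sketch below converts floating‑point samples to 16‑bit little‑endian PCM using Python's standard `struct` module. Scaling by 32767 (so that -1.0 and 1.0 map to symmetric values) is one common choice among several, and is an assumption of this example.

```python
import struct

def encode_pcm16(samples):
    """Encode float samples in [-1.0, 1.0] as 16-bit little-endian PCM.
    Every sample keeps its own 2 bytes: nothing is compressed."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range input
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    return struct.pack("<%dh" % len(ints), *ints)

data = encode_pcm16([0.0, 0.5, -1.0, 1.0])
print(len(data))                   # 8 (4 samples x 2 bytes)
print(struct.unpack("<4h", data))  # (0, 16384, -32767, 32767)
```

The output size grows in direct proportion to the number of samples. A compressed codec such as AC‑4 instead trades exact sample values for a much smaller bitstream.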
Tool - Any software component provided by a format ecosystem — renderers, panners, analyzers, authoring suites — used to create or interpret spatial audio.
VR (Virtual Reality) - A fully synthetic, computer‑generated environment where the listener is visually and aurally immersed. Spatial audio responds to head movement.
XR (Extended Reality) - An umbrella term covering VR, AR (Augmented Reality), and MR (Mixed Reality). AR and MR combine real and virtual elements, often requiring dynamic spatial audio.
3DoF (Three Degrees of Freedom) - The listener can rotate their head (yaw, pitch, roll) but cannot move through space. Audio updates based on orientation only.
6DoF (Six Degrees of Freedom) - The listener can rotate and translate (move forward/back, left/right, up/down). Audio updates based on both orientation and position.
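The difference between 3DoF and 6DoF shows up in what a renderer needs to compute a source's direction relative to the listener. The sketch below is limited to the horizontal plane (yaw only) and assumes a hypothetical convention: x points forward, y points left, and angles are counter‑clockwise, so +90 degrees is the listener's left. With 3DoF only `head_yaw_deg` changes; with 6DoF `listener_xy` changes too.

```python
import math

def relative_azimuth(source_xy, listener_xy, head_yaw_deg):
    """Azimuth of a source relative to where the listener faces, in degrees,
    wrapped into (-180, 180]. Assumes x = forward, y = left, +90 = left."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    world_az = math.degrees(math.atan2(dy, dx))
    rel = (world_az - head_yaw_deg) % 360
    return rel - 360 if rel > 180 else rel

source = (2.0, 0.0)  # 2 m straight ahead of the origin

# 3DoF: the listener stays at the origin and turns the head 90 degrees left.
print(relative_azimuth(source, (0.0, 0.0), head_yaw_deg=90))  # -90.0
# 6DoF: the listener faces forward but walks 2 m ahead and 2 m to the left,
# so the source ends up on the listener's right (about -90 degrees).
print(relative_azimuth(source, (2.0, 2.0), head_yaw_deg=0))
```

In both cases the source is heard on the right, but only a 6DoF system can produce the second result, because it tracks where the listener is as well as where they are looking.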
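Speaker Configuration - A numerical description of a speaker layout in the form X.Y.Z: X is the number of main (ear‑level) speakers, Y the number of LFE channels, and Z the number of height speakers. When the third number is omitted (e.g., 5.1), there are no height speakers.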
Mono (1.0) - A single speaker or channel.
Stereo (2.0) - Left and right speakers.
Quad (4.0) - front‑left, front‑right, rear‑left, rear‑right. A 2D surround format.
VoG (Voice of God) - A single speaker mounted directly above the listener, used in some immersive formats (e.g., Auro‑3D) as a vertical anchor.
5.1, 7.1, 9.1 - Surround configurations with no height channels. The “.1” refers to the LFE channel.
5.1.2, 5.1.4, 7.1.2, 7.1.4, 9.1.4, 9.1.6, 11.1.8 - Immersive configurations with height speakers. The third number indicates the number of height channels.
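The X.Y.Z notation is regular enough to parse mechanically. The sketch below, with a hypothetical `SpeakerLayout` type, splits a label into main, LFE, and height counts (treating a missing third number as zero height speakers) and adds them up to get the total channel count.

```python
from typing import NamedTuple

class SpeakerLayout(NamedTuple):
    main: int    # ear-level (horizontal) speakers
    lfe: int     # LFE channels
    height: int  # overhead speakers

    @property
    def total(self) -> int:
        return self.main + self.lfe + self.height

def parse_layout(label: str) -> SpeakerLayout:
    """Parse notation such as '2.0', '5.1' or '7.1.4'.
    A missing third number means no height speakers."""
    parts = [int(p) for p in label.split(".")]
    if len(parts) == 2:
        parts.append(0)
    if len(parts) != 3:
        raise ValueError(f"unrecognised layout: {label!r}")
    return SpeakerLayout(*parts)

for label in ("2.0", "5.1", "7.1.4", "9.1.6"):
    layout = parse_layout(label)
    print(label, layout, "total channels:", layout.total)
```

For example, 7.1.4 describes 12 channels (7 + 1 + 4) and 9.1.6 describes 16.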
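LFE (Low‑Frequency Effects) - A dedicated, band‑limited channel for low‑frequency content (typically 120 Hz and below). It is not the same as the subwoofer feed: the LFE is a discrete effects channel, while the subwoofer may also reproduce bass redirected from the main channels by bass management.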
Speaker-Agnostic - A system that adapts to different speaker layouts. Common in OBA and SBA.
Listener-Centric - Audio rendering that prioritizes the listener’s position and orientation, often used in VR/AR.
Loudspeaker-Centric - Audio designed for a fixed speaker layout, typical of CBA.
Up‑Mixing - Expanding a mix to a larger speaker configuration (e.g., stereo → 5.1). An algorithm infers how to distribute the existing content, for example by extracting a centre signal and ambience from the stereo pair.
Down-Mixing - Reducing a mix to a smaller configuration (e.g., Atmos → stereo). Requires careful preservation of balance and intent.
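A classic example of down‑mixing is folding 5.1 into stereo. The sketch below uses the widely cited -3 dB (1/√2) gains for centre and surrounds associated with ITU‑R BS.775 and drops the LFE, as many default down‑mixes do. Actual encoders and delivery metadata may specify different coefficients, so treat the gain values as assumptions.

```python
import math

# Commonly used default down-mix gains: -3 dB for centre and surrounds.
CENTRE_GAIN = 1 / math.sqrt(2)
SURROUND_GAIN = 1 / math.sqrt(2)

def downmix_5_1_to_stereo(frame):
    """Fold one 5.1 frame (dict keyed L, R, C, LFE, Ls, Rs) into stereo.
    The LFE channel is discarded in this sketch."""
    lo = frame["L"] + CENTRE_GAIN * frame["C"] + SURROUND_GAIN * frame["Ls"]
    ro = frame["R"] + CENTRE_GAIN * frame["C"] + SURROUND_GAIN * frame["Rs"]
    return lo, ro

frame = {"L": 0.2, "R": 0.1, "C": 0.5, "LFE": 0.9, "Ls": 0.0, "Rs": 0.3}
print(downmix_5_1_to_stereo(frame))
```

Because several channels are summed into each output, the result can exceed full scale; a real down‑mix also needs overall gain reduction or limiting, which is part of the "careful preservation of balance" mentioned above.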
Authoring - The creative stage where audio elements are placed, automated, and shaped within a spatial environment.
Rendering - The process of translating authored content into a specific playback configuration (speakers or headphones).
Mastering - Final quality control: loudness, metadata validation, format compliance, and deliverable preparation.
Encoding - Converting the authored mix into a final deliverable format or codec.
Distribution - Delivering encoded content to platforms, labels, or streaming services.
Playback - Real‑time decoding and rendering on the end user’s device.
Head Tracking - Real‑time adjustment of spatial audio based on the listener’s head orientation. Essential for VR and increasingly used in consumer spatial audio.
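With a scene‑based mix, head tracking can be as simple as rotating the soundfield against the head's movement before decoding. The sketch below handles yaw only, for a first‑order Ambisonic frame in ACN order (W, Y, Z, X), assuming counter‑clockwise (toward the left) is positive. Full 3DoF tracking would also compensate for pitch and roll.

```python
import math

def rotate_foa_yaw(w, y, z, x, head_yaw_deg):
    """Counter-rotate a first-order Ambisonic frame (ACN order W, Y, Z, X)
    so the scene stays fixed in space when the head turns by head_yaw_deg."""
    t = math.radians(head_yaw_deg)
    x_new = x * math.cos(t) + y * math.sin(t)
    y_new = y * math.cos(t) - x * math.sin(t)
    return w, y_new, z, x_new  # W and Z are unaffected by a yaw rotation

# A source straight ahead is X-only. After the head turns 90 degrees left,
# the source should appear on the listener's right (negative Y).
print([round(c, 3) for c in rotate_foa_yaw(1.0, 0.0, 0.0, 1.0, 90.0)])
# [1.0, -1.0, 0.0, 0.0]
```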
Remixing - Re‑interpreting an existing stereo or surround mix into the Atmos domain. Often applied to catalogue material.
Mixing - Creating an Atmos mix from the ground up, with the composition and arrangement designed for immersive playback.
Expansive - A mixing approach that enlarges the stereo image into 3D space while preserving the original front‑focused intent.
Creative - A freer approach where the mixer uses the full 3D field without being constrained by the stereo version.
Immersive - Composing and mixing with immersive audio as the primary medium, not as an adaptation of stereo.
Expanded/Enhanced Stereo & Stereo + - A conservative spatial approach where core elements remain front‑anchored, with supporting elements extended to sides, rear, and heights. Limited automation.
All-Around & 360 Degrees - A more adventurous approach where elements may occupy any position in the sphere. Heavy use of automation and movement to place the listener inside the narrative.
If you think any terms are missing, you can contact us through the 'About' section. And as always, check out our other guides for more.
