Introduction to immersive audio: The basics
This article defines the main forms of immersive audio and offers a few tips to help ensure that your immersive recordings work.
What is immersive audio?
Immersive means to be totally surrounded by something, such as water when you jump into it. However, immersive sound is more than immersion. It is also envelopment, which relates to spatial information and the perception of spaciousness.
Listener immersion is partly achieved by receiving sound from all directions, but it is more than that. The channels may contain information about the same sound source, but captured from different angles. If all channels play precisely the same sound (multichannel mono), there is no envelopment.
Formerly, immersive audio was called surround sound or 3D audio, even though the loudspeakers were only placed on the horizontal plane. While traditional surround sound contains 5.1 channels (five full-bandwidth channels and one band-limited low-frequency channel), the newer formats include, for example, 7.1.4 (seven channels in the base layer, one low-frequency effects channel and four height channels) and 9.1.4, systems such as Dolby Atmos and Auro 3D, and, in Japan, 22.2 (10 channels in the base layer, nine channels in an upper layer, three channels in a lower layer and two low-frequency effects channels).
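The channel-count notation above can be unpacked with a few lines of Python. This is only a sketch: the `Layout` class and `parse_layout` function are hypothetical names introduced for illustration, and the parser assumes the common "base.lfe" or "base.lfe.height" reading of a label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    base: int    # full-bandwidth channels named in the first field
    lfe: int     # band-limited low-frequency effects channels
    height: int  # height channels (0 if the label has no third field)

    @property
    def total(self) -> int:
        return self.base + self.lfe + self.height


def parse_layout(label: str) -> Layout:
    """Parse a 'base.lfe' or 'base.lfe.height' label such as '5.1' or '7.1.4'."""
    fields = [int(part) for part in label.split(".")]
    if len(fields) == 2:
        fields.append(0)  # no height field, e.g. traditional 5.1
    if len(fields) != 3:
        raise ValueError(f"unsupported layout label: {label!r}")
    return Layout(*fields)


for label in ("5.1", "7.1.4", "9.1.4"):
    print(label, "->", parse_layout(label).total, "channels")
```

Note that 22.2 does not follow the three-field pattern: its first field counts all 22 full-bandwidth channels across the three layers, so the sketch would report them all as `base`.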
How is immersive audio made?
Several recording and mixing techniques are used. Film mixes for Atmos and similar formats usually combine everything available: mono sources, stereo sources and multichannel recordings with spatial information. Adding sound objects makes it possible to place individual events in space, for example an incoming helicopter approaching obliquely from the rear. Timecode and coordinates determine where and when the object appears in the rendering, such as the helicopter rumbling over the audience's head.
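The idea of an object driven by timecode and coordinates can be sketched as a list of keyframes that a renderer interpolates between. Everything here is assumed for illustration: the keyframe values, the coordinate convention and the `position_at` helper are hypothetical and do not represent the metadata format of Atmos or any other system.

```python
from bisect import bisect_right

# Hypothetical keyframes for one sound object: (time in seconds, x, y, z).
# Assumed convention: x = left(-)/right(+), y = rear(-)/front(+), z = height.
HELICOPTER = [
    (0.0, -1.0, -1.0, 1.0),  # enters obliquely from the rear left, overhead
    (4.0, 0.0, 0.0, 1.0),    # directly above the audience
    (8.0, 1.0, 1.0, 1.0),    # leaves toward the front right
]


def position_at(keyframes, t):
    """Linearly interpolate the object position at time t (clamped at both ends)."""
    times = [frame[0] for frame in keyframes]
    if t <= times[0]:
        return keyframes[0][1:]
    if t >= times[-1]:
        return keyframes[-1][1:]
    i = bisect_right(times, t)  # index of the first keyframe strictly after t
    t0, *p0 = keyframes[i - 1]
    t1, *p1 = keyframes[i]
    a = (t - t0) / (t1 - t0)    # 0.0 at the earlier keyframe, 1.0 at the later one
    return tuple(u + a * (v - u) for u, v in zip(p0, p1))


print(position_at(HELICOPTER, 2.0))
```

A renderer would then turn each interpolated position into loudspeaker gains for the actual playback layout, which is outside the scope of this sketch.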
Ambisonics / higher-order ambisonics
Some audio professionals use ambisonics, in either 1st-order or higher-order versions. 1st-order ambisonics is based on a so-called A-format microphone array (four cardioid capsules in a tetrahedral arrangement), whose signals are subsequently converted to B-format, which is equivalent to three virtual bidirectional mics and one omnidirectional mic. Higher-order ambisonics microphones typically consist of a physical sphere with a diameter of 10-20 cm, on which 8, 16, 32 or 64 capsules are evenly distributed. By mixing the signals, you can obtain sharper directional characteristics, which reproduce the impression of direction more clearly. The technique provides high precision but requires the listener to sit in the sweet spot. In virtually all other positions, the listener mainly hears the nearest speaker, without the strong feeling of immersion or envelopment. This technique works well in virtual reality, where you are always in the center. It is also simple to reduce to fewer channels, as all signals are coincident, i.e. they coincide in time.
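The A-format to B-format conversion is, at its core, a set of sums and differences of the four capsule signals. The sketch below shows the commonly cited matrix for one sample frame. The capsule order and the 0.5 scaling are assumptions, since conventions vary, and real converters also apply frequency-dependent correction filters for the capsule spacing, which are omitted here.

```python
def a_to_b(flu, frd, bld, bru):
    """Convert one sample frame of A-format capsule signals to B-format (W, X, Y, Z).

    Assumed capsule order: front-left-up, front-right-down,
    back-left-down, back-right-up.
    """
    w = 0.5 * (flu + frd + bld + bru)  # omnidirectional component
    x = 0.5 * (flu + frd - bld - bru)  # front-back figure-of-eight
    y = 0.5 * (flu - frd + bld - bru)  # left-right figure-of-eight
    z = 0.5 * (flu - frd - bld + bru)  # up-down figure-of-eight
    return w, x, y, z


# Equal signal on all capsules: only the omnidirectional W component remains.
print(a_to_b(1.0, 1.0, 1.0, 1.0))
# Signal on the two front capsules only: W and the front-back X component.
print(a_to_b(1.0, 1.0, 0.0, 0.0))
```

Because the four B-format signals describe the sound field at a single point, fewer output channels can be derived by forming weighted sums of W, X, Y and Z.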
Spatial recording
Other natural recording methods, for example in music production, use arrays with a large distance between the microphones. Norwegian sound engineer Morten Lindberg (2L), who makes fantastic Grammy-winning immersive recordings, uses an array with seven omnidirectional microphones for the base layer and four omnidirectional microphones for the height layer. There is a minimum of one meter between the microphones, which generates a lot of uncorrelated sound and makes the listener feel enveloped. The listener's position is less important. The technique is suitable not only for music but also for establishing soundscapes that can be listened to over a larger area.
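"Uncorrelated" can be made concrete with a correlation coefficient between two channels. The sketch below compares the two extremes: identical channels (multichannel mono) and fully independent channels. The noise signals are synthetic stand-ins, not recordings, and real spaced microphones fall somewhere between the two cases, depending on frequency and spacing.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)
channel_1 = [rng.gauss(0, 1) for _ in range(10_000)]  # synthetic noise
channel_2 = [rng.gauss(0, 1) for _ in range(10_000)]  # independent synthetic noise

mono_pair = correlation(channel_1, channel_1)    # identical signals
spaced_pair = correlation(channel_1, channel_2)  # independent signals
print(round(mono_pair, 3), round(spaced_pair, 3))
```

A value near 1 corresponds to the multichannel-mono case with no envelopment, while a value near 0 corresponds to channels that share no common signal.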
De-correlation
In some mixes, the sound originates from only a few channels. From these channels, for instance the base-layer channels, other channels such as the upper-layer channels are derived by de-correlation, which involves special processing of the audio. This technique is also applied in large-format setups (sound reinforcement/concerts) because it reduces the undesired effect of comb filtering.
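The text does not say which processing is used, so the following is only one well-known possibility: a cascade of Schroeder all-pass filters, which changes the phase of a signal while leaving its magnitude spectrum untouched. The delay lengths and the gain are illustrative values, and the input is synthetic noise rather than program material.

```python
import math
import random


def allpass(x, delay, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = [0.0] * len(x)
    for n, sample in enumerate(x):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y[n] = -g * sample + x_delayed + g * y_delayed
    return y


def decorrelate(x, delays=(7, 19, 41, 83), g=0.5):
    """Run the signal through a cascade of all-pass stages (illustrative values)."""
    for delay in delays:
        x = allpass(x, delay, g)
    return x


def normalized_correlation(a, b):
    """Zero-lag correlation of two signals, normalized to the range -1..1."""
    dot = sum(u * v for u, v in zip(a, b))
    return dot / math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))


rng = random.Random(1)
base = [rng.gauss(0, 1) for _ in range(20_000)]  # stand-in for a base-layer channel
derived = decorrelate(base)                       # stand-in for a derived channel

energy_ratio = sum(v * v for v in derived) / sum(v * v for v in base)
similarity = normalized_correlation(base, derived)
print(round(energy_ratio, 3), round(similarity, 3))
```

The derived signal keeps nearly all of the energy of the original but is only weakly correlated with it at zero lag, which is the property that makes summed copies less prone to comb filtering.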
Loudspeaker setup vs. microphone setup
Often, the loudspeaker setup determines the initial microphone configuration, so a simple miking technique is to place one microphone for each loudspeaker/channel. The question is which parameters to prioritize: precision, coverage, immersion, envelopment, spectral balance or something else. It is extremely difficult to achieve all of them at the same time.
The best way forward is to experiment and try out different configurations. As mentioned above, directional microphones positioned close together can provide a good sense of source position (low geometric distortion in the sweet spot). This is usually the preferred technique for the scientific reproduction of soundscapes and the like.
Spaced microphones provide more envelopment. Both directional and omnidirectional microphones can be used. Omnidirectional microphones, however, are excellent for low-frequency pickup, which pays off if all loudspeakers have a decent low-frequency response. If some degree of directivity is desired, it is possible to use omnidirectional microphones with acoustic pressure equalizers.
Tips for immersive recording
- Adjust the gain of each microphone in the array so that all channels have the same sensitivity during recording
- Larger spacing between microphones provides more envelopment but, in some cases, less directional precision
- If the loudspeakers are spaced far apart, the microphones need to be spaced far apart as well
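The first tip, matching sensitivity across the array, comes down to a small decibel calculation. The sketch below assumes you know each microphone's sensitivity in mV/Pa (from a data sheet or a calibration measurement). The `gain_trims_db` function and the example values are hypothetical.

```python
import math


def gain_trims_db(sensitivities_mv_per_pa, reference=None):
    """Return the gain offset in dB per microphone that equalizes sensitivity.

    If no reference is given, the most sensitive microphone is used,
    so every trim is zero or positive.
    """
    if reference is None:
        reference = max(sensitivities_mv_per_pa)
    return [20 * math.log10(reference / s) for s in sensitivities_mv_per_pa]


# Hypothetical sensitivities for a four-microphone array, in mV/Pa.
array = [20.0, 10.0, 19.0, 20.0]
for index, trim in enumerate(gain_trims_db(array), start=1):
    print(f"mic {index}: {trim:+.2f} dB")
```

A microphone with half the reference sensitivity needs about 6 dB of extra gain, since 20·log10(2) ≈ 6.02.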
Photo credit: Morten Lindberg/2L.