The Infinite Surround
RESEARCH PAPER
Text Philip Maughan
IMAGES Area of Work
Issue 16

Spatial audio is transforming the ways in which we perceive and interact with sound. Modem teamed up with composer Pierre Rousseau and writer Philip Maughan to navigate the vast promises and implications of this new sonic realm. The Infinite Surround reimagines the process of listening, composing and recording as the world of spatial computing takes shape.

The first mechanical “speaking machine” was created by the Hungarian inventor Wolfgang von Kempelen in 1769. In its earliest iterations, it was capable of producing just two sounds – ma and pa – the same primal screams with which every human, regardless of their place of birth, makes an initial attempt to connect.

Language was the first technology that made producing and processing sound into a form of expression. It is a tool which enables humans to make or break bonds, both with one another and with the world around them. As we enter the era of spatial audio, it is connectivity itself that is being rewired and restructured. To quote Jonathan Sterne, writing in The Audible Past (2003): “vision offers a perspective; hearing immerses its subject.”

In the 1940s, the Musique Concrète movement made field recordings and manipulated them to create new soundscapes. In the 1950s and 60s, Karlheinz Stockhausen placed speakers around audiences to produce a “quadraphonic sound” experience. In the 1970s, Michael Gerzon introduced his Ambisonics format, refining “surround sound” techniques with full-sphere audio capture and playback.

What connects these 20th century innovators to our present is the way they took advantage of major technological shifts to push forward and irreversibly change perception. Along with Pink Floyd’s Azimuth Coordinator in the late 1960s, or the work of synthesist Suzanne Ciani, who still performs using quadraphonic systems today, we can trace a history of artistic investigation that parallels contemporary mass technological development, as artificial sensory organs are miniaturized and become ever-present in our lives.

AUGMENTED LISTENING
Hearing and listening were utterly transformed by the development of recording, playback and amplification. The impact these innovations continue to have on human consciousness and agency cannot be overstated. Up until the late nineteenth century, the experience of sound was contingent on the listener’s immediate context. Fast-forward a hundred and fifty years, and we possess the tools to choose and curate what, where, when and how we listen, but also to capture and modify sound in real-time.

Practical techniques such as noise cancellation, “adaptiveness” and “transparency” enable us to filter unwanted noise or enhance ambient sound to sharpen our awareness. Everyday audio tools no longer simply play back sound. They discern our surroundings and activities, adjusting their output accordingly. They may detect our presence on a crowded subway platform, for example, automatically softening the harshest noises while enhancing the frequency spectrum of the media we are listening to. And this is only the beginning.

As we incorporate the capabilities of artificial intelligence, we may find these tools do far more than refine our experience in the moment, unlocking new dimensions of personalisation and augmented listening. AI audio will likely find its way into everyday uses such as language translation. Consider strolling through a metropolitan city anywhere in the world as the voices around you, previously unfamiliar, are translated into your preferred language while maintaining the speaker’s original tone and emotion. What would be the impact of such real-time translation on second languages? Would it deepen our understanding of other cultures and communities if we could always speak and listen in our mother tongues?

Spatial audio transforms the environment into an immersive soundscape, with sounds appearing to come from all around and beyond the physical boundaries of the space.

SOUND AND TIME TRAVEL
The term furniture music (musique d’ameublement) was coined by the composer Erik Satie in 1917 to refer to musical arrangements which served as a backdrop rather than a central focus. Half a century later, Japanese musicians and architects created what they called “environmental music” for the communal spaces of the 1964 Olympic Games in Tokyo’s Komazawa Park. Their compositions were designed to blend into the surroundings, creating a holistic experience rather than a distraction, enhancing one’s interaction with the environment without overwhelming the listener.

The integration of AI-generated audio settings, meticulously tailored to the individual, the thematic context, or even specific historical periods, might extend to new, creative ways of listening. This sophisticated personalisation could allow users to navigate soundscapes that blur the boundaries between reality and augmentation. Imagine walking through midtown Manhattan enveloped in the serene hustle and bustle of a farmer’s market in New Amsterdam some four hundred years prior. Why not go back even further and hear deer loping through the expansive boreal forest?

Back in the present, the alteration could be more immediate and sensory. For example, the disruptive honk of a nearby car could be replaced with the soothing tinkle of wind chimes. The incessant hum of traffic could take on the tranquil notes of a gently flowing country stream. This fusion of actual and digitally-crafted sounds would offer a level of immersion previously unimaginable, akin to having a personal composer or sound designer following you throughout your day, adapting audio signals to prioritize your desired interactions with the world.

PARALLEL FIELD RECORDINGS
While discussing our personal experiences using AI and spatial audio, the composer Pierre Rousseau recounted his recent experience during an artist residency in Florence, Italy:

“One of my favorite ways of passing time was to go for walks on the outskirts of the historic city, entering prompts into an audio diffusion model on my phone. As I walked, I would describe exactly what I was seeing – the distant view of Santa Maria del Fiore cathedral, for example – asking the model to generate soundscapes that would match my descriptions. Only rarely was the software able to generate anything resembling reality. But it almost always produced something strange and fascinating: a precious interpretative layer or additional dimension that constituted a sort of ‘parallel field recording.’ When prompted by my description of the cathedral, it generated a fragmented soundscape, an assemblage of muffled church bells, crowd chatter and elements of traditional Tuscan music.”

Here we might offer a word of caution. While personalized soundscapes have immense potential to suit our preferences and transform our experiences, they risk enveloping us in a bubble of sonic isolation, a “cell” in which we are detached from the communal sounds that bind us to our surroundings and to each other. The same technology that promises to enrich our sensory world may also bar us from the spontaneous, unfiltered symphony already playing all around us. It is this unscripted soundtrack, after all, which offers integral clues that help us understand and navigate the world.

THE ARCHITECTURE OF SOUND
In 1958, Pierre Schaeffer founded the Groupe de Recherches Musicales. The GRM went on to host loudspeaker concerts (concerts de haut-parleurs) including the 80-speaker-strong Acousmonium in 1974, designed to project and spatialize sound in a concert setting.

Nicolas Godin of the band Air has pointed out to us that architecture and audio share much of the same lexicon, most obviously the concept of volume, but also texture, scale, resonance and harmony. Whereas in architecture, volume describes the amount of enclosed space within a structure, in sound it refers to the intensity of a signal. It may be that these two definitions are in the process of merging.

The near-term development of spatial audio means that soon anyone with a capable device will be able to record in three-dimensional space and play back the scene without losing any of its depth and dimensionality. Not only will we hear the recording of a given room, we will be able to navigate through it using controls such as head-motion detection. This will redefine the role of the recording engineer, as recordings evolve from being a snapshot – a perspective on a space – to becoming an interactive and explorable object.

As an example, “classical music” is likely to be recorded so that the dynamism of the orchestra as it explores the architecture of a concert hall is not lost. This is something which traditional recordings, however historic or moving, have never been quite able to fully convey. Moreover, listeners will be able to travel within the orchestra itself, providing new layers of emotional engagement and a renewed understanding of certain pieces and performances. First we are sitting by the violins during a performance of Maurice Ravel’s “Boléro”; moments later we cross through the players to the percussionists.

Spatial audio technologies – enabled by seamless, lossless 5G – could effectively eliminate geographical barriers during the recording process. With spatial computing, artists will soon be able to collaborate remotely as if they were together in the same studio. Provided they have access to the necessary bandwidth and computing power, they will be able to experience not only the input of their fellow musicians, but also the sound properties of the rooms in which they play, as if they were sharing the same physical space.

Again, without careful consideration, the ability to collaborate across continents could diminish the value and uniqueness of in-person interactions. At worst it may reduce our awareness of the organic acoustic properties of physical spaces. This new technology should be helping us conceptualize the way we shape the material world, not bypass it. Where historically a “record” was a document of a given moment to be stored for future reference and use, perhaps we will need to reevaluate our understanding of this word.

SPONTANEOUS COMPOSITION
The diffusion of computation to every aspect of our lives is redefining the ways in which music is composed. The exponential increase in processing power means that digital audio workstations are no longer bulky, stationary studio equipment, but user-friendly, portable interfaces which operate frictionlessly on lightweight computers, tablets and phones.

The portability of consumer software such as Ableton Live (along with its hardware controller Push and its mobile app Note) gives artists the freedom to produce music in diverse settings, liberating them from traditional studios. This enables a more spontaneous approach to creativity, which may ultimately be the core mission of creative technologies: to keep the time between inspiration and expression to a minimum. This is one of the key challenges for the modern composer (at long last we have ditched the reductive “electronic musician,” recognising that all music recorded since the 1990s is at least in part “electronic”): to express themselves with as much immediacy as a traditional instrumentalist blowing a trumpet or strumming a guitar.

We should welcome technologies which enable artists to bypass technique and conceptualize ideas, emotion and taste. We should also celebrate the lowering of economic barriers, unlocking artistic potential formerly constrained by the affordability and availability of necessary tools. Sonic Charge’s Synplant, with its groundbreaking Genopatch, enables users to craft synthetic copies, or projections, of any sampled sound – altering their “DNA” rather than manipulating pre-existing audio fragments. This represents a huge leap in terms of access to certain complex tones or timbres (akin to creative earthquakes such as synthesis, sampling, and pitch correction), allowing even the most inexperienced sound designers to include them in their compositions.

ALTERNATE ARRANGEMENTS
The evolution from mono to stereo to spatial and directional audio is opening up new possibilities for arranging and mixing music. In the near future, composers are likely to start composing volumetrically once again, with a specific space in mind, much as a “classical” composer would have arranged their work for a particular orchestral layout and hall.

Why should a kick drum lie buried at the center of a stereo mix and not lodged at the far left end of a room, as a muffled vocal circles around the ceiling, whether in a simulated space or elsewhere? Perhaps one could plan how a piece will sound in the Église Saint-Eustache (a church in central Paris known for welcoming experimental performances) as it is being composed, rather than discovering how it navigates the space at soundcheck.
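This kind of volumetric placement has a long-established mathematical core in Gerzon’s Ambisonics: a mono source at a given azimuth and elevation is projected onto four B-format channels, which can later be decoded for any speaker layout or for headphones. The sketch below illustrates the standard first-order (FuMa) encoding equations; the function name and structure are our own, for illustration only.

```python
import math

def encode_first_order_ambisonics(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample into first-order Ambisonic B-format.

    Returns four channels: W (omnidirectional) plus X, Y, Z,
    figure-of-eight components along the front/back, left/right
    and up/down axes.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                 # omni, scaled by FuMa convention
    x = sample * math.cos(az) * math.cos(el)  # front-back
    y = sample * math.sin(az) * math.cos(el)  # left-right
    z = sample * math.sin(el)                 # up-down
    return w, x, y, z

# A sound placed hard left (azimuth 90°, elevation 0°) lands
# almost entirely in the Y (left-right) channel.
w, x, y, z = encode_first_order_ambisonics(1.0, 90.0, 0.0)
```

Because the encoding stores direction rather than speaker assignments, the “muffled vocal circling the ceiling” is simply a source whose azimuth and elevation change over time, independent of whatever playback system eventually renders it.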

The use of technologies such as line array, beam steering and Dolby Atmos in developing immersive and multidimensional sonic environments is becoming more accessible to the everyday user. Apple’s integration of Dolby Atmos into its Logic digital audio workstation in 2022 is one example. This broad democratization is bound to lead to new artistic forms as more people are given the chance to experiment – much in the way that a broken speaker gave birth to distortion, or more specifically, “fuzz,” which in turn helped shape the sound of rock ’n’ roll. It was multichannel recording and stereo playback that made albums like The Beach Boys’ Pet Sounds possible, and made Pink Floyd’s The Dark Side Of The Moon an unprecedented immersive experience upon release.

INTERFACES AS INSTRUMENTS
As composing evolves, so too will the ways we interact with musical instruments. Influenced by the integration of spatial computing and AI technologies, instruments will no longer be seen solely as physical objects, but as interfaces that can interact with a myriad of digital and spatial elements. Augmented Reality (AR) and Virtual Reality (VR) will play a role in redefining musical performance, making room for the creation of entirely new instruments which can only exist in digital space.

Some of these new instruments may empower individuals who have previously been unable to play due to disability. Headsets, for example, allow users to control software parameters with head or eye movements. Given that the eye muscles are far faster and more precise than the larger, more powerful muscles used for limb movement – which, aside from the human voice, have traditionally been used to play musical instruments – the impact may be profound.

Simply imagine the complexity of the music which could be expressed. In a recent conversation with Peter Theremin – whose grandfather Leon’s best-known creation was a contactless electronic instrument played by moving the hands around two metal antennas, modulating the electromagnetic fields they produce – we learned that an instrument played with one’s eyes had been one of his grandfather’s lifelong dreams.

ETERNAL FEEDBACK
At the dawn of a new era in sound, we find ourselves inhabiting a curious loop. We are drawn inextricably to the Infinite Surround of augmented listening, recording, composing, and playing, while repeatedly returning to our foundations. Some days we fantasize about detailed, grandiose pieces of music in endless space. On others we simply want to listen to a modestly recorded acoustic guitar seeping through an old kitchen radio.

Alvin Toffler’s concept of “future shock” addressed the disorientation and stress induced by rapid technological change. In the ongoing dance of innovation and nostalgia, we witness an affection for past formats such as tape and vinyl, fragile gestures facing off against the constant flow of compressed data in the form of streaming. It’s an ingrained human trait: the urge to seek out and proclaim the new is often silently accompanied by the comforting embrace of the familiar. Whether we like it or not, spatial audio and augmented listening and recording will redefine our interaction with sound and stretch our creative boundaries.

Toffler understood the psychological impact of ceaseless change. The synthetic manipulation of sound – from the primal utterance of ma and pa to the complex structural concerns of Ambisonics – leads us to a set of pivotal questions. Why and how do we choose to hear the world? Will we endeavor to strike a balance between agency and personalisation that retains a healthy reverence for the chaos and uncertainty of our environment? And how will that environment be remade by a planet-wide shift in vibrations that brings us closer and further at the same time?

Issue 16
Research & Vision Modem & Pierre Rousseau
Images Area of Work
Text Philip Maughan