Augmented reality is transforming vision itself. Wearable hardware like AR-enhanced glasses will accelerate this transformation, enabling computers to model and interpret everything we encounter. After receiving early access to Snap’s AR Spectacles, Modem collaborated with creative studio oio to explore this new way of seeing.
Vision may be the most important way for humans to relate to our environment. It is how we form mental models of the world we inhabit, how we orient ourselves in physical space, and how we absorb many kinds of information. While the biological foundation of vision has not changed much over time — we basically have the same eyes we have always had — new technology is poised to utterly transform it, amplifying and extending our vision while drastically altering its purpose.
The camera, more than any other technology, continues to reshape vision. That process has been underway as long as cameras have existed, but it has accelerated in the digital era. For much of its history, the camera has taken the form of mounted or handheld devices that allow humans to capture, preserve, and distribute images to a wider audience. As cameras proliferated in the twentieth century, ultimately being assimilated into the smartphone, the world became increasingly saturated with the still and moving images those cameras produced. Eventually, photography was as much a feature of our environment as the reality it reflected.
The camera’s recent trajectory suggests that it is an intermediate stage in the evolution of vision, as seeing transitions from a biological process to a mechanical one. The traditional boxy camera endures as an icon on digital interfaces, but the hardware itself has grown more subtle, if not fully invisible. Today’s world is full of such cameras, constantly and automatically recording troves of visual data that no human will ever see.
As Benedict Evans says, this is a transition “from computers with cameras attached that can take pictures to computers with eyes that can see.” The images captured by this computer vision are intended not for human viewing but as pure data, with algorithms as the audience. As Evans explains, general-purpose components are often better for solving problems than special-purpose components. Computer vision is highly flexible and able to solve a wide variety of problems, just as human vision has always done. As the tools for collecting and processing visual data improve, Evans writes, many problems that don’t seem like vision problems will become vision problems.
A NEW KIND OF VISION
Computers still exist to serve human ends, and like photography a century ago, computer vision is ultimately a way to help us see better: humans and computers viewing the world together and sharing responsibility for interpreting and interacting with it.
Augmented reality (AR) has emerged as one of computer vision’s most compelling applications, enabling a user’s own visual field to frame the computer’s perspective and determine what the machine sees. With AR, vision itself assumes a new meaning, routing sensory input to both the brain and a computer simultaneously, as data for digital processing — artificial intelligence working in parallel with real intelligence. Meanwhile, the physical environment becomes a user interface, sheathed in a new layer of digital information. Seeing is no longer just a way to gather information about the world, but also a way to act upon that world.
As vision evolves, so do the tools that support it. Today, AR is most commonly experienced through a smartphone camera, but emerging forms of wearable hardware, particularly glasses, promise to more closely align what users and computers see. AR eyewear will likely become the material symbol of collaborative human-machine vision — a statement of the body’s ever-changing relationship to technology, and a tangible sign of transhumanism’s growing cultural significance.
Indeed, the stylistic futurism of recent AR glasses suggests that mainstream culture is finally ready to embrace AR itself. When Google Glass first appeared in 2013, it failed to gain broad traction and was even ridiculed; nearly a decade later, wearable technology is less the esoteric attire of early adopters in Silicon Valley than an expression of the digital infusion of everyday life. If our ordinary experience of reality is already heavily mediated by digital technology, thanks to ubiquitous smartphones, we might as well streamline that experience. With AR, the camera lens becomes our primary interface with our environment. Instead of tapping, we will simply look, and looking will be a form of action.
AR represents a sensory paradigm shift, but it is also intuitive. In a 2011 talk, Kevin Slavin describes emission theory, the erroneous belief (frequently held by children) that vision involves beams shooting outward from the eyes to “paint the world.” This idea reverses the reality of vision, which is the product of light entering the eyes from one’s external environment. With AR, in a way, a version of emission theory finally becomes true. Or at least, as Slavin observes, the existence of such outward-projecting vision would pair well with AR, which paints the world in its own way.
INTO THE MIRRORWORLD
A new kind of vision requires a new kind of world to view. For AR to work — for vision to beam outward rather than inward — the physical environment must be recreated as machine-readable data that corresponds and responds to the human gaze.
Unlike virtual reality, which immerses users within a comprehensive artificial environment that is decoupled from physical reality, AR must faithfully reproduce our reality, creating a foundation for layering additional information on top of it. Building a digital mirrorworld involves a significant amount of data input from cameras and visual sensors, including AR glasses themselves — an effort on par with Google’s vehicles endlessly trawling the world’s streets to gather the 220 billion images that constitute Google Maps’ Street View feature. When a user finally puts on their AR glasses, the digital world must already be constructed, and ready for them to interact with it.
The logic of AR is similar to that of magical realism: a genuine representation of the world that can also incorporate fantastical elements, which in turn enhance the effect of reality itself. Stepping into the AR-mediated mirrorworld might resemble Alice’s passage through the looking glass in Lewis Carroll’s sequel to Alice in Wonderland, an alternate reality where everyday life is transformed into something more magical — where slightly different logic and sensory experiences prevail. But however magical that alternate reality is, it must also remain closely mapped to primary reality.
FROM MAGIC TO UTILITY
So far, the most prominent uses of AR have indeed been in the magical realism vein: games, entertainment, and other playful displays of the technology’s abilities, such as the 2016 smash hit Pokémon GO, or Magic Leap’s whales splashing through the floor of a school gymnasium.
These whimsical displays are excellent marketing for the technology, but if AR is to become an everyday tool — another operating system for life — it may have to serve more utilitarian use cases. In other words, AR may have to become more mundane, supporting more ordinary tasks.
For AR to become truly useful, it will require carefully refined interfaces that help users make sense of their new, computer-enhanced vision. Humans are accustomed to the old way of looking — the kind that merely gathers information, not the kind that manipulates the object being viewed. A delicate spectrum of sight-based interactivity will have to develop, from glancing to gazing, at least as nuanced and intuitive as a smartphone’s touch sensitivity. Vision will not just be about what we look at, but also how we look at it.
And like smartphones, AR promises to reshape our perceptual world, which will increasingly correspond to the digital mirrorworld that represents it. The physical objects that surround us — furniture, food, storefronts, and even people and pets — will become discrete digital information access points. In this way, everyday life will take on more of the qualities already found in video games.
At its best, AR will immerse us in a hybrid environment that combines the intuitive familiarity of embodied experience with the informational richness of the internet. Hardware undergirded by social networks, such as Snap’s AR Spectacles, will synthesize our social graphs and preferences with real-time location data to provide contextually relevant recommendations wherever we go. Kevin Kelly predicts that “eventually we’ll be able to search physical space as we might search a text: ‘find me all the places where a park bench faces sunrise along a river.’” And we will also be able to act upon that space just by looking, using “if this then that” triggers like glancing at a lamp to turn it on, gazing at a breakfast bowl to play a playlist of morning tunes on Spotify, or staring at a clock to show upcoming calendar events.
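To make the “if this then that” idea concrete, here is a minimal Python sketch of how such gaze triggers might be expressed. Everything in it is an assumption made for illustration: the dwell-time thresholds, the `classify_look` and `handle_look` helpers, and the rule table are hypothetical and do not describe Snap’s Spectacles or any real AR toolkit. The actions simply return a description of what would happen rather than controlling a real lamp, music service, or calendar.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical dwell-time thresholds in seconds (illustrative values only).
GLANCE_MAX = 0.5
GAZE_MAX = 2.0


def classify_look(dwell_seconds: float) -> str:
    """Name the kind of look based on how long the eye rests on an object."""
    if dwell_seconds < GLANCE_MAX:
        return "glance"
    if dwell_seconds < GAZE_MAX:
        return "gaze"
    return "stare"


@dataclass(frozen=True)
class Rule:
    target: str                 # label of the recognized object
    look: str                   # "glance", "gaze", or "stare"
    action: Callable[[], str]   # what to do when the rule matches


# The three triggers described above, written as "if this then that" rules.
RULES = [
    Rule("lamp", "glance", lambda: "lamp: on"),
    Rule("breakfast bowl", "gaze", lambda: "music: morning playlist"),
    Rule("clock", "stare", lambda: "calendar: show upcoming events"),
]


def handle_look(target: str, dwell_seconds: float, rules=RULES) -> Optional[str]:
    """Run the first rule matching both the object and the kind of look."""
    look = classify_look(dwell_seconds)
    for rule in rules:
        if rule.target == target and rule.look == look:
            return rule.action()
    return None  # looking without a matching rule does nothing
```

In this sketch, a short look at the lamp triggers its rule, while a long stare at the same lamp matches nothing. That distinction is the point: how we look matters as much as what we look at.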
Realizing the full potential of AR will involve a delicate balancing act. Human perception, visual and otherwise, has developed sophisticated mechanisms to filter information and prevent us from becoming overwhelmed by our environments. Today, digital technology often exploits those psychological mechanisms, which have yet to adapt to the unprecedented flood of information.
AR’s design will be critical. With the right UX and UI, AR might seamlessly integrate our physical and digital lives, functioning as background support for everyday activities while reducing information overload. But a poorly designed version of AR could also do the very opposite, making that information overload worse by amplifying the onslaught of environmental noise.
In his talk, Slavin describes how singular focus on individual points of data can diminish one’s sense of reality rather than enhancing it. “Reality is…the whole world around us, not just that thing in front of us,” Slavin says. And there is already plenty in front of us to focus on. The best version of AR, he explains, will not add to that surplus: “[We need to] invent new ways to see, rather than new things to look at.” Regardless of what form that assumes, and whatever role computers play in it, vision is not finished evolving.