This week, Apple announced it would be opening Siri to third-party apps—meaning, you’ll be able to summon an Uber, send a Snapchat, or launch a Skype call by talking with her. The personal assistant will also be available on desktops for such commands as file and Web searches, playing music, and sending messages.
But Siri is no longer the only assistant. Microsoft has Cortana, Amazon has Alexa, and Google has Now and Google Assistant, which will power Home, its forthcoming Amazon Echo rival. Each aims to edge out the others with its AI smarts. And each is a reminder that the interfaces that stand between us and what we want are melting away. As we move ever closer to a post-screen era, we now talk to all our gadgets, and they talk back to us like (hopefully) less-threatening versions of HAL.
Until now, conversation about these innovations has primarily focused on functionality: How quickly and accurately these AI-driven voice-activated systems respond to user commands. Accuracy is important, because as the technology develops, it gains more and more users. In her 2016 Internet Trends report, KPCB’s Mary Meeker points out that the number of smartphone owners using voice-activated assistants more than doubled in a couple of years, from 30% in 2013 to 65% in 2015.
But even as voice technology becomes ubiquitous, little has been said about the role voices—and the personalities they convey—play in delivering a branded experience. Thought leaders in design and tech aren’t quite sure just how close we are to the end of screens. But they have some ideas of what companies should consider when branding their voice experiences in the future.
Voice as persona.
Today’s most common “voices” are easy to personify. Imagine Siri. She’s middle-aged and wears out-of-fashion eyeglasses that complement her “timeless” outfits. She goes to museums a lot, alone. Sometimes she lets her strait-laced guard down to reveal a cheeky side, but it’s not always disarming. Ask Siri if she’s married, and she replies, “I don’t have a marital status, if that’s what you’re asking.” “She has wit,” says Huge VP of User Experience Sophie Kleber, “but it’s nerd humor.”
Now think about Cortana, Microsoft’s answer to Siri. She’s a character lifted from the Xbox Halo franchise, voiced by Jen Taylor, the actress responsible for the sound of the AI character in the video-game series. (In Halo, Cortana takes a voluptuous female form covered in blue circuitry.) She’s designed to capture the interest of gamers. Going to the museum alone isn’t her idea of a good time.
So far, the big players in the voice space have aspired to build a singular personality with mass appeal—a very tall order, since even the warmest, kindest, most charming human(oid) can’t win over everybody.
Eventually, however, companies will develop a wider breadth of voice personalities tailored to different groups of users, who, as they grow accustomed to hearing a myriad of disembodied voices, may develop a finer attunement to sound profiles, much as gourmands can appreciate the many flavor notes in a chocolate, wine, or coffee. “You could have different voices conveying different things—one voice to deliver information and another to bring emotion of different kinds,” says Bruce Nussbaum, the author of Creative Intelligence. “Like the music you play for different road trips, you should have different voices talking to you for different journeys.” To deliver a range of profiles, companies will begin investing more in sound-mastering their voices. Devices will learn to read users’ emotions: MIT Technology Review reports that researchers at Amazon are already working to give Alexa the ability to sense emotion from a person’s voice. “You will see designers working with sound engineers to define the right sound, as well as with people studying how sound affects human beings,” says Chris Koller, Huge’s VP of brand experience.
Some companies are beginning to do just that, including San Francisco–based Botanic.io, which specializes in building conversational interfaces. Its president, Mark Stephen Meadows, is on a mission to create characters that can gauge a user’s mood and sentiment, modulate responses accordingly, and build a trusted relationship with the user, the kind of deep customer connection that every brand aspires to. Meadows stresses that a voice system needn’t have superior AI to be a hit, especially with a niche audience. “People get distracted by the artificial-intelligence component,” he says. “What you really need is to be able to approach the design knowing exactly who the user is and what they want, rather than try to do what Apple has done: making something that can talk about anything to anybody at any time.”
Developing a tool to speak with 14-year-old girls learning physics, for instance, requires far less “intelligence” than a Siri. It also requires a different pitch, vocabulary, and style of speaking, all of which must be factored into a voice interface to define its persona. “We have archetypes people follow, like, ‘Here’s the maître d’, here’s the busboy, here’s the chef,’” Meadows says. “They dress differently, they act differently, and they play a role that allows me to know how to interact with them.”
A well-considered branded voice experience must also factor in the user’s context, which means the voice interface must exercise discretion. Key to the success of Amazon’s $180 Echo is its native environment: the home, a place where users feel comfortable barking commands about almost anything. In theory, most virtual assistants can be used anywhere, but as users soon discovered, social decorum dictated otherwise. “You’re not going to say, ‘Siri, remind me to buy tampons,’” Kleber says. With Alexa (the name of Echo’s persona), however, users can chat about toiletries with relative abandon.
“These systems should first be there to listen to the end user, listen to what they want, like any good brand should,” Meadows says, imagining the nightmarish reality that could emerge if brands disregard user interests. Your car, for instance, greets you with a cacophony of voices pitching products, with the radio asking if you want to buy speakers, the windshield wipers reminding you to buy more fluid, and the car seat advertising heating modules. “Suddenly, it’s like one of those markets in Mumbai where you have hundreds of people coming at you without considering your core interests,” he adds.
And armed with a ton of personal data, a voice interface could easily overstep what Kleber calls the “creep line,” the point at which the use of personal data starts to feel invasive to the user. Should a company abuse the data (and Echo’s always-listening capability gathers a lot of it), it risks eroding brand trust.
Owning the experience.
Currently, the dominant players in voice are Apple, Google, Microsoft, and Amazon. Smaller entrants into the space will most likely piggyback off the big guys, using their tech as a vehicle for delivering a distinctive brand experience. An interesting case in point: Amazon recently partnered with Warner Bros. and DC Comics on a special promotion of the movie Batman v Superman: Dawn of Justice that lets Echo users solve the mystery of how Bruce Wayne’s parents were killed through a “choose your own adventure” game. Alexa functions as users’ eyes and ears, guiding them through rooms, giving them options, and helping them gather clues.
Regardless of whether a company is using Echo’s API or its own proprietary technology, it should find a way to lay claim to its sound. For some companies, developing signature sounds isn’t unknown territory. Mercedes, for instance, has gone to great engineering lengths to perfect how its car doors sound when they close. The doors on a $115,000 S-Class limousine, for example, shut with a signature “vault-like thunk” that signals Mercedes’ upscale bracket. “Think of the sound that an Apple keyboard makes, click, versus the sound a PC keyboard makes, clunk,” Koller says. “One sounds like a lot of thought went into it; the other sounds like no thought went into it.”
But protecting a sound is tricky, as there’s no easy legal route to trademarking one. In 2000, Harley-Davidson famously withdrew its application to register the “potato-potato-potato” of its V-Twin engines after nearly six years of litigation. At the time, only 23 of nearly 730,000 active trademarks had been issued to protect a sound. Protected sounds include the MGM lion’s roar, NBC’s trio of chimes, and Lucasfilm’s THX Deep Note. The government may become more receptive to such applications in the future, as voice becomes a more pervasive marketing tool and the subliminal impact of sounds can be proven.
Those four other senses.
Koller cautions against thinking of voice as an isolated sensory experience; the more a brand can incorporate the other four senses, he says, the more evocative the experience it creates. “Alexa, for instance, has a voice,” he says, “but if we’re going to personify her, why does she not have a smell? It doesn’t have to be a perfume that elicits a male-female sexual response. She could have a smell like hope, or pine needles and snow.” (A number of behavioral studies have shown that smells can conjure vivid memories and emotions.) Koller considers voice a gateway to designing experiences that incorporate all of the senses, a marketing approach championed by Danish branding expert Martin Lindstrom in his 2005 book, Brand Sense.
Although Lindstrom made his case a decade ago, Koller thinks that leaps in technology and the humdrum sameness of the glass screen have made people hungrier for immersive interactions; hence the current obsession with virtual reality. Sound and visuals alone can’t deliver the perfect VR experience; touch, smell, and perhaps even taste will have to follow. Because in a Her-like future, Scarlett Johansson may not only speak in your ear, but wake you up with a haptic tap and the smell of ocean wafting through your bedroom window. It’s just a matter of which tech company will own the entire sensory experience.