The revolution will soon be vocalized. Five years after Apple’s personal assistant, Siri, appeared on iPhones, voice user interfaces (VUIs) have gone mainstream, fueled in part by the surprise popularity of Amazon’s Echo, a hands-free speaker powered by a personal assistant called Alexa.
Advances in technology have now made voice a viable consumer proposition. Speech-recognition error rates, now at 5.5%, are nearly on par with human performance. Google recently announced that it had reduced its speech-recognition error rate by 30% since 2012. At the same time, major tech companies are looking at VUI devices as a platform that, unlike the Internet, they can control. In November 2016, Google followed Amazon's lead with its own smart speaker, Google Home. Microsoft recently released its own smart speaker powered by Cortana AI, and Apple is rumored to have a similar device in the works. Machine learning stands only to improve with continued investment by these tech behemoths, and its reach will grow with user adoption. The virtual assistant market is expected to exceed $3 billion by 2020.
Maturing VUIs portend a major shift in how we interact with computers as we move away from screens to the most natural interface of all—speech. But the pace of that change depends on how well these VUIs deliver value for users.
Earlier this year, Huge launched its own user test lab—an initiative we affectionately call Curie—to validate user benefits, as well as identify pain points in emerging technology; VUIs were a natural choice for our first lab.
Just as we were finalizing our report, Amazon announced its latest Echo, the Show, featuring a seven-inch screen for showing users how Alexa responds to requests, rather than just telling them. Amazon’s introduction of a screen may be an admission that consumers aren’t quite ready for a fully ambient future, or at least that they’re not comfortable buying items, which is how Amazon generates much of its revenue, without seeing them first.
That’s also what we found in our research, in which we polled 500 Echo, Echo Dot (the Echo’s smaller cousin), and Google Home users, and interviewed 17 others in New York City and suburban New Jersey about how they use their devices. According to the survey, the number-one reason respondents gave for not shopping with their device was the lack of a screen (10.03%). But an extremely close second was that they didn’t know how to shop (10.02%), and 8.33% didn’t even realize that they could. (The third-place reason, privacy concerns, is a topic that deserves its own exploration.) In other words, while a screen can remove some retail hurdles, it doesn’t get rid of all of them. To help users acquaint themselves with the idea of shopping through a VUI, smart-speaker makers may need to improve onboarding and offer incentives to entice reluctant shoppers.
Even the tech-savviest of VUI owners tend to use their devices for a few relatively basic things. The most common use cases, according to our survey: playing music (42.7%), checking the weather (29.5%), and getting the news (24.5%). Likewise, the users we interviewed mastered a few third-party features—what Amazon calls Skills and Google calls Actions—and explored no further.
Leaving aside the Home, whose retail offerings are still limited, why don’t more Echo users take advantage of a built-in shopping platform? For starters, Amazon doesn’t do a great job (yet) telling them how. Part of the problem is the very nature of the device—a reactive canister that responds to users only when they utter correctly worded commands. Amazon sends a weekly email to Echo owners detailing the latest Skills, but the onus is still on the user to implement them and commit their respective spells to memory.
Based on interviews with actual users, both inside and outside of Huge, we came up with some ideas for improving the onboarding process and familiarizing people with the notion of shopping through a screenless device:
- Cast the Echo as a shopping device from the get-go. This one's simple: As a first step in setting up the Echo app, ask for the user’s payment information and tell her why. Even if she never orders anything through the device, she now knows that she can.
- Offer an optional onboarding session. The beauty of installing an Echo is that it’s intuitive. Plug it in, connect to a Wi-Fi network, and you can start playing tunes, forever and ever. For those who want more guidance, however, Alexa could offer an optional tutorial that runs down the most popular Skills and explains how to order Amazon Prime items.
- Break out of the passive routine. Sure, people don’t want to have a pushy salesperson disguised as a speaker in their homes. But they do want a personal assistant who can create shortcuts and anticipate their needs to save them time. Once an Echo user places an Amazon order online, her Echo could give her a one-time prompt that goes something like this: “Did you know that next time you can re-order through me? If you want to know how, say, ‘Alexa, tell me how.’”
- Tap into a user’s circle of trust. All of the users we spoke with said that they often consulted with friends and family when researching a purchase. Access to a trusted source’s insight, opinion, or testimonial was often the swaying factor in following through on an impulse to buy. Outside of their personal circles of influence, they turned to a trusted coterie of online experts and publishing sites. “I’m always trying to look for stuff where it’s like this person genuinely likes this product and is telling me why as if they were my friend,” Christine, a Home user we interviewed, said. Her go-to expert for recipes and cooking equipment is J. Kenji López-Alt of cooking website Serious Eats. If she could get product recommendations from him, and other verified sources, she says, she’d use her device more often for shopping. Partnering with reputable content creators, such as the New York Times–owned recommendation sites Wirecutter and Sweethome, could help Amazon penetrate users’ trusted group of retail advisors.
- Connect to pre-existing screens. Adding a display, as Amazon did with the Echo Show, is one way to satisfy our screen addiction. But shooting images and information to the screens we already own would also work, according to some of our interviewees. “There’s potential to greatly expand functionality once you have some visual reference points, [such as] an app where you get visuals beamed to a tablet in the kitchen or the TV set,” said Seth, a Google Home user we interviewed. He also imagined a scenario in which his Home could grab purchase information from the text messages on his phone: After admiring a friend’s sofa throw, he could ask her to send him a link to where he could find it, then ask his Home to order it based on the texted information. “Turning texted, emailed, or Facebook product recommendations into quick purchases would be great,” he said.
- Make shopping commitment-free. VUIs have benefited from major advances in speech recognition (a computer’s ability to translate speech into written text) and natural language processing, or NLP (its ability to understand speech). But advances don’t equal perfection, and our voice assistants don't always understand everything we say (especially if users happen to speak with an accent). When it comes to shopping, users are understandably reluctant to buy through a device that could misunderstand them, leading to a botched order, money lost, and an annoying returns process. But they’d probably overcome their reservations if pushing the buy button were risk-free. Disruptive brands selling everything from mattresses (Casper) to eyeglasses (Warby Parker) have won brand loyalists by letting customers try out their products before officially purchasing them. Amazon and other brands leveraging the voice platform could do the same with orders placed over an Echo.
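The one-time re-order prompt suggested in the list above could be sketched roughly as follows. This is only an illustration in Python, not how Amazon implements anything: the `ReorderPrompter` class, the `on_online_order` hook, the user IDs, and the in-memory set of already-prompted users are all hypothetical names and assumptions made for this example.

```python
from typing import Optional, Set

# Prompt wording taken from the suggestion in the list above.
REORDER_PROMPT = (
    "Did you know that next time you can re-order through me? "
    "If you want to know how, say, 'Alexa, tell me how.'"
)


class ReorderPrompter:
    """Hypothetical helper that remembers who has heard the prompt."""

    def __init__(self) -> None:
        # Assumption: an in-memory set stands in for whatever persistent
        # store a real assistant would use.
        self._already_prompted: Set[str] = set()

    def on_online_order(self, user_id: str) -> Optional[str]:
        """Return the prompt after a user's first online order, else None."""
        if user_id in self._already_prompted:
            return None  # Never repeat the prompt: avoid the pushy salesperson.
        self._already_prompted.add(user_id)
        return REORDER_PROMPT


prompter = ReorderPrompter()
first = prompter.on_online_order("user-123")   # prompt is returned
second = prompter.on_online_order("user-123")  # None: already prompted once
```

Keeping the "already prompted" state per user is what makes the nudge a one-time event rather than a recurring sales pitch, which matches the concern about pushiness raised in that bullet.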
Adding a screen to the Echo is a sure-fire strategy for getting users to shop on their device as they would on a laptop or tablet. But finding a way to get people to shop without visual input is the real challenge. The technology itself will eventually evolve. Sentiment analysis will allow a virtual assistant to modify its responses based on a user’s emotional tone. As NLP evolves, the AI will make fewer comprehension mistakes. But it’s not too early to start warming users up to the concept of shopping through voice. After all, in the early days of online shopping, there were plenty of naysayers who balked at the idea of buying something that they couldn’t touch or feel in a traditional retail setting. They eventually changed their minds, too.
Curie research team: Jenny Clark, Elsa Kaminsky, Mohenna Sarkar, Yashoda Sampath.
Many thanks to other collaborators: Luke Cheng, Kate Falanga, Tiago Freitas, Meghan Graham, Michael Horn, Jacque Jordan, Missy Kelley, Kevin Lam, Adam Lauria, Thomas Prommer, and Rachel West.