When Siri launched two years ago, Apple ran ads featuring John Malkovich going about his day with Siri acting as his trusted assistant. Siri makes his coffee. Siri plans a romantic dinner. Who wouldn’t want Siri in their life?
We all know that in its first iteration, Siri, like the two decades of mediocre voice recognition software leading up to it, just didn’t live up to the hype. But with iOS 7, Siri is getting better. Android’s voice command feature is also getting better. This matters because, awkward as it may be right now, voice is a far more natural way for humans to interact with computers than typing and touch across many form factors and use cases. As voice recognition starts to live up to its promise, businesses will have to adapt to a new form of digital interaction that most are completely unprepared for, one that will potentially be more challenging than the social and mobile revolutions that have shaken up so many business models to date.
Take Google Glass as an example. Much of the conversation around Glass has been about whether people will adopt this form of wearable technology, what kind of apps might be created for it, and how the platform might be monetized. But perhaps the most important implication of Google Glass is the introduction of a completely new computing platform where the primary interface is voice. As a Glass wearer, I can't type or click on the device; all I can do is press a button on the side of my glasses and tell it what to do. And while Glass certainly has a lot of challenges to overcome, I’d argue, based on my own limited experimentation, that voice commands, in the constrained universe in which Glass is designed to function, are not one of them.
As wearable computers become more common, voice-based interaction will become necessary, given the lack of input options in small form factors. We can't click on or type into our glasses, and we can't do much more on an iWatch, pendant, armband, or whatever gadget comes out next. Voice is also a natural fit, and certainly the safest interface, for the increasingly sophisticated computers with four wheels we used to call cars.
All this brings us to your business: Is your company voice-enabled? Will your business be ready when users begin trying to interact with your company using voice commands issued to their phone, wearable device, or car? Are you ready for what happens when Siri and Google open their APIs and anything can be voice-enabled? It will be an enormous shift: for businesses already struggling with the move from desktop to mobile, wait until you have to evolve your interaction model from click, point, swipe, and type to programmatically responding to spoken commands.
Companies looking to make the shift to voice should think about several key things:
- Relying on clunky existing call-center technology that routes users away from human operators and into endless menus and automated prompts will not lead to success. Consumers hate those systems. Instead, companies should learn from prior mistakes and embrace the reason voice interfaces are appealing: the promise of fast, easy interaction that isn’t possible with other methods of data input. Any voice solution must deliver on that promise.
- Voice enablement is really a search problem. If we assume that Apple, Google, or even Microsoft (thanks to Sync, in cars) will be the ones who translate audio into text, the company's job becomes defining the set of information, actions, or follow-up requests returned to the user for a given text input. The good news is that this is a manageable problem: in the worst case, we can solve most of it by making sure the most common search queries return good responses. The bad news is that most companies do a terrible job with search. Pick a large company's website, use the site's own search tool, and see how bad the results are. Companies should stop treating search as just another feature on the checklist and start treating it as something that must solve real user problems. Think about users and their primary needs, and make sure their search experience is exceptional.
- Companies that win in voice will do more than provide relevant information in response to user queries; they will figure out how to actually execute tasks on behalf of their users. For example, if I want to make a reservation on OpenTable, it would be maddening to go through several steps using voice control (it’s frustrating enough doing it now in a mobile app). Instead, I want to just say, “make reservations at Olive Garden tonight at 7,” and have it happen automatically. Better yet, I should be able to say, "book reservations for tonight," and have it know, based on my calendar and prior preferences, the time and restaurant that are right for me.
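The "search problem" and task-execution points above can be sketched in code: once the platform hands a business the transcribed text, the job is to map that text to a structured action (an intent plus its parameters) rather than to a page of search results. This is a toy illustration under assumed names; the patterns, intent labels, and fallback behavior are hypothetical, not any real Siri or Google API.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)

# Hypothetical utterance patterns a voice-enabled business might register.
# Order matters: more specific patterns are tried before vaguer ones.
PATTERNS = [
    # "make reservations at Olive Garden tonight at 7"
    (re.compile(r"(?:make|book) reservations? at (?P<restaurant>.+?) "
                r"(?P<day>tonight|tomorrow) at (?P<time>\d{1,2})", re.I),
     "book_reservation"),
    # "book reservations for tonight" -- fill in details from stored preferences
    (re.compile(r"(?:make|book) reservations? for (?P<day>tonight|tomorrow)", re.I),
     "book_reservation_from_prefs"),
]

def parse(utterance: str) -> Intent:
    """Map transcribed speech to a structured action, not a results page."""
    for pattern, name in PATTERNS:
        match = pattern.search(utterance)
        if match:
            return Intent(name, match.groupdict())
    # Worst case: degrade gracefully to ordinary search over common queries.
    return Intent("search", {"query": utterance})
```

Calling `parse("make reservations at Olive Garden tonight at 7")` yields a `book_reservation` intent with the restaurant, day, and time extracted, which a back-end service can then execute on the user's behalf.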
Ultimately, the challenges in shifting to voice are a subset of the larger problem facing companies, which is the explosion of new digital interfaces and devices constantly entering the market. Companies have to think about their technology ecosystems flexibly and be able to nimbly respond to new platforms as they emerge without compromising the value created for the user. At a minimum, this means wrapping a web services layer around all corporate functions and data, so a new front-end experience can be rapidly produced without heavy engineering overhead.
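The "web services layer" idea can be sketched with Python's standard library: expose a core business function as a small JSON endpoint, so a phone app, a Glass app, and a voice front end all call the same service rather than each reimplementing the logic. The `available_tables` function and the `/tables` route are hypothetical stand-ins for real corporate systems.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def available_tables(day: str) -> list:
    """Hypothetical core business function; in reality this would wrap
    existing booking, inventory, or CRM systems behind one interface."""
    return [{"time": "19:00", "seats": 4}, {"time": "20:30", "seats": 2}]

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /tables -- the same endpoint serves web, mobile,
        # wearable, and voice front ends alike.
        if self.path.startswith("/tables"):
            body = json.dumps(available_tables("tonight")).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# To run locally (blocks the process):
# HTTPServer(("", 8000), ApiHandler).serve_forever()
```

With the business logic behind an interface like this, standing up a new front end is a thin presentation layer on top of an existing endpoint rather than a fresh engineering project.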
A Google Glass app becomes an urgent need? No problem. Need a presence on that hot new social network? Done. Without an engineering structure and culture of nimbleness, voice, or any future interface, will remain very difficult for companies to pull off.
*This article was originally published in Fast Company.*