How to Win (+ Lose) in a Voice-First World
Matt Maher
Founder, M7 Innovations | Tech Leader | Speaker | Advisory Board: CHANEL, Glimpse Group (NASDAQ:VRAR) | Featured in Vogue Business, Barron's, Forbes, Quartz, Digiday, Adweek+
The quest for knowledge and how we search for it is a tale as old as time. In haphazardly placed chronological order: cave paintings, orated stories, written word, printed books, libraries, encyclopedias, AOL on a 56k modem, AltaVista, Google. Take a second to think about the magnitude of this reality: we have the totality of information at our fingertips, delivered in real time.
Fingertips, however, are so 2010.
Now think about this: within three years, half of all search traffic globally is projected to come from a technology that barely existed in 2010.
That technology is voice, powered by natural language understanding: the ability of a machine not just to hear what we say, but to understand what we mean. Vocal assistants are being adopted and used at such breakneck speed that only the smartest, most forward-thinking marketers are aware of this impending reality...
‘Share of voice’ will not exist in our voice-first world; it’s winner take all.
Think about the user experience. In a Google search, a user can type a query, hit enter and comfortably scroll through a webpage to select one of fifteen to twenty options. In voice, statistics suggest only three to four options should be read aloud before a user feels overwhelmed and starts to forget them. Furthermore, Alexa and Google Home can now distinguish between voices, which lets them personalize experiences. Eventually, vocal assistants won’t need to present you with choices at all; they will already know what you like from past behavior. That means once a user has a positive voice experience with brand X and adds it to their preferred set, there’s no room for brand Y or Z to disrupt that flow. The scariest part for the doubtful, deep-pocketed players? As of now, you can’t buy your way to the top of a voice search.
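To make the three-to-four-option guideline concrete, here is a minimal Python sketch of a hypothetical helper that reads aloud only the top few items of an already-ranked list and offers the rest on request. The cap of three and the wording are illustrative assumptions, not a platform requirement.

```python
def join_spoken(items: list[str]) -> str:
    """Join items the way a person would say them aloud."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def speak_options(ranked: list[str], max_spoken: int = 3) -> str:
    """Read at most `max_spoken` options aloud and summarize the remainder.

    Assumes `ranked` is already sorted best-first (hypothetical helper).
    """
    if not ranked:
        return "Sorry, I couldn't find anything for that."
    spoken = ranked[:max_spoken]
    noun = "choice is" if len(spoken) == 1 else "choices are"
    prompt = f"Your top {noun} {join_spoken(spoken)}."
    remaining = len(ranked) - len(spoken)
    if remaining:
        prompt += f" There are {remaining} more. Want to hear them?"
    return prompt


print(speak_options(["Pepperoni", "Margherita", "Hawaiian", "Veggie", "BBQ Chicken"]))
# Your top choices are Pepperoni, Margherita and Hawaiian. There are 2 more. Want to hear them?
```

Summarizing the overflow, rather than silently dropping it, keeps the spoken turn short without hiding that more options exist.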
Marketers think in funnels, attempting to ‘pin the tail on the consumer journey’ with voice strategy and tactics. I would argue that voice is such an expansive, dynamic and utilitarian medium that it demands a separate funnel that exists in and of itself. Let’s take a look at its five stages: Search, Discover, Enable, Use and Convert.
I’ll dive into these five steps of the voice funnel and pair each with a ‘Did You Know’ fact. Use these facts as food for thought. They are meant to help you think outside the box and consider techniques to elevate a user’s experience. Before I jump in, let’s answer the inevitable question you're bound to hear in your next meeting. “Maybe voice is just a fad. Can it even scale?”
Yes. Yes. One more ‘yes’ for good measure. No POV needed here, just a few stats:
- There are over 50 billion voice searches a month; that’s 10 percent of the entire search market. (Source: MindMeld)
- 20 percent of Android mobile search traffic occurs through voice. (Source: Google)
- There will be 4 billion digital-assistant-capable devices in market by the end of this year, rising to 7 billion by 2020. (Source: IHS Markit)
- By 2020, 50 percent of all searches will be done through voice. (Source: ComScore)
Step 1: Search
Why Your Current SEO Strategy Will Not Work for VSEO
The way we speak in a conversation is vastly different from the way we type, and even more divergent from how we search. Data from Google suggests the majority of searches are done in fractured, staccato spurts. Voice is more conversational, natural and structured. Here are a few examples of possible differences:
Best practices in traditional SEO entail spending the most money on high-value keywords, weighting value by overall usage and not necessarily by context. Voice requires the opposite approach: context is everything. That means moving away from going all-in on single keywords and instead developing a long-tail strategy for lengthy phrases and even whole sentences. A tip for getting started: focus on developing strong FAQs. ‘Frequently Asked Questions’ are just that: questions, in conversational form, that users commonly ask.
Did You Know: Amazon and Google are the key players in the vocal assistant market, with Amazon dominating: 71 percent market share to Google’s 23 percent. AVS (Alexa Voice Service) has inked deals with Ford, Uber, LG, Whirlpool, ADT Security and Fitbit. That means Alexa will help you drive your car, wash your clothes, do your dishes, protect your house and check your caloric intake. It’s way bigger than just the ‘device in the living room.’
Step 2: Discover
Making Sure Your Brand’s Experience is in the Right Place at the Right Time
Amazon has skills. Google has actions. Siri calls them apps. It sounds confusing, but in reality they all do the same thing. I’ll explain what they are, and I promise to use only the term “skill” from here on. At its core, a skill extends the capabilities of a vocal assistant. If you pull your brand-new Echo out of the box and ask Alexa to order you a pizza, she won’t be able to do it. If you enable the Domino’s skill, which has access to your account credentials, pizza preferences, credit card information and closest store location, Alexa will be able to get you that pizza pronto. The same goes for ordering a taxi (Uber), making the perfect margarita (Patrón) or testing your IQ (Jeopardy!).
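For readers curious about what a skill looks like under the hood, here is a minimal Python sketch of an Alexa-style handler that works directly on the request and response JSON (as plain dicts, without the official Alexa Skills Kit SDK). The `OrderPizzaIntent` intent and `Size` slot are hypothetical names invented for this example, and actual order placement is left out; the Alexa Skills Kit documentation has the full request and response format.

```python
def build_response(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: dict, context=None) -> dict:
    """Route an incoming request to the one task this skill does well."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return build_response("Welcome! What size pizza would you like?", end_session=False)
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "OrderPizzaIntent":  # hypothetical intent name
            slot = intent.get("slots", {}).get("Size", {})  # hypothetical slot
            size = slot.get("value") or "medium"
            # A real skill would call the brand's ordering back end here,
            # using the user's linked account.
            return build_response(f"Okay, one {size} pepperoni pizza is on its way.")
    if request["type"] == "SessionEndedRequest":
        return {"version": "1.0", "response": {}}
    return build_response("Sorry, I didn't get that. What size pizza would you like?", end_session=False)
```

Keeping the handler focused on a single intent mirrors the single-purpose advice that follows: one job, done well.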
Amazon boasts over 20,000 skills; Google has around 570. You might be thinking, “Over 20,000 skills? How can they all be useful?” They’re not! I say this from experience: I’ve built a few that are live on Amazon and can confirm their absurdity. Take my Friday Night Lights skill as an example: it’s a personality test to discover which character from FNL you’re most like. It’s ridiculous, only fun once or twice, and I unfairly inserted multiple roads that lead to Kyle Chandler (because who doesn’t want the confidence boost of Alexa comparing your moral fiber to Coach Taylor’s?). If that doesn’t strike you as useless, consider that there are 107 unique ‘Cat Fact’ skills available. I’m not knocking Amazon, because I think what they did was quite genius: enabling more than just hardcore developers to dive in and build an experience on Alexa. That said, there are countless high-quality, interactive and extremely useful skills available that take Alexa’s capabilities to new levels. The question then becomes: how can your skill stand out from the rest?
First and foremost, a skill should serve a particular purpose, and it should serve it damn well. Many brands I’ve seen try to do too much. Let’s use a category example. If you’re in the travel vertical, does a single skill really need to make flight recommendations, reserve a hotel, hail a cab, check you in at the airport, order Gogo Wifi and book tours? No, for so many reasons. Here are three:
- Hailing a cab through your skill won’t be a better experience than a trusted skill like Uber or Lyft.
- When a user asks Alexa for help with a specific task like booking a flight, and that’s just one of the ten offerings your skill possesses, you won’t find yourself at the top of the recommended list.
- Focusing on so many options means you’re not focused solely on one or two, so you’ll never be the best at the function that matters most to your bottom line.
The key is to align your skill’s functionality with a contextually relevant query a user might make. This is a chicken-and-egg situation, as you must consider both the questions users are likely to ask and how well your skill answers them.
Did You Know: Amazon and Google have separate algorithms that determine which skill they recommend to a user. I won’t go too deep into the weeds, but I will say getting your brand discovered might require two separate strategies that cater to each of these two platforms. I won’t pick a favorite on this front, just give the basic facts: Google has spent almost twenty years mastering search. Amazon has spent over twenty years mastering e-commerce. Their capabilities in the vocal assistant space are industry leading, but not identical. Think about where your brand resides in the spectrum of reliance on these two behemoths.
Step 3: Enable
You Never Get a Second Chance to Make a First Impression
You did it. A user enabled your skill and is about to test it out for the first time. The tension is palpable. One wrong move or misunderstood word could land you in the “Sorry, I didn’t get that” abyss, the never-visited graveyard where skills go to die. Dramatics aside, the first impression you make on a user is the most important one. Different skills will have different strategies, but success boils down to these two factors:
- Did the skill do what the user expected it to do? (Form and Function)
- Did the skill do something pleasantly unexpected? (Surprise and Delight)
The first is quite obvious: deliver what was promised. If I enabled your skill because I was told it could book a flight, and the first thing I hear when it starts is a hot hotel deal in Cancun, I’m not going to be happy. The emotional stakes are higher in voice than in traditional search. It may be slightly uncomfortable to hear, but there is trust and empathy involved when speaking to a device. Test it: start saying mean things to Alexa and listen to how she reacts. Now type those same phrases into a Google search bar. Which felt worse? The developers of these devices work around the clock to make these conversations fluid, authentic and useful. A vocal assistant like Alexa is the surrogate host of your skill. Therefore, if what you’ve created is poorly made, misleading or a bad experience, Alexa is the one who takes the blame. A customer-obsessed company like Amazon will never stand for that type of negative user experience, which is why its certification process, reliance on user feedback and resulting algorithms are finely tuned to weed out the crap. Be great or don’t be at all.
The second factor, surprise and delight, separates the good skills from the great. When it comes to artificial intelligence, be it a vocal assistant or chatbot, there’s always an expectation that it will accomplish a specific task. Jeopardy’s skill will read us trivia. Uber’s will book us a car. Domino’s will get us pizza. It’s the unknown, the delightful “error response” or surprising left turn the conversation may take, that raises our eyebrows, catches us off guard and makes us think, “Did they really just say that?” There are great examples out there, but without calling attention to any single brand, here are a few theoretical examples:
As you can see, this is where your brand’s personality and ethos can shine. The more emotionally engaging you make the experience, the better your chances of increasing interactivity and time spent with your skill.
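One way to turn the dreaded error response into a moment of personality is to vary it and make it more helpful with each miss. Below is a hypothetical Python sketch that keeps a miss counter in the session attributes Alexa carries between turns of a session (`session.attributes` in the request, `sessionAttributes` in the response). The lines themselves are placeholders for your brand’s own copy.

```python
# Placeholder copy: playful first, then progressively more helpful.
FALLBACK_LINES = [
    "Hmm, that one went right over my head. Try asking me for a pizza?",
    "Still not getting it, but I admire your persistence. You can say 'order a large pizza'.",
    "Let's start fresh. Say 'help' to hear everything I can do.",
]


def handle_fallback(event: dict) -> dict:
    """Reply to an unrecognized request, escalating the help on each miss."""
    attrs = dict(event.get("session", {}).get("attributes") or {})
    misses = attrs.get("fallback_count", 0)
    # After the last scripted line, keep repeating the most helpful one.
    line = FALLBACK_LINES[min(misses, len(FALLBACK_LINES) - 1)]
    attrs["fallback_count"] = misses + 1
    return {
        "version": "1.0",
        "sessionAttributes": attrs,  # echoed back on the user's next turn
        "response": {
            "outputSpeech": {"type": "PlainText", "text": line},
            "shouldEndSession": False,
        },
    }
```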
Did You Know: It’s possible to host your skill with voices other than those of Alexa and Google’s assistant. This offers the opportunity to truly speak in your brand’s voice, using whichever talent you deem worthy. Granted, you will have to get into the studio, model a voice and prepare for a plethora of possible answers, but it’s doable. An Alexa skill I built for BORN AI, an educational roundtable that teaches you about artificial intelligence, is hosted by four voices created with Amazon Polly, Amazon’s text-to-speech service. I won’t pop the hood on how to actually execute this feat, but send me a note if you’re curious.
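As a rough illustration of the Polly approach, an Alexa skill can switch between Amazon Polly voices by wrapping text in the SSML `voice` tag. This Python sketch builds such a script for a multi-host roundtable. The voices used (Joanna, Matthew) are standard Polly voices picked as examples, not the ones behind BORN AI, and it is worth confirming which voices Alexa currently supports in its SSML reference.

```python
from xml.sax.saxutils import escape, quoteattr


def roundtable_ssml(turns: list[tuple[str, str]]) -> str:
    """Build an SSML script where each (voice, text) turn uses its own Polly voice."""
    parts = [
        # quoteattr adds the attribute quotes; escape protects &, < and > in the text.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        for voice, text in turns
    ]
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = roundtable_ssml([
    ("Joanna", "Welcome to the roundtable."),
    ("Matthew", "Today: neural networks & why they matter."),
])

# SSML goes in outputSpeech with type "SSML" instead of "PlainText".
response = {
    "version": "1.0",
    "response": {"outputSpeech": {"type": "SSML", "ssml": ssml}, "shouldEndSession": True},
}
```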
Step 4: Use
Don’t be The Milli Vanilli of Voice
That was a one-hit-wonder joke. Sadly, many skills offer minimal value and limited variation in their experience when used multiple times. You must first decide what the functionality and purpose will be. If you’re an entertainment brand armed with loads of content, copywriters and creative minds, you’re primed to provide a novel experience every time your skill is accessed. If you’re a function-focused brand, adding value by performing one task extremely well, you may assume there’s no room for creativity. Good news: there’s always room for creativity. Let’s use the latter example. How can we spice up a skill that only performs task X? It starts with envisioning the back-and-forth conversation, something we’ll call the dialogic flow. This flow should be fluid, adaptable and ultimately feel human – like a normal conversation. So how do you do that with a piece of artificial intelligence?
First, you’ll want to make sure you’re capturing each session, writing and storing that data to your preferred cloud, and using it to tell the conversational agent which actions the user has taken and which responses it has already given. This ensures no response is repeated until the user has heard every variation you’ve programmed in. This may sound complicated, but it doesn’t have to be if you take the right approach. It’s a delicate mix of science and art. The science requires a good developer to code the business logic and back-end functionality (with one-stop shops like AWS, this can all happen in the same place). The art requires crafting all the possible variations of responses to keep the conversation fluid and engaging. Remember, the more a skill is tested and used, the more data you will generate. That data will give you the insights to refine the conversation and responses accordingly. It’s an iterative process, and it won’t be perfect on the first try.
The second thing you’ll want to consider is something I’ll call the maturity map of your skill. It handles task X now, but is there an opportunity to accomplish task Y or Z with a visual touch screen? The Amazon Echo Show and Echo Spot now provide this functionality, so what would your new user experience look like? When should you launch it? How would you introduce your current users to the new components of your skill? These are all creative conversations that need to happen before a developer lays a finger on a keyboard.
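The no-repeat idea can be sketched in a few lines of Python. This hypothetical example remembers which response variations each user has heard and picks an unheard one until the set is exhausted, then starts the cycle over. A plain dict stands in for cloud storage (in a real skill that might be a database table keyed by user ID), and the greetings are placeholder copy.

```python
import random


def pick_fresh(responses: list[str], heard: list[int], rng=random) -> tuple[str, list[int]]:
    """Pick a response index the user hasn't heard; reset once all are used."""
    unheard = [i for i in range(len(responses)) if i not in heard]
    if not unheard:
        # Every variation has been played: clear the history and start over.
        heard, unheard = [], list(range(len(responses)))
    choice = rng.choice(unheard)
    return responses[choice], heard + [choice]


GREETINGS = [
    "Welcome back! Ready when you are.",
    "Good to hear from you again.",
    "Hey there, what can I do for you today?",
]

history_store: dict[str, list[int]] = {}  # stand-in for persistent storage


def greet(user_id: str) -> str:
    """Return a greeting this user hasn't heard recently and record it."""
    text, history_store[user_id] = pick_fresh(GREETINGS, history_store.get(user_id, []))
    return text
```

Storing indices rather than the text itself means you can rewrite the copy later without breaking each user’s history, as long as the list order stays stable.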
That was a lot to digest, so let’s sum it up with a simple sentence: Value and variety go hand-in-hand when developing a successful skill. Not only will this method sustain a user’s attention, it will have them coming back for more.
Did You Know: Even though Amazon and Google protect the raw utterance data (everything spoken) for general interactions with their assistants, once a user is in-skill, it’s possible to capture the spoken data, responses and entire dialogic flows. This involves extra coding and tagging, but companies like Dashbot.io, Voicelabs and Convessa can help.
Step 5: Convert
Examining the Path-to-(Voice)-Purchase
Purchasing through voice isn’t the future; it’s now. A recent Walker Sands report claims one in every three consumers plans to make a voice purchase within the next year, while 47 percent of millennials have already done so. This isn’t limited to Alexa ordering from Amazon or Google Home ordering from Target or Walmart: brand skills that take advantage of account linking can also handle transactions for both products and services. For example, Expedia can book you a rental car, Uber can book you a ride, and Domino’s can book you a date with that pepperoni pie. To protect privacy, a skill must be properly linked to the user’s account with that brand before a vocal assistant can make a monetary transaction.
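Here is a hedged Python sketch of that account-linking gate for an Alexa-style skill. Once a user has linked their account, Alexa includes an access token at `context.System.user.accessToken`; if it is missing, this handler declines the purchase and returns a `LinkAccount` card prompting the user to link in the Alexa app. The call to the brand’s own ordering API is hypothetical and omitted.

```python
def speech(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_purchase(event: dict) -> dict:
    """Refuse to transact unless the user has linked their account."""
    user = event.get("context", {}).get("System", {}).get("user", {})
    token = user.get("accessToken")
    if not token:
        reply = speech("To place an order, please link your account. I've sent a card to your Alexa app.")
        reply["response"]["card"] = {"type": "LinkAccount"}
        return reply
    # Hypothetical: call the brand's own order API with `token` here.
    return speech("Your order is confirmed.")
```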
Conversion doesn’t have to be transactional. Perhaps driving website traffic is your main business goal, so why not utilize the response cards in the accompanying mobile app to drive users to your site? If brand awareness is paramount, enable linking to social channels and gently push users to share their results or content with their friends. The beauty of creating your own skill is the ability to adapt the user experience to align with your business goals. My warning: don’t ask too much of a user without building trust and providing value first. Once conversions have occurred, build on the learnings and continue to customize the experience for these users as they subsequently visit.
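As a minimal sketch of the response-card tactic, this Python helper attaches a Simple card (a title plus plain-text content) to a spoken response, so the same answer shows up in the companion app along with a pointer to your site. The URL is a placeholder, and whether a link renders as tappable depends on the platform, so check the current card documentation before relying on it.

```python
def add_simple_card(reply: dict, title: str, content: str) -> dict:
    """Return a copy of `reply` with a Simple card attached (original is untouched)."""
    body = dict(reply["response"])
    body["card"] = {"type": "Simple", "title": title, "content": content}
    return {**reply, "response": body}


spoken = {
    "version": "1.0",
    "response": {
        "outputSpeech": {"type": "PlainText", "text": "Your results are in! Check the Alexa app for the full breakdown."},
        "shouldEndSession": True,
    },
}

with_card = add_simple_card(
    spoken,
    title="Your Results",
    content="See the full breakdown and share it with friends at https://www.example.com/results",
)
```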
Did You Know: Amazon encourages voice shopping by offering exclusive discounts when you purchase through an Echo device. Consider a similar value exchange when developing your skill. It’s the tried-and-true marketing strategy: if you want customers to adopt a new behavior and engage with you, make it worth their while.
Final Advice: Don’t Lose Your Voice
Brands spend so much time planning their advertising for the year - carefully preparing for television seasons, tent pole events, and thoughtfully predicting unpredictable moments. As I stated before, there are clear winners and losers when it comes to voice, so the time to start thinking about voice isn’t tomorrow morning, it’s right now.
Your voice strategy doesn’t have to be baked in an underground bunker for six months before you’re ready to hit the market. We live in the world of updates and iterations, not yearly product releases wrapped in cellophane. Think of voice as a vehicle that will take your business from point A to point B. Start with building a skateboard. It has four wheels and will require some pushing, but you’ll start moving. Then try building a bicycle; it will give you more speed and better agility. Then construct a go-kart. Then a car. Then an airplane. Then a jet.
If you decide you don’t want a skateboard and begin the twelve-month process to build your jet, that’s twelve months on the shelf. That is a long enough time for your competitors to test and launch, fail, learn, try again, fail even better, try again and succeed. They will have accomplished all of this while simultaneously climbing to the summit of the algorithms that will one day be the deciding factor of who gets picked, and who gets forgotten.
For those still timid and unsure where to start, just get out there and start asking questions, learning and breaking things. I’ll leave you with a quote from American inventor Charles Kettering, who was also predicting what the ’20s would hold.
"99 percent of success is built on failure."
Thanks for reading. Please reach out if you have any questions, comments or concerns on how to get started. I’m happy to connect and dive deeper into any of the technical or creative aspects I mentioned above.