9 flaws that make voice assistants fundamentally wrong
Ramesh Panuganty
Founder & CEO of 4 startups (all acquired). Anticipated tech trends, crafted solutions, and launched businesses ahead of broad adoption. @HumanTechOS
There’s a lot of buzz around the ‘virtual assistants’ from Google Home and Amazon Echo. I have been studying both devices for a good amount of time now, since I wanted to see how they handle the user experience, and I must say that I was pretty disappointed.
Both Google Home & Amazon Echo are built on some fundamentally flawed designs, detailed below. Note that I am not comparing the product specs, speaker technology or the aesthetics of these devices — but purely talking about the underlying features of the virtual assistance, and whether the device is designed to truly understand and answer the user’s question or not. My observations are common to both the devices, and I am using the term “the device” to refer to either or both of the devices in this write-up.
Flaw #1: Applying search principles is wrong
The device is built on an assumption that was originally constructed by our search engines. It answers the same way a search engine answers a query; however, it returns just the first answer. It neither tries to validate the user’s question for completeness, nor does it attempt any disambiguation of the question itself. For example, a question like “tell me about George Bush” is answered with information about the younger Bush, because his search ranking is higher than his father’s (probably no one did the SEO for Bush Senior).
How can the device assume a particular George Bush? Why doesn’t it ask the user which George Bush they are referring to? This minimalist approach is just not how a teacher answers a student’s question, or how a parent answers a child’s question.
This is a fundamental design flaw: a Google search on a computer might return 12 results on page 1, and the user gets to pick among them with a scroll bar. There is no scroll bar in a voice assistant, and there is no way to know the alternative possibilities!
The device can’t assume that the first answer from the search is the right answer; that is totally wrong. There needs to be an interaction before answering the question, or at least a mention of the alternatives along with the answer given.
Hey Google & Amazon, please don’t create a falsified world for the next generation who might pretty much assume that these devices are answering correctly to a question.
Flaw #2: Inability to narrow down the context
While the device tries to get into answering mode very quickly, it doesn’t say what it is answering. This is a different problem from the previous one, because the context of the query object can vary.
For example, for the question “tell me about Gandhi”, the device quickly responds with “Gandhi was released in India on 30 November 1982, and in the United States on 6 December. It was nominated for Academy Awards in eleven categories, winning eight.” It is so unfortunate that ‘Gandhi’ is assumed to be a movie title when the name overwhelmingly refers to the person, someone so famous his name is almost a common noun.
Why did my question get answered as if ‘Gandhi’ were all about a movie? I was asking about a person. And the device doesn’t even restate the question to tell me how it is interpreting it.
It would have been great to start the response with “Gandhi is a movie and was released in…”. Otherwise, it is a bad idea to assume that the question was about a movie and not even tell the user about it. It is again a fundamental problem: the underlying technology is just indexing all of Wikipedia, local businesses, restaurants, and some top search results, and throwing back the topmost (SEO’ed) answer. Users simply can’t take such answers at face value without being fooled!
Another example of the same flaw: try asking “when did Iron Man release”, and the device starts responding with when the first Iron Man movie was released. Hmm, I didn’t say that I was referring to the movie; I could be thinking about the book “Iron Man”, or the movie’s sequels. Even if it is a movie, why the first one, and not sequel 1 or sequel 2?
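Again, a small sketch of the fix I am asking for: state the interpretation up front. The tiny knowledge base below is fabricated for illustration and is nothing like the real backends:

```python
# A toy knowledge base where every entry carries the entity type it was
# matched as; the entries themselves are abridged and illustrative.
KB = {
    "gandhi": ("movie", "it was released in India on 30 November 1982 "
                        "and was nominated for eleven Academy Awards."),
    "iron man": ("movie", "it was released on 2 May 2008."),
}


def answer(topic: str) -> str:
    entry = KB.get(topic.lower())
    if entry is None:
        return "Sorry, I don't know about that."
    entity_type, text = entry
    # Lead with the interpretation so the user instantly knows which
    # sense of the word is being answered: "Gandhi is a movie and..."
    return f"{topic} is a {entity_type}, and {text}"


print(answer("Gandhi"))
# -> Gandhi is a movie, and it was released in India on 30 November 1982
#    and was nominated for eleven Academy Awards.
```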
Flaw #3: Inability to articulate which words are being ignored
I have seen many scenarios where there seems to be no right way to ask a question. For example, “what are the nearby pizza joints” returns “Pizza My Heart, Big Apple Pizza, Amici’s”, but not “Pizza Hut, Round Table Pizza”. I wasn’t sure why it did not give details of Pizza Hut, which is far closer to my home than the three restaurants it responded with. I tried asking “what are the nearby pizza fast food restaurants” and it responds with “KFC, Subway and McDonald’s”. Really? What just went wrong?
Users can ask very direct questions or use overloaded words. The device can’t just ignore some words and answer the rest. If it is ignoring words, it should say which ones.
Users give more details either to confirm what they are asking, or because they don’t know how to ask. Clarify with the user how to ask. I could have simply got an answer to the above question by searching for “pizza” in Google Maps, but asking for “pizza” on Google Home takes it for a spin.
In this particular case, I couldn’t figure out why McDonald’s came up in the answers, because it doesn’t sell any pizza product at all, and the user has no way to respond with a “search instead for” correction.
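Even something as simple as the sketch below would help. The vocabulary and stop-word list are invented; the point is the second print statement, which articulates what was thrown away:

```python
# Toy query handling that tells the user which meaningful words it dropped.
SUPPORTED_TERMS = {"pizza", "nearby", "restaurants"}
STOP_WORDS = {"what", "are", "the", "a", "an"}


def parse_query(query: str):
    words = query.lower().split()
    used = [w for w in words if w in SUPPORTED_TERMS]
    ignored = [w for w in words
               if w not in SUPPORTED_TERMS and w not in STOP_WORDS]
    return used, ignored


used, ignored = parse_query("what are the nearby pizza fast food restaurants")
print(f"Searching for: {' '.join(used)}")
if ignored:
    # The missing feedback: say out loud what was thrown away.
    print(f"Note: I am ignoring the words: {', '.join(ignored)}")
# -> Searching for: nearby pizza restaurants
# -> Note: I am ignoring the words: fast, food
```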
Flaw #4: Inability to understand comprehensively
The context of the question is never understood comprehensively. For example, I asked the question “how many songs do I have in my music library”. The response was “shuffling your music”, and then one of the songs started playing.
Oops, the context of my question was an inquiry about my library, not a request to play a song.
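A simple intent router would catch this. The sketch below is my own toy logic with an invented song count; the real systems are far more sophisticated, but the principle stands: an interrogative opening should never trigger playback.

```python
import re


def route(utterance: str) -> str:
    """A hypothetical intent router: questions are inquiries, not commands."""
    text = utterance.lower()
    if re.match(r"^(how many|how much|what|which|when|who)\b", text):
        if "songs" in text and "library" in text:
            count = 1234  # placeholder: would come from the user's library
            return f"You have {count} songs in your music library."
        return "That sounds like a question, but I don't know the answer."
    if text.startswith(("play", "shuffle")):
        return "Shuffling your music."
    return "Sorry, I didn't understand."


print(route("how many songs do I have in my music library"))
# -> You have 1234 songs in your music library.
print(route("play my music library"))
# -> Shuffling your music.
```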
It is very evident that there is a rush to do something for every question, rather than to really understand the question itself. What sort of assumptions do they make about the user while building these products? Is it an adult or a child; a tech-savvy or a non-tech-savvy person; an active or a lazy one? I would like to understand the assumptions about the emotional side of the user that Google or Amazon has built this for.
Even in English, depending on the country, people call a “movie” a “cinema”, a “film”, a “picture”, or even a “show”. As of now, because there is no two-way communication, this question doesn’t arise; but if in the future the device communicates back to the user, it will be interesting to see how its product management addresses this.
Flaw #5: Disregard for meaning beyond keywords
The device doesn’t go beyond keywords and ignores the complete meaning of the question. For example, try “tell the names of 5 American presidents”, and all the device responds with is one president’s name (thankfully, ‘Barack Obama’). The device understands what an American president is, but not five of them. A similar example is “tell me 5 jokes”, which returns only one joke.
Sorry, but there is more to understanding a question than just some of its keywords. The right answer here should have been “I can tell one joke, not 5. Here is the joke…”
Too bad, guys: there is a lot more maturity expected from this device. Instead of looking at my example in isolation, look at the big picture: the device understands only the keywords and nothing beyond them. On a different note, the question “tell the name of the Russian president” is not understood at all.
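Handling the number is not rocket science either. Here is a toy version, my own illustration with a one-joke “library”, that extracts the requested quantity and is honest about the shortfall:

```python
import re

# A deliberately tiny joke library, invented for the example.
JOKES = ["Why did the scarecrow win an award? He was outstanding in his field."]


def tell_jokes(utterance: str) -> str:
    # Extract the requested quantity instead of discarding it as a stray token.
    m = re.search(r"\b(\d+)\b", utterance)
    requested = int(m.group(1)) if m else 1
    if requested > len(JOKES):
        # Be honest about the shortfall rather than silently serving one.
        return (f"I only have {len(JOKES)} joke, not {requested}. "
                f"Here it is: {JOKES[0]}")
    return " ".join(JOKES[:requested])


print(tell_jokes("tell me 5 jokes"))
# -> I only have 1 joke, not 5. Here it is: Why did the scarecrow...
```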
Flaw #6: Inconsistent messages
I was first intrigued when I got different status messages for questions that were not understood. I am not blaming the device for failing to understand the question itself, but I am disappointed by how it handled the responses when it did not understand.
Ask the same question, one that is not expected to be understood at all, multiple times, like “what is John Martin’s phone number”, and you can see that the answers randomly vary from “My apologies, I don’t understand”, or “Sorry, I can’t help with that yet”, or “Hmm, something went wrong”, to “Sorry, I don’t know how to help with that yet”, or “Sorry, I am not sure how to help with that yet. I am still learning”, or “Hmm, I wasn’t able to understand the question I heard”.
Too bad, Google & Amazon. Randomly rotating the apologetic messages doesn’t make the user believe in you. Stay consistent.
I would have loved it if the device had responded in this case with “Sorry, personal phone numbers are not supported. Only business listings are supported at this point.”
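In other words, map each failure category to one fixed, informative message. A trivial sketch; the categories and wording below are my own suggestions, not anything the devices actually use:

```python
# One fixed message per failure category, instead of a random apology.
ERROR_MESSAGES = {
    "unsupported_domain": ("Sorry, personal phone numbers are not supported. "
                           "Only business listings are supported at this point."),
    "speech_not_recognized": "Sorry, I couldn't make out what you said.",
    "backend_error": "Something went wrong on my end. Please try again.",
}


def fail(category: str) -> str:
    # Same category in, same message out, every single time.
    return ERROR_MESSAGES.get(category, "Sorry, I can't help with that.")


print(fail("unsupported_domain"))
print(fail("unsupported_domain"))  # identical message on every repeat
```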
For any enterprise product, it is almost mandatory to document how to handle an error or an exception message. In the consumer world, companies assume that not having a user guide is fashionable, and they just take customers for a ride. These devices are not simple enough for that; users need troubleshooting guidance to know how to ask questions better.
Don’t confuse the users, and stay consistent.
Flaw #7: Not designed for personal data
Every question is assumed to be about Internet (public) data, with no personalization whatsoever. Publicly indexed data such as Wikipedia, movies, local businesses, music albums, weather, and news far outweighs personal data in these devices, and they are just not designed to answer questions about your own personal data. For example, ask “tell me about Jack Reacher” and the device assumes it is a movie.
Oops, what about the Jack Reacher who is my colleague? Even if the device ignores my colleague, what about the 100+ people with this name on LinkedIn?
If all the user is looking for is a voice interface to the google.com search box, this device is not worth even 1 cent! It should consult my personal data when answering my questions.
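Here is a sketch of what I mean: consult the user’s own data sources before the public index. The contact entry and the public entry below are both fabricated for illustration:

```python
# The user's own data should be a first-class answer source.
MY_CONTACTS = {"jack reacher": "Jack Reacher, your colleague at Acme Corp."}
PUBLIC_INDEX = {"jack reacher": "Jack Reacher is a 2012 action film."}


def tell_me_about(name: str) -> str:
    key = name.lower()
    personal = MY_CONTACTS.get(key)
    public = PUBLIC_INDEX.get(key)
    if personal and public:
        # Surface both worlds and let the user choose, rather than
        # defaulting to whatever ranks highest on the public web.
        return (f"I know a {name} in your contacts and a {name} on the web. "
                f"Which one do you mean?")
    return personal or public or "Sorry, I couldn't find anything."


print(tell_me_about("Jack Reacher"))
# -> I know a Jack Reacher in your contacts and a Jack Reacher on the web.
#    Which one do you mean?
```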
Flaw #8: Forced actions
Anytime I ask Alexa about a movie or a book, like “tell me about Harry Potter”, it answers from the wiki and then continues on to either order the book or DVD for me (pushing me into a purchase), or to say that it is not in my library.
Don’t keep trying to upsell other services with voice assistants. If all I want is an easier way to order (I am not that lazy), I will use my mobile phone.
Treat the voice assistant as an independent product, that the customer has paid money for, and respect the customer. Don’t try to bully or fool the customer.
Flaw #9: Lack of interactivity on capabilities
Can the device’s features themselves be queried? For example, a question like “what’s your current volume” doesn’t get understood at all, while “increase the volume” or “decrease the volume” gets understood. Questions like “how many alarms can you set?” or “what music sources do you support?” fail similarly.
Too bad: I am communicating with an audio device, and I can’t even ask about its calibration. I am communicating with an alarm device, and I can’t inquire about the currently set alarms.
When the skills or the vocabulary allow only a couple of phrasings, the device makes interaction difficult for the user.
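The fix is device-state introspection: the same state that “increase the volume” mutates should also be readable by a question. A minimal sketch, with every value invented:

```python
class Speaker:
    """A toy speaker whose state can be both changed and queried."""

    def __init__(self):
        self.volume = 5          # on a 0-10 scale
        self.alarms = ["07:00"]  # currently set alarms

    def handle(self, utterance: str) -> str:
        text = utterance.lower()
        if "increase the volume" in text:
            self.volume = min(10, self.volume + 1)
            return f"Volume is now {self.volume}."
        if "current volume" in text:
            # The question the real devices can't answer.
            return f"The volume is set to {self.volume} out of 10."
        if "alarms" in text:
            return f"You have {len(self.alarms)} alarm set, at {self.alarms[0]}."
        return "Sorry, I didn't understand."


s = Speaker()
print(s.handle("what's your current volume"))  # -> The volume is set to 5 out of 10.
print(s.handle("what alarms are set"))         # -> You have 1 alarm set, at 07:00.
```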
In the end, I felt these voice assistants are more like remote controls with lots and lots of buttons, which you press by voice instead of by a tactile input. You need to remember all the buttons though :-)
[disclaimer: These observations are noted at the time of writing this article. I would be glad to be proven wrong in the future.]