Voice Interface Design
Build a human dialogue with the machine
Each of us has come across voice interfaces. A robot announcing that a red Ford will pick you up, an elevator naming the floor, a navigator telling you to turn right now: someone has to think through and write these words, right? This is a new direction for interface designers: the design of voice interfaces.
What is VUI
A voice user interface (VUI) is an evolution of interaction that frees the hands and eyes and simplifies entering or receiving information. This matters, for example, when we are driving a car or performing surgery and suddenly want to know how old Demi Moore is.
In the past few years, voice interaction has developed by leaps and bounds. Already, 20% of all Google search queries on mobile devices are made by voice. According to Gartner, by 2020, 30% of site visits will occur without a screen. Right now you can check the weather forecast, turn on the lights in the living room, or order a pizza. In the future, the possibilities are almost limitless.
Voice Interface Components
What characterizes a voice interface, and how does it differ from the usual visual one? Specialists from the Nielsen Norman Group identified five basic technologies of a voice user interface:
- Voice input: requests are spoken rather than entered via the keyboard or the graphic elements of a screen interface.
- Natural language: users are not limited to a specific vocabulary or computer-optimized syntax, but can phrase their input any way they like, as if talking to a person.
- Voice output: information is spoken aloud rather than displayed on the screen.
- Intelligent interpretation: to truly understand user requests, a VUI should use additional information, such as the context of use or actions the user has performed before.
- Facilitation: the VUI performs actions needed to complete the user's task even though the user did not explicitly request them.
Not all voice interfaces use all five at the same time. For example, virtual keyboards on mobile devices offer only voice input, and voice assistants sometimes display information on the screen instead of speaking it.
Integrating all five gives interactions two significant advantages:
- the ability to state goals in natural language, with no need to learn an interface or click buttons;
- the ability to predict the user's goals and suggest them based on contextual information or previous actions.
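As a rough illustration of the "intelligent interpretation" point above, the following Python sketch shows how context and earlier requests might fill gaps in what the user actually said. Everything here is an assumption made for the example: the `DialogueContext` class, the `interpret` function, and the keyword matching are hypothetical stand-ins, not how any real assistant works.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueContext:
    """Hypothetical container for what the assistant already knows."""
    location: str = "unknown"
    history: list = field(default_factory=list)  # earlier resolved requests


def interpret(utterance: str, ctx: DialogueContext) -> dict:
    """Turn a transcribed utterance into a request, filling gaps from context."""
    text = utterance.lower().strip()
    if "weather" in text:
        # No city was mentioned: fall back to the user's current location.
        request = {"intent": "weather", "place": ctx.location, "day": "today"}
    elif text.startswith("and tomorrow") and ctx.history:
        # Follow-up question: reuse the previous request, change only the day.
        request = {**ctx.history[-1], "day": "tomorrow"}
    else:
        request = {"intent": "unknown", "text": utterance}
    if request["intent"] != "unknown":
        ctx.history.append(request)
    return request


ctx = DialogueContext(location="Berlin")
first = interpret("What's the weather like?", ctx)
follow_up = interpret("And tomorrow?", ctx)
print(first)
print(follow_up)
```

The follow-up "And tomorrow?" carries no intent or place of its own; it becomes meaningful only because the previous request is remembered, which is the kind of context use the list describes.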
Voice assistants
Combining and integrating all five basic technologies is a prerequisite for an interface that requires no input at all. We are still very far from an interface that reads people's thoughts, but voice assistants, primarily Alexa, Google Assistant, and Siri, are a first step toward it.
Almost all of us have used a voice assistant at least once, if only the one built into our smartphone. We have some idea of what it is and what it might be useful for. A study by the same Nielsen Norman Group examined the current state of the assistant market and the strengths and weaknesses of VUIs in their modern incarnation. Some of its results follow.
Usability
The study showed that voice assistants do poorly on all five criteria and on their integration. Usability is close to nil for interactions of even modest complexity. Contrary to the premise of human-centered design, users have to think about when a voice assistant will be useful and when it is better avoided, and they have to choose the wording of their requests carefully. And this is despite the original promise that the computer should adapt to the person, not the other way around.
Most users who took part in the study said that they use voice assistants mainly in two situations:
- when hands are busy, for example, while driving or cooking;
- when they think that asking by voice will be faster than typing the query on a keyboard and reading the answer.
Almost everyone clearly understands the limits of assistants and often avoids them for complex queries, preferring web search. Some people think they can succeed with a complex task, but only by simplifying their requests and thinking carefully about the wording. The majority believe that figuring out how to phrase a question properly is not worth the effort.
Voice Interface Design
To solve the problems of VUIs in their current implementation, it is important to find the right approach to development. Voice control is a verbal process: communication with a machine. In a good voice interface, this communication should be as natural as with a person. Designing such systems therefore involves far more psychology and requires an understanding of how human thinking works.
What to consider when developing a VUI, and which principles to follow:
Trust
Trust is not a technical issue, but if it is not solved, the rest of the work will be in vain. Without trust, the user simply will not use the VUI for any significant task. First we learn how well the system copes, and only then do we begin to delegate tasks to it.
It is not easy to build an interface that the user will trust even with a task as simple as setting an alarm. Oversleeping Saturday's breakfast is one thing; missing a flight is another. If a person is not sure that the system will not make a mistake, they simply won't use it.
Invisible interface
Invisibility is the fundamental difference between a voice interface and a visual one. We cannot see the interface elements, nor where we are or which step we are on.
Each user has their own mental model of what the system can do. It essentially replaces the visual components of the interface. Every system response to a user action changes that mental model, so for the VUI to work, it must help the user adjust the model as needed.
Mental Model Adjustment
When the system asks questions that allow only simple answers, such as yes or no, the user may conclude that it is rather primitive and will phrase all subsequent commands and answers accordingly.
If the system asks questions that users can answer however they like, and it understands those answers, then users will conduct all subsequent interactions with the system at the same level.
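To make the contrast concrete, here is a minimal Python sketch of a confirmation step that accepts freely phrased answers instead of only a literal "yes" or "no". The word lists and the `parse_confirmation` function are hypothetical and deliberately tiny; a real system would rely on a proper language-understanding component.

```python
import re

# Hypothetical, deliberately small word lists for the illustration.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "ok", "okay", "please"}
NEGATIVE = {"no", "nope", "nah", "don't", "cancel", "never"}


def parse_confirmation(answer: str):
    """Return True, False, or None when the answer cannot be classified."""
    words = re.findall(r"[a-z']+", answer.lower())
    # Negatives win, so a mixed answer errs on the side of not acting.
    if any(word in NEGATIVE for word in words):
        return False
    if any(word in AFFIRMATIVE for word in words):
        return True
    return None


print(parse_confirmation("Yeah, go ahead"))
print(parse_confirmation("no thanks"))
print(parse_confirmation("maybe later"))
```

Returning `None` instead of guessing leaves room for a follow-up question, which in turn shapes how capable the user believes the system to be.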
Humanity
To make interaction with a VUI natural, you need to understand why communication with other people seems natural. The problem is that we do not know. Why does a conversation with some people feel more natural than with others? Which characteristics are responsible? Without knowing that, it is impossible to build it into the system.
A possible way out is to build a system that, on receiving feedback, works out for itself what was done correctly and what could have been done differently. Such a system would discover which characteristics matter for natural interaction.
Personality
Modern VUI implementations can imitate personality traits: friendliness, a sense of humor, intelligence, and others. These characteristics are quite diverse, and companies take different approaches to implementing them.
Siri is the product of a company whose philosophy is that everything should just work. And it really does work, provided the user guesses the right grammar and vocabulary. If the user guesses wrong, the system simply stops working, with no indication of what went wrong or how to correct it.
At the same time, great emphasis is placed on personality. The voice quality, the jokes, and the funny comments during routine tasks are sometimes really impressive. They create the feeling that you are dealing with a real person. The user relaxes and tries to interact with Siri as with a person. But when the system starts to react differently than the user expects, that impression drops sharply. The user feels that their actions are disapproved of or that they are being laughed at. This is much worse than if the user had seen Siri as a system from the very beginning.
Google considered it safer not to imitate a personality and to show that the user is dealing with nothing more than a high-tech software product that does not even have a name (OK, Google).
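The silent failure described above, where an unrecognized request gets no explanation, can be contrasted with a fallback that tells the user what was heard and what would work. The Python sketch below is only an illustration under stated assumptions: the intent names, the example phrases, and the `respond` function are all hypothetical and do not describe Siri, Google, or any other product.

```python
from typing import Optional

# Hypothetical intents the assistant supports, each with an example phrase.
EXAMPLES = {
    "set_alarm": "set an alarm for seven",
    "weather": "what's the weather today",
    "lights_on": "turn on the living room lights",
}


def respond(intent: Optional[str], heard: str) -> str:
    """Build the spoken reply; never fail silently on an unrecognized request."""
    if intent in EXAMPLES:
        return f"OK: {heard}"
    # Say what was heard and suggest phrasings that are known to work.
    hints = ", or ".join(f'"{phrase}"' for phrase in EXAMPLES.values())
    return f'Sorry, I did not understand "{heard}". Try {hints}.'


print(respond("weather", "what's the weather today"))
print(respond(None, "wake me when the pizza arrives"))
```

Repeating the misheard phrase back lets the user see whether the problem was recognition or wording, and the examples show how to rephrase without any visual help.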
The future of voice interfaces
In the near future, voice interaction will become more common in almost all areas of activity. Devices capable of recognizing and generating speech are rapidly becoming cheaper with the development of voice assistants and the ubiquitous spread of the Internet. Most often, however, these will be highly specialized use cases, where the user understands, for example, that there is no point in asking an automated ice cream kiosk for the weather forecast.
Attempts to build voice assistants that can answer any question or perform any action we can already perform through a visual interface will not stop. But this is unlikely to work exactly as we imagine. Even in dialogue with ordinary people we often run into misunderstandings, so what can we expect of machines? The problem of creating “real” artificial intelligence that would completely solve all the problems of voice interaction comes down to this: we simply do not fully understand how the brain and the human mind work.