The Articulation Barrier: Prompt-Driven AI UX Hurts Usability
Permalink for this article: https://www.uxtigers.com/post/ai-articulation-barrier
Current generative AI systems like ChatGPT employ user interfaces driven by “prompts” that users enter as prose. This intent-based outcome specification has excellent benefits: skilled users can arrive at the desired outcome much faster than if they had to control the computer manually through a myriad of tedious commands, as required by the traditional command-based UI paradigm that has ruled ever since we abandoned batch processing.
But one major usability downside is that users must be highly articulate to write the required prose text for the prompts. According to the latest literacy research (detailed below), half of the population in rich countries like the United States and Germany are classified as low-literacy users. (While the situation is better in Japan and possibly some other Asian countries, it’s much worse in mid-income countries and probably terrible in developing countries.)
I have been unable to find large-scale international studies of writing skills, so I rely on studies of reading skills. Generally, writing new descriptive prose is more challenging than reading and understanding prose already written by somebody else. Thus, I suspect that the proportion of low-articulation users (to coin a new and unstudied concept) is even higher than that of low-literacy users.
A small piece of empirical evidence for my thesis is the prevalence of so-called “prompt engineers” who specialize in writing the necessary text to make an AI cough up the desired outcome. That “prompt engineering” can be a job suggests that many business professionals can’t articulate their needs sufficiently well to use current AI user interfaces successfully for anything beyond the most straightforward problems.
Articulating your needs in writing is difficult, even at high literacy levels. For example, consider the head of a department in a big company who wants to automate some tedious procedures. He or she goes to the IT department and says, “I want such and such, and here are the specs.” What’s the chance that IT delivers software that actually does what this department needs? Close to nil, according to decades of experience with enterprise software development. Humans simply can’t state their needs in a specification document with any degree of accuracy. Same for prompts.
Why has this serious usability problem gone undiscussed, considering the oceans of ink spilled during the current AI gold rush? Probably because most analyses of the new AI capabilities are written by academics or journalists: two professions that require, guess what, high literacy. Our old insight that you ≠ user isn’t widely appreciated in those lofty (not to say arrogant) circles.
I expect that in countries like the United States, Northern Europe, and East Asia, less than 20% of the population is sufficiently articulate in written prose to make advanced use of prompt-driven generative AI systems. 10% is actually my maximum-likelihood estimate as long as we lack more precise data on this problem.
For sure, half the population is insufficiently articulate in writing to use ChatGPT well.
Overcoming the Articulation Barrier
How to improve AI usability? We first need detailed qualitative studies of a broad range of users with different literacy skills using ChatGPT and other AI tools to perform real business tasks. Insights from such studies will give us a better understanding of the issue than my early, coarse analysis presented here.
Second, we need to build a broader range of user interface designs informed by this user research. Of course, these designs will then require even more research. Sorry for the trite conclusion that “more research is needed,” but we are at the stage where almost all work on the new AI tools has been driven purely by technologists and not by user experience professionals.
I don’t know the solution, but that won’t stop me from speculating. My best guess is that successful AI user interfaces will be hybrid and combine elements of intent-based outcome specification and some aspects of the graphical user interface from the previous command-driven paradigm. GUIs have superior usability because they show people what can be done rather than requiring them to articulate what they want.
While I don’t think it’s great, one current design that embodies some of these ideas is the AI feature of the Grammarly writing assistant: in addition to spelling out what they want, users can click buttons for a few common needs.
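To make the hybrid idea concrete, here is a minimal sketch of such a design. Everything in it (the preset labels, the templates, and the sendPrompt function) is a hypothetical illustration, not any product’s actual API: the buttons show users what can be done, and each one expands into a fully articulated prompt on the user’s behalf, so nobody faces a blank text box.

```typescript
// Hypothetical sketch of a hybrid prompt UI: preset buttons expand
// into fully articulated prompt templates, so low-articulation users
// never have to compose descriptive prose from scratch.

type Preset = {
  label: string;                      // what the button shows (the GUI strength)
  template: (text: string) => string; // the articulation it supplies
};

const PRESETS: Preset[] = [
  {
    label: "Shorten it",
    template: (t) =>
      `Rewrite the following text at about half its length, keeping all key points:\n\n${t}`,
  },
  {
    label: "Sound more formal",
    template: (t) =>
      `Rewrite the following text in a formal, professional tone:\n\n${t}`,
  },
  {
    label: "Simplify",
    template: (t) =>
      `Rewrite the following text in plain language that a 12-year-old could follow:\n\n${t}`,
  },
];

// Stand-in for whatever model API the product actually calls.
async function sendPrompt(prompt: string): Promise<string> {
  throw new Error("Wire this to a real model API.");
}

// A button click does the articulation work for the user.
async function onPresetClick(preset: Preset, selectedText: string): Promise<string> {
  return sendPrompt(preset.template(selectedText));
}
```

Free-text prompting can remain available for users who can articulate their needs; the presets merely lower the floor.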
Adult Literacy Research Findings: Half the Populations Are Poor Readers
The best research into the reading skills of the broad population is conducted by the OECD, a club of mostly rich countries. The latest data I have found is from the Programme for the International Assessment of Adult Competencies (PIAAC) and was collected from 2012 to 2017. The project tested a little less than a quarter of a million people, though the exact number is impossible to discern due to poor usability of the various websites reporting the findings. Anyway, it was a huge study. As an aside, it’s a disgrace that it’s so hard to find and use data we taxpayers have paid dearly to have collected, and that the data is released with massive delays.
Children’s literacy is measured in a different set of research studies, the PIRLS (Progress in International Reading Literacy Study). I will not discuss it here, since AI usability in business settings depends on the skills of adult users.
PIAAC measures literacy across each country’s entire population of people aged 16–65. For my goal of analyzing business usability, the lower end of this age range is problematic since most business professionals don’t start working until around age 22. However, PIAAC is the best we have, and most of their study participants qualify as working age.
The following chart shows the distribution of adult literacy in 20 countries, according to PIAAC. The red and orange segments represent people who can read (except for a few at the bottom of the red zone) but not very well. At level 1, readers can pick out individual pieces of information, like the telephone number to call in a job ad, but they can’t make inferences from the text. Low-level inferences become possible for readers at level 2, but the ability to construct meaning across larger chunks of text doesn’t exist for readers below level 3 (blue). Thus, level 3 is the first level to represent the ability to truly read and work with text. Levels 4 and 5 (bunched together as green) represent what we might call academic-level reading skills, requiring users to perform multi-step operations to integrate, interpret, or synthesize information from complex or lengthy continuous, noncontinuous, mixed, or multiple-type texts. These high-literacy users are also able to perform complex inferences and apply background knowledge to the interpretation of the text, something that the mainstream level 3 readers can’t do.
Level 5 (very high literacy) is not quite genius level, but almost there: in most rich countries, only 1% of the population has this level of ability to understand a complex text to the fullest.
We see from the chart that Japan is the only country in the study with genuinely good reading skills, and even there, a quarter of the population has low literacy. The Netherlands, New Zealand, and Scandinavia also score relatively well. But most rich countries have roughly half the population in the lower-literacy range, where I suspect that people’s ability to articulate complex ideas in writing will also be low.
Since this research was run by the OECD, there’s no data from poor countries. But the data from three middle-income countries (Chile, Mexico, and Türkiye) is terrible: in all three, the low-literacy segment accounts for more than 85% of the population. I can only speculate, but the scores from impoverished developing countries with deficient school systems would probably be even worse.
This article is part of a more extensive series I’m writing about the user experience of modern AI tools.
About the Author
Jakob Nielsen, Ph.D., is a usability pioneer with 40 years of experience in UX. He founded the discount usability movement for fast and cheap iterative design, including heuristic evaluation and the 10 usability heuristics. He formulated the eponymous Jakob’s Law of the Internet User Experience. He was named “the king of usability” by Internet Magazine, “the guru of Web page usability” by The New York Times, and “the next best thing to a true time machine” by USA Today. Prior to starting NN/g, Dr. Nielsen was a Sun Microsystems Distinguished Engineer and a Member of Research Staff at Bell Communications Research, the branch of Bell Labs owned by the Regional Bell Operating Companies. He is the author of 8 books, including Designing Web Usability: The Practice of Simplicity, Usability Engineering, and Multimedia and Hypertext: The Internet and Beyond. Dr. Nielsen holds 79 United States patents, mainly on making the Internet easier to use. He received the Lifetime Achievement Award for Human–Computer Interaction Practice from ACM SIGCHI. Follow Jakob on LinkedIn to see future articles.
Jakob Nielsen can speak at your company or event. Please see his speaker bureau, BrightSight Speakers, or email [email protected]
Reader Comments

Comment (10 months ago): I have been thinking about the future technology of computer interfaces for years. The usability solution for virtual reality is multimodality plus optimized virtual sensorimotor contingencies. Unimodal interaction, such as using only text to express ourselves, is very unnatural. Multimodality does not just mean multimedia; it includes all the senses through which we interact with the real world, because we are naturally trained to interact with or within reality. I imagine a virtual-reality operating system whose concepts supersede concepts based on implementation details such as memory/storage, programs, files, or browsers/the internet, and that is instead based on objects and their representations, avatar-represented agents (programs), and rooms. Multimodality is a (multi)language consisting of verbal and non-verbal elements, including visual modalities (drawings, symbols), aural modalities (semantics, pragmatics, earcons), gestures, facial expressions (emotions), and even spatial modalities such as proxemics, proprioception, and turn-taking. In my opinion, a UI is particularly effective or usable if it provides high flexibility in user input and in fusing modalities, as a generalization of Fitts’s law to modalities.
Senior Portfolio Risk Officer @ NSW Gov | Risk Management, Data Architecture (1 year ago): I’m interested, and worried, to see the effect of AI outputs that have been fed mostly US-based data sources, with information coded in ways that are almost uniquely American. For instance, dates in mm/dd/yyyy format are even now constantly misinterpreted by major platforms and rarely translated correctly for the rest of the world. Examples include Amazon stores showing product release times where the string 09/01/23 has been picked up from the US site and turned into 9 Jan 2023 elsewhere, when it should actually be rendered as 01/09/2023. Google reads my emails and creates tasks and labels in other Google products that regularly swap day and month. I reported these bugs for years without getting any traction, because they occur outside the US. Similarly, place names that are replicated around the world are often conflated by US systems, so if you live in Newtown, Australia, you may get news, weather, and transport information about a Newtown in the US. (Facebook regularly combined places in Australia and England, treating them as the same.) Now take all these internationalization failures in the historical command UI and see how they get amplified by ambiguously phrased prompts to systems that assume the world is based on Template USA.
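The date ambiguity this commenter describes is easy to demonstrate. Below is a minimal TypeScript sketch; the parsing of non-ISO date strings is implementation-defined, but this is the behavior of V8-based runtimes such as Node.js and Chrome:

```typescript
// The ambiguous string from the comment above.
const raw = "09/01/23";

// JavaScript's Date constructor assumes the US mm/dd/yy convention
// for slash-delimited strings (implementation-defined, but V8 does this):
const parsed = new Date(raw);
console.log(parsed.toDateString()); // "Fri Sep 01 2023" -- not 9 January

// Locale-aware formatting only changes the rendering; the US
// interpretation is already baked into the parse:
console.log(new Intl.DateTimeFormat("en-US").format(parsed)); // "9/1/2023"
console.log(new Intl.DateTimeFormat("en-AU").format(parsed)); // "01/09/2023"
```

The only unambiguous fix is to exchange dates in ISO 8601 form (2023-09-01) and localize them at the display layer alone.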