Go Forth and Click No More: How we can move beyond the awkwardness of chatbots and screen-based AI interactions
Christian Ulstrup
AI Implementation Expert | Fmr. MIT AI Co-Chair | Helping Leaders Execute 10x Faster | ex-Red Bull, Arterys (acq. by Tempus AI, NASDAQ:TEM), ARPA-H AI Advisor | Book a Strategic Planning Call
One of the big problems with current AI chat applications is that, although at first glance they would seem preferable to the traditional point-and-click, component-based interface because of the additional personalization and flexibility they afford users, the truth is that they introduce additional friction more often than not. This is one reason why I think, in general, the turn-your-website-into-a-chatbot approach just does not make a whole lot of sense.
If you think about this from an ergonomics perspective, or I guess human-computer interaction more specifically, each key press needed to provide some kind of input is probably more cognitively taxing than a mouse click. (Most people are still typing, which is unfortunate, because dictation is just so much better once you've made the switch, especially now that we have state-of-the-art models like Whisper.) So when users are presented with a text box, let's say, rather than a point-and-click interface, there's a relatively high cost in terms of cognitive load and energy.
The other issue is that it's open-ended, and most of the experiences I've seen require the user to initiate the conversation. And even with a few conversation starters as little components, it's just not a particularly good experience. It's like if you met somebody at a networking event, and you knew maybe a little bit about them, but they were just kind of standing in front of you, drink in hand, intensely staring at you silently, waiting for you to initiate the conversation. Like, nobody actually wants that kind of experience.
Instead, what we'd prefer is to have somebody who knows quite a bit about us, can meet us where we are, and has an understanding of our preferences. Rather than requiring us to do the hard work of figuring out how to initiate the conversation under conditions of pretty extreme uncertainty, it would be best if they actually initiated the conversation and asked us questions that got us energized because they touched directly on exactly the topics that we're interested in.
Now, I've written about this before, but I think ultimately there are two pretty substantial personal computing shifts that need to happen for us to be able to go beyond these issues.
The first is that there needs to be some kind of "login with your digital memories" experience whereby you share some portion of your private digital repository. That could be your emails or your search history, which of course Google has, and which is part of the reason why they're a trillion-dollar company. Or, in the extreme, it could be some subset of everything you've seen, said, heard, done, and so on and so forth, which will be increasingly feasible with the always-on ambient computing platforms (e.g., Humane's new Pin) that are coming to market in the months and years ahead.
The second is that screen-based interfaces are, I think, never really going to be a perfect fit for this kind of discursive experience. And really, for it to work, we've got to be able and willing to make, and actually want, a more radical switch to primarily voice interfaces.
Now, that's great when it comes to input because it's actually significantly less cognitively taxing to speak at length than it is to type. And I would argue probably even less taxing than pointing and clicking, because you can be almost purely in a production mode and not distracted by whatever's on the screen in front of you.
So that's great. But on the flip side, when it comes to consuming information primarily through an audio interface, meaning text-to-speech or something spoken, the threshold for the AI assistant counterparty is extremely high: its responses have to be reliably intelligent, pertinent, useful, and interesting.
And although I'm pretty impressed by what's already possible with assistants like Inflection AI's Pi that ask you questions by default, which I think is an absolutely brilliant user experience choice, and OpenAI's nascent voice-only interface through the ChatGPT app, my suspicion is that the bar for getting people to switch en masse to a voice-first medium for using networked personal computers is going to be quite, quite high.
And we may end up running into a series of technical problems that delay the adoption of this technology and prevent it from being integrated into our lives in a way that really unlocks unprecedented productivity, similar to how it's taken much longer than expected for self-driving cars to reach the level of reliability that consumers and regulators expect.
Or consider spatial computing and virtual reality technology, where the user experience thresholds are similarly demanding: latency has to be super, super short to prevent cybersickness, and then there are the incredibly high-resolution screens that we need, spatial audio requirements, device weight, etc.
So basically, what I'm saying is, I think it's difficult, but certainly not impossible, especially in highly specific and tailored use cases, to leverage AI in a way that really improves workflow experiences, especially for knowledge workers who are encumbered by clerical and administrative tasks and who primarily work in a screen-based, point-and-click, keyboard-and-mouse personal computing environment.
But I think for us to really take things to the next level, there has to be this more radical shift to a voice-first interface. That shift is hard for the reasons that I described above, and also because on the consumption side, bandwidth is much higher when you're reading text, especially if you're hopping and skipping across documents, than when you have to listen very intently to a voice assistant. Because you can't skim speech, the quality bar for what's compressed into every single word that you listen to, relative to what you could skim over as you're browsing the web or your private document store, is extremely, extremely high.
And these are all challenges that I think will be overcome, but if we're to learn from virtual reality or self-driving cars, it could take significantly longer than expected to squeeze out those last few percentage points of reliability and quality before this revolutionary technology really starts making a meaningful impact.
We shall see.
Sr Product Marketing Manager
1y: Is this a long-form answer to the question I asked you the other day?
AI Implementation Expert | Fmr. MIT AI Co-Chair | Helping Leaders Execute 10x Faster | ex-Red Bull, Arterys (acq. by Tempus AI, NASDAQ:TEM), ARPA-H AI Advisor | Book a Strategic Planning Call
1y: Shout-out to Neil C. and superwhisper, the best bridge from the old point-and-click-and-type to the new discursive approach to getting the most out of your personal computer! It's been a game-changer for me.