An Anecdote on the Idolatry of AI
Do you think you can out-perform AI at a given task? Your answer probably hinges upon a variety of factors. Further, your knowledge of Artificial Intelligence and/or Machine Learning (AI/ML) likely drives the factors you would consider in such a question. We recently attempted to collect some pilot data on human-AI interaction in a forecasting task, but it ended up raising more questions than it answered.
An Anecdote About Perceived Self-Efficacy vs AI-Efficacy
Background
Each year our company has a "Pizza and a Demo" session where partners can show off R&D progress, including posters of findings and/or working software prototypes. Because we study decision making and human-machine interaction, the HAIL team often uses this event to collect anecdotal data on a small scale to drive new algorithms and collect feedback on interfaces and designs.
One of the many projects we showcased this last year (in early September) was a very fledgling prototype of our ALL THE ABOVE concept, which aims to fuse space and terrestrial weather and apply AI/ML to predict the effectiveness of satellite payloads. We had only recently started piecing together the different components (algorithms, visualizations, business logic, etc.), and, more importantly, had begun collecting data from one of our C-band receivers only three months prior.
In addition to using ALL THE ABOVE as a plug-and-play testbed for a variety of data fusion, AI/ML, and data visualization concepts, we also aim to research human-AI interaction with a forecasting task under high uncertainty. That is, we want to look at different human reasoning processes as they interact with the AI, then ultimately make a prediction on what the expected payload performance would be at a specified time 36-48 hours in advance (based on weather forecasts). Instead of asking people to use a tool and fill out a survey, we tried to make it fun and engaging, where they submitted a written forecast of the Carrier to Noise Ratio (CNR), and if they were closer than the AI, they would win a gourmet doughnut of their choice, that I would personally deliver to their desk. We even had a huge banner reading "Beat the AI, Win a Doughnut!" We were looking forward to recreating a John Henry-esque clash of man vs machine, where we could study the relationships among forecasting accuracy, reasoning processes, and confidence.
The Cult of AI?
That did not happen. While I was hoping to provide an overview of our results from the pilot, all I can say is that half of our sample (N = 4) got closer than the AI (a simple ARIMA model, which one could debate is not even AI). You read that correctly: of the 100+ people at the 90-minute session, only four actually submitted a forecast against the AI to try to win a doughnut. We even had two computers running to increase the throughput of participants. The interesting point here is not the results, but rather the lack of results due to a possible reverence for AI.
When the session first started we saw plenty of people point and smile at the big "win a doughnut" poster. After a few minutes a handful of people approached the table to scope it out, but none would sit at one of the two computers to interact with the system. We eventually began actively asking people to sit down; we even walked them through some of the forecasts and charts and showed them how they might use the tools provided to make a forecast. When we asked people to fill out our simple survey (something like five-ish questions), the common refrain was "oh... I couldn't possibly..." or "there's no way I'll be able to do that..." (i.e., beat the AI).
We were able to get over 20 people to interact with the system, although only four would formalize their experience by providing a forecast. To encourage participation, we unsuccessfully tried explaining a variety of things to reassure folks that they had a better chance than they thought (plus the fact that there was literally zero cost to try, yet a tangible reward if they succeeded). We started by trying to decrease the perception or mystique of the AI — "It's not some kind of fancy deep learning algorithm, it's just a moving window technique..." — as we assumed people were intimidated by recent advances in deep learning.
When that failed, we tried to highlight the lack of sufficient data quantity: "we have less than three months of data." We pointed out that data were only logged hourly, leaving us with roughly 2,000 data points to try to predict the weather's effects on our receiver. We also highlighted the lack of data quality: "the algorithm was trained on data from the hot/humid summer, and not the fall. Plus, most of these variables have little-to-no information value." We explained how the dropping temperatures in September meant that the AI had never seen such a day before, and therefore could not reliably predict the performance.
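To make the "just a moving window technique" claim concrete, here is a minimal sketch of the kind of naive baseline the post describes: forecasting the next Carrier to Noise Ratio (CNR) reading as the mean of the last window of hourly observations. The function name, window size, and toy data are my own illustration, not the actual ALL THE ABOVE model.

```python
# Hypothetical sketch of a moving-window baseline forecast over hourly CNR
# readings. Roughly three months of hourly data is only ~90 * 24 = 2160 points.
from statistics import mean

def moving_window_forecast(series, window=24):
    """Predict the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("not enough history for the chosen window")
    return mean(series[-window:])

# Toy stand-in for ~3 months of hourly CNR data with a daily cycle (made up).
hourly_cnr = [12.0 + 0.5 * (i % 24) / 24 for i in range(2160)]
forecast = moving_window_forecast(hourly_cnr, window=24)
```

A baseline like this has no notion of season: trained only on summer readings, it will keep projecting summer-like values into a cooling September, which is exactly the data-quality limitation the team tried (and failed) to convey.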
By the end of the session, desperate for participation, we found ourselves flat-out berating our own pet project: "it's going to be terrible this time around... you're way more likely to win than you think... it's going to suck until it gets at least a year of data." Despite all of these appeals (algorithm strength, data quantity, and data quality), only four of the 20-ish people who interacted with the system were willing to take the symbolic leap of challenging the AI by putting their forecast on paper. There was some aura surrounding the AI that no amount of explaining its limitations could seem to overcome.
Normative Explanations
I have spent a while thinking about why this may have been the case, and what could have been done differently to make people feel as though they had a fighting chance against the machine, even though we knew the machine was under-gunned (a large point of this event was to baseline the AI, to show it getting better over time at future events).
- Framing: Kahneman and Tversky have shown that framing effects are important (especially regarding loss vs gain). I wonder if framing the event as something more collaborative, rather than as a direct challenge against the AI (no matter how light-hearted that challenge was), would have engendered more participation. A colleague even mentioned re-framing the technology as a "statistical model" or "prediction machine" instead of the hallowed "AI" moniker to make it less intimidating. What is interesting is that many people do, in fact, challenge AI on a daily basis by choosing to ignore/override driving apps on their cellphone (as one example), although it may not be framed that way. For example, you are betting against AI by ignoring the recommendation on your phone to take the George Washington Bridge and instead using the Tappan Zee Bridge (see Gial Ackbar above regarding that suggestion). Anecdotal evidence working with a variety of user groups has shown me that people are consistently willing to say they can outperform AI in more serious contexts (e.g., warfare); therefore, I am not entirely confident that this factor alone explains the lack of willingness to participate.
- Domain Mastery: A relatively obvious explanation is the idea of domain mastery. It may not be that people perceived the AI as being some all-powerful entity, but rather they thought that their mastery of weather and/or satellite communications (SATCOM) wasn't high enough to challenge anything or anyone at the task. This might explain why people quite often override AI suggestions for routes on driving apps or other consumer recommender systems, where they have significant experience and tacit knowledge of what routes work best in a given area (i.e. mastery). That being said, there are a considerable number of people at our company that have varying levels of expertise in weather and/or RF propagation, making them quite qualified in the domain. Therefore, this issue alone could not explain the aversion to challenging the AI.
- AI Fluency and the Appreciation of Information: Another thought was that everything we told people regarding the limitations of the AI may have been moot if they did not "speak AI." That is, if someone does not understand what sample size is appropriate for a certain type of model (with a certain number of features), my explaining the meek data quantity and mediocre data quality would have done nothing to sway their opinion. Further, if somebody doesn't know the difference between ARIMA and a neural network, then the distinction is pointless. That being said, we work in a relatively AI- and data science-savvy company, where there is no shortage of Computer Science and Mathematics degrees walking around, so this issue alone could not explain it.
- Risk vs Reward: Although we exhaustively explained that A) the data were not going to be shared in public (eliminating a perceived social risk), B) the time commitment to fill out the entry form was less than two minutes, and C) there was no cost to participate (eliminating cost risks), there may have been some other perceived risk that outweighed the reward. Further, a gourmet doughnut, while appealing to some (this bakery is somewhat famous in the area, or at least in our hallways and lounge areas on campus), may have meant nothing to others. The more fun (but more expensive) idea was originally to take the winners out for a beer - I wonder what that would have done to participation. So there are a variety of things that may change an individual's calculus on the relative risks and rewards. To continue the mapping/route example, people will gladly risk 20-40 minutes of their time (and being late to non-trivial events) by going against AI recommendations for routes on their phones. Given that precedent, and the fact that there was no risk plus a non-zero reward for this task, I don't believe this was the sole issue.
To summarize, there are a variety of factors that could have affected the average person's willingness to challenge the AI. I do not believe any one of the factors ruled the day, but instead a confluence of them (and others I have not named) likely affected the decision to participate or not. At the end of the day I still found the whole experience of reverence for AI interesting, and I am further grateful for the four folks who were sympathetic to the cause and took a shot at it.
Possible Research Agenda
As previously stated, I think we have more questions about human-AI interaction (or at least human impressions of AI) now than we did before we attempted the demo. I would bet good money that there is plenty of extant research on these questions, but I am looking forward to continuing to dive into them as our current and future tasking allows:
- What factors affect somebody's initial or default perceived self-efficacy against AI? The way the AI is described/characterized? The interaction modality (competition vs team)?
- What information can be provided to better calibrate that perception? The type of AI? The degree to which it has been tested/vetted? The quantity and/or quality of data used to train it?
- Of the answers to the preceding questions, which are universal, and which are based on the context of the task at hand?
I think it's easy to say, "but Steve, these are stupid/obvious questions" (I actually feel hesitant to press the 'publish' button, to be honest), but the fact is that unless we can answer these questions in a relatively generalizable fashion, we will continue to have people who put too much stock in AI, regardless of its capabilities and limitations in the various contexts previously discussed. The degree to which context affects AI performance further underscores the need to understand human perceptions of AI's limitations, and how to successfully communicate them to people of various backgrounds. After all, the two people who actually beat the AI were the ones who blindly guessed a number within the previous range (the two folks who provided a specific reasoning process failed to get closer than the AI). I ended up getting doughnuts for everyone.