AI: A Matter of Trust
While introducing our agents, we are frequently faced with questions that go beyond the core technology. Users ask us:
How can I trust an intelligent agent? How do I know that I can rely on artificial intelligence?
We need to consider questions such as: What can we do to help users build that trust? How do we ourselves come to trust the technology we build? And what does “trust” in an artificially intelligent technology mean in the first place?
With our intelligent agents we are entering new ground in computing, because for many users it is unusual to deal with probabilities when using computers at work. In many cases, computing systems are expected to be exact machines that are correct 100% of the time. While users are used to probabilistic results in weather forecasts, advertising, or speech recognition, most do not encounter them in their professional software. Let us underline that the information an agent gathers is what any person could find on the internet or intranet, respectively, and that we have trained our models to carefully distinguish trustworthy sources from dubious ones. Our agents, and artificial intelligence in general, nevertheless produce results with a certain chance of being right or wrong.
Precision and Recall
As we will detail, there is no one-size-fits-all answer to the trust question, as the level of trust people place in AI varies with perspective and context. Our developers, like other providers of machine-learning technology, build systems for tasks such as pattern recognition, classification, and automated evaluation. They use statistical indicators such as “precision” and “recall” as performance and reliability metrics for the system. Precision is the probability that a positive prediction is true, i.e. that the results our agent suggests are actually good results. Recall is the probability that a true positive is predicted as positive, i.e. that our agent identified all relevant results in the overall data. The higher these two metrics are, the more reliable and trustworthy the system is. This is how we ensure the quality of our agents during development.
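To make these two definitions concrete, here is a minimal sketch of how precision and recall can be computed for a set-based retrieval task. This is an illustration, not our production code; the function name and the example data are invented for this post.

```python
def precision_recall(predicted_relevant, actually_relevant):
    """Compute precision and recall for a set-based retrieval task.

    predicted_relevant: set of items the agent returned as relevant.
    actually_relevant:  set of items that are truly relevant (ground truth).
    """
    true_positives = predicted_relevant & actually_relevant
    precision = len(true_positives) / len(predicted_relevant) if predicted_relevant else 0.0
    recall = len(true_positives) / len(actually_relevant) if actually_relevant else 0.0
    return precision, recall

# Example: the agent returns 4 suppliers, 3 of which are truly relevant,
# out of 5 relevant suppliers overall.
p, r = precision_recall({"A", "B", "C", "D"}, {"A", "B", "C", "E", "F"})
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.75, recall=0.60
```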
Subjective Recall and Subjective Precision
Most of the time, the answer to the trust question looks different for our users. We encounter the trust challenge when we provide the agents to our clients, for example for the task of supplier search. Based on specific user requests, our Company Identification Agent performs fast, requirements-specific supplier searches using information from the public web, mimicking a human search. Out of thousands of candidate companies, the agent filters a shortlist of companies that provide the required feature set and are based in the specified geography. The question we face is: “How can I trust the agent to find me all the relevant companies?”
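Conceptually, the shortlisting step works like the following sketch. The field names and criteria are hypothetical and stand in for the actual requirements a user specifies; the real agent works on much messier web data.

```python
# Hypothetical sketch of the shortlisting step described above: filter
# thousands of candidate companies down to those matching the requested
# feature set and geography. Field names are illustrative, not our schema.
def shortlist(candidates, required_features, target_countries):
    result = []
    for company in candidates:
        has_features = required_features <= company["features"]
        in_geography = company["country"] in target_countries
        if has_features and in_geography:
            result.append(company)
    return result

candidates = [
    {"name": "Acme GmbH", "features": {"milling", "coating"}, "country": "DE"},
    {"name": "Widget Co", "features": {"milling"}, "country": "US"},
]
print(shortlist(candidates, {"milling", "coating"}, {"DE", "AT"}))
# -> only "Acme GmbH" matches both the features and the geography
```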
After some more in-depth conversations, we realized that trust is strongly shaped by the first encounters with the agent. If the agent shows the companies the user already knew existed, trust is high; if the agent missed those companies, trust is low. Users wonder: “If the agent did not even find the companies I know, how can the rest of the list be any good?”
This “recall” measure is, of course, highly subjective. It is not recall against all companies that exist and the ratio of how many the agent identified; it is about the agent finding the companies that are on our users’ minds. This makes a lot of sense to them, because nobody can possibly know all the companies that exist, so the only reference data users have is subjective.
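One way to formalise this observation: subjective recall is measured against the small set of companies the user already knows, not against the full (unknowable) population. The sketch below is purely illustrative of that idea.

```python
# Illustrative only: "subjective recall" measured against the companies the
# user already knows, rather than against the full population of companies.
def subjective_recall(agent_results, user_known_companies):
    if not user_known_companies:
        return None  # no reference set, no subjective judgement possible
    found = agent_results & user_known_companies
    return len(found) / len(user_known_companies)

# The user knows 3 suppliers; the agent's list contains 2 of them.
print(subjective_recall({"A", "B", "X", "Y"}, {"A", "B", "C"}))  # ~0.67
```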
We see another phenomenon when it comes to precision. When the agent shares its results, users judge false results by the perceived level of (human) stupidity: “How stupid was the agent to show this false result?”, meaning, “How easy is it for a human to see that this is not a correct result?” Because the agent considers and connects data very differently, there are indeed false cases that are very easy for humans to detect, for example when a supplier agent returns a record that is clearly not a relevant supplier. These insights are difficult for the agents, often because they lack contextual knowledge about the world. Unfortunately, one very “stupid” result in a list of otherwise very good results greatly damages the level of trust in the agent. Humans follow the logic: “If one result is that stupid, how can the agent be right about the other results?”
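To illustrate the asymmetry, one could imagine weighting each false result by how obvious its error is to a human. The weighting scheme below is purely hypothetical, meant only to show why one blatant mistake hurts perceived quality more than classical precision suggests.

```python
# Purely hypothetical sketch: each false result carries an "obviousness"
# weight in [0, 1], where 1.0 means any human instantly spots the mistake.
def perceived_precision(n_correct, false_result_obviousness):
    total = n_correct + len(false_result_obviousness)
    if total == 0:
        return 0.0
    # With all weights at 0 this reduces to classical precision
    # n_correct / total; an obvious blunder (weight near 1) costs the
    # agent roughly one extra good result in the user's perception.
    return max(0.0, (n_correct - sum(false_result_obviousness)) / total)

# Classical precision is 9/10 = 0.90 in both cases below.
print(perceived_precision(9, [0.1]))  # subtle error   -> 0.89
print(perceived_precision(9, [1.0]))  # obvious blunder -> 0.80
```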
So in both cases, recall and precision become subjective measures that do not really allow users to judge how much trust to place in an agent. As a consequence of these false first impressions, users might not trust an agent that works quite well. And, on the contrary, users may put trust in agents that do not perform well but were “lucky” enough to match the results the user expected.
Building Trust
At a recent conference, we were discussing this phenomenon and a participant suggested a solution: we all, including our users, should learn about statistics, test methods, and so on, and base our trust on the neutral data. While this sounds good, it is hard to implement, since we cannot all become statisticians, nor can we simply switch off our emotions. Even if we could, in many cases the test data itself needs to be thoroughly inspected to understand what the results mean for the particular use case a user has in mind.
Therefore, we suggest a more pragmatic approach:
What's next in building trust towards the agents?
As you can see, many of the approaches above require us, as human agent designers and engineers, to support users in building trust. This works, but it is a process that is hard to scale. We are therefore working on enabling the agents themselves to support users in building experience and learning how to deal professionally with trust in intelligent technology.