Achieving Successful AI Development: Mitigating Bias

Cognitive biases play a crucial role in AI projects and can considerably affect their outcomes. They are a product of human nature, arising from past experiences, heuristics, assumptions, and personal emotions, among other factors. In data-related projects, these biases can lead to incorrect hypotheses, misinterpretations of data, and poor decision-making.

Despite their impact, cognitive biases are often overlooked or underestimated in data projects. However, it is possible to mitigate their influence by raising awareness and taking proactive steps. Understanding cognitive biases and taking action to address them can lead to better outcomes and a higher chance of success in AI projects.

This article summarizes some of the most common biases that occur during AI development, briefly explores their impacts, and proposes strategies to mitigate them. The biases are grouped by the stage of the data project they affect most. The proposed grouping and selected biases are based on personal experience leading data teams and working with stakeholders across different projects and industries.


1. Problem definition and requirements gathering

This stage is crucial for AI development and sets the foundation for successful projects and systems. Here, data scientists, ML engineers, business units, and stakeholders come together to understand the business problem and gather the requirements needed to develop an AI solution. The goal is to identify the objectives, expected outcomes, and limitations of the AI system, as well as to determine the data needed and its availability for the project.

Information bias: This bias occurs when individuals seek out more information than is necessary. Information bias can easily derail a project from the start, so it is important to have seasoned leadership to prevent it. These are some steps I find useful to mitigate information bias: i) define a set of metrics to be measured during development, ii) collect a set of features that can easily be obtained from existing data, iii) train a first version of the model, using the metrics to guide model and hyperparameter selection, and iv) always balance the effort required to collect new variables against the potential benefit to the metrics obtained in the previous steps.
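As a minimal sketch of this metric-first workflow (the dataset, feature names, and target column below are hypothetical, and any scikit-learn estimators would do):

```python
# Minimal sketch of a metric-first baseline. The CSV file and column
# names are hypothetical placeholders for data you already have.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("transactions.csv")             # hypothetical dataset
easy_features = ["amount", "account_age_days"]   # features we already have
X, y = df[easy_features], df["is_fraud"]

# i) the metric is fixed up front; ii) only readily available features;
# iii) candidate models compete on that metric.
for model in (LogisticRegression(max_iter=1000), GradientBoostingClassifier()):
    score = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{type(model).__name__}: ROC-AUC = {score:.3f}")

# iv) a new variable is only worth collecting if it moves this number
# enough to justify the cost of collecting it.
```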

Hindsight bias: This cognitive bias occurs when individuals believe that past events were more predictable than they actually were. While defining a data problem, hindsight bias can kick in, making recent events seem obvious and easy to predict. For example, when addressing a recent fraud case, business owners might quickly identify red flags and conclude that a simple system would have flagged it on the fly. However, these hypotheses may not generalize to the entire dataset and could end up performing poorly. This bias can be mitigated by not jumping to conclusions from a few samples and by validating the proposed rules against all available data, not just recent cases.
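As a sketch of that validation step, assuming a labeled fraud dataset (the column names and cutoff date are hypothetical), score the "obvious" rule on the full history and compare it against the recent cases that inspired it:

```python
# Sketch: evaluate a hand-crafted "obvious" red-flag rule on ALL data,
# not just the recent case that inspired it. Columns are hypothetical.
import pandas as pd
from sklearn.metrics import precision_score, recall_score

df = pd.read_csv("transactions.csv", parse_dates=["date"])

# The rule that seemed obvious in hindsight for last week's fraud case:
rule = (df["amount"] > 10_000) & (df["country"] != df["home_country"])

print("precision, all data:", precision_score(df["is_fraud"], rule))
print("recall, all data:   ", recall_score(df["is_fraud"], rule))

# The recent slice that made the rule look great:
recent = df["date"] >= "2023-01-01"              # hypothetical cutoff
print("recall, recent only:",
      recall_score(df.loc[recent, "is_fraud"], rule[recent]))
```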

Loss aversion: This bias occurs when individuals have a strong preference for avoiding losses over acquiring gains. This can result in a conservative perspective on implementing new initiatives, even if they could bring significant benefits to the company. To mitigate loss aversion, controlled strategies such as canary deployments or A/B testing can be used. It is important to remember that not every problem requires controlled scaling, but it does provide a safe way to capture the benefits of the proposed system while mitigating potential losses.
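For the A/B testing route, the readout can be as simple as a two-proportion z-test over the conversion counts of each arm, as in this sketch (the counts are hypothetical):

```python
# Sketch of an A/B test readout: ship the new model to a small traffic
# slice and only scale up if the lift is statistically convincing.
# The counts below are hypothetical.
from statsmodels.stats.proportion import proportions_ztest

conversions = [530, 480]        # [treatment (new model), control]
visitors = [10_000, 10_000]

stat, p_value = proportions_ztest(conversions, visitors)
treatment_rate = conversions[0] / visitors[0]
control_rate = conversions[1] / visitors[1]

if p_value < 0.05 and treatment_rate > control_rate:
    print("Significant lift: expand the rollout.")
else:
    print("No convincing lift yet: keep the canary small.")
```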


2. Data preparation and model training

Once the business question and system requirements are defined, the data team starts collecting, cleaning, and processing data for modeling. The goal is to build a consistent dataset, or data pipeline, to create a model that answers the business question according to the requirements. Several biases can impact this stage.

Clustering illusion and illusory correlation: Humans have a tendency, and a remarkable capability, to find patterns even where none exist. The clustering illusion and illusory correlation lead data scientists to believe they've discovered patterns that aren't actually there. In practice, they may even construct convincing stories about these illusory patterns, some of which could resonate with stakeholders. To mitigate these biases, it's essential to test as much as possible, avoid drawing conclusions from small sample sizes, use alternative data visualizations, and rely on metrics to iterate on models.
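A concrete safeguard is a permutation test: shuffle one variable many times and check how often a correlation as strong as the observed one appears by pure chance. A minimal sketch with synthetic data:

```python
# Sketch of a permutation test: how often does a correlation this strong
# appear when the relationship is destroyed by shuffling? The data here
# is synthetic and intentionally unrelated.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=50)
y = rng.normal(size=50)        # truly independent of x

observed = abs(np.corrcoef(x, y)[0, 1])
null = [abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(10_000)]
p_value = np.mean(np.array(null) >= observed)

print(f"observed |r| = {observed:.3f}, permutation p-value = {p_value:.3f}")
# A large p-value says the "pattern" is indistinguishable from chance.
```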

Survivorship bias: This bias occurs when people focus on success stories because they are the ones available, while ignoring failures because they were never collected. Survivorship bias can lead data scientists to only consider data that has already been filtered by a previous step. To mitigate this bias, it's helpful to initially have the system work behind the scenes as an advisory agent for the expert team. This way, the team can gradually shift their decisions based on the model's recommendations, leading to a broader and more diverse knowledge base to boost the model.
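A minimal sketch of this advisory setup, assuming a scikit-learn-style model and hypothetical field names: the model's recommendation is logged next to the expert's decision, but the expert's decision is the one that gets executed.

```python
# Sketch of "advisory mode": record model vs. expert without letting the
# model decide, so future training data also covers cases the current
# process would have filtered out. All names here are hypothetical.
import csv
from datetime import datetime, timezone

def advise(model, case_id, features, expert_decision, log_path="shadow_log.csv"):
    """Log the model's recommendation; the expert still owns the outcome."""
    recommendation = model.predict([features])[0]   # sklearn-style API assumed
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            case_id, recommendation, expert_decision,
        ])
    return expert_decision      # the expert's decision is what gets executed
```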


3. Model validation

Once the model is ready, it's crucial to validate its performance against the requirements. Stakeholders provide feedback about the model's performance, and if necessary, data scientists may iterate before deployment.

Confirmation and congruence bias: These biases occur when individuals favor information that confirms their beliefs or hypotheses. In some cases, even with great results, a project may be disregarded because it does not confirm the stakeholders' beliefs. Although it's challenging to completely eliminate this bias, a step in the right direction is to raise awareness among stakeholders and provide as much evidence as possible when presenting results and making decisions about the project.

Semmelweis reflex: This bias occurs when individuals reject new information or ideas that contradict their current beliefs or practices. This behavior can prevent a company from adopting new techniques, technologies, or strategies that contradict its current beliefs and processes. For example, if a new system provides answers that differ from the business unit's recommendations, the Semmelweis reflex may kick in. To counter this bias, it's crucial to emphasize that the AI system and human expertise are complementary and that a well-designed system leverages this collaboration. Comparing human-level performance to the system's performance can also help maintain an objective judgment of the system.
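One simple way to keep that comparison objective is to score the experts' recommendations and the model's outputs against the same ground truth, as in this sketch with hypothetical labels:

```python
# Sketch: human and model recommendations scored against the same
# realized outcomes. The labels below are hypothetical.
from sklearn.metrics import accuracy_score

y_true  = [1, 0, 1, 1, 0, 1, 0, 0]   # realized outcomes
y_human = [1, 0, 0, 1, 0, 1, 1, 0]   # business unit's recommendations
y_model = [1, 0, 1, 1, 0, 0, 0, 0]   # system's recommendations

print("human accuracy:", accuracy_score(y_true, y_human))
print("model accuracy:", accuracy_score(y_true, y_model))
```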

Subjective validation: This bias occurs when individuals believe that statements or results are true simply because they have personal significance to them. Subjective validation can lead stakeholders to believe a model is working properly because it relates to their experience or beliefs. To avoid this bias, it's important to be realistic about the model's capabilities and the specific questions it was designed to solve. A good way to achieve this is to make sure everyone involved in the project is aligned on the model's objective and restrictions from the beginning. Invest time and energy in explaining the business question and the model's objective.

Framing effect: This bias occurs when decision-makers' choices are influenced by the manner in which options are presented. By tapping into this bias, data scientists may frame results to push stakeholders toward a particular choice. To mitigate the framing effect, try rephrasing sentences in both positive and negative ways, providing different representations of the same result, or analyzing decisions using different measurement units (e.g., percentages and absolute values). Be careful while trying to mitigate this bias, because you could accidentally end up making the decision-making process more complex or creating unintended blockers in the project.
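As a sketch of presenting the same result in several frames at once (the numbers are hypothetical):

```python
# Sketch: report the same result in relative and absolute frames so the
# framing itself doesn't drive the decision. Numbers are hypothetical.
baseline_errors, new_errors, total_cases = 200, 150, 100_000

reduction = (baseline_errors - new_errors) / baseline_errors
print(f"Errors reduced by {reduction:.0%}")                           # relative frame
print(f"Errors: {baseline_errors} -> {new_errors} of {total_cases}")  # absolute frame
print(f"Error rate: {baseline_errors / total_cases:.2%} -> "
      f"{new_errors / total_cases:.2%}")                              # rate frame
```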


4. Deploying, monitoring, and scaling the system

Finally, the model is live, but the challenges of cognitive bias don't go away. At this stage, data scientists and stakeholders must constantly monitor the system's metrics to ensure it continues to deliver the desired outcomes, and make adjustments when necessary.

Overconfidence effect: It's natural for us to overestimate our ability to perform a task. When a system is already deployed, we may wrongly believe that its performance won't degrade over time, that it will easily handle unseen observations, and that it will keep performing even if the data starts drifting. To mitigate this, it's important to constantly monitor the performance of the system with fresh data and be ready to deploy newer versions of the model when required. A good MLOps practice will be critical to achieving this at scale.
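As a minimal sketch of such monitoring, a two-sample Kolmogorov-Smirnov test can compare a feature's training distribution against fresh production data (the data below is a synthetic stand-in):

```python
# Sketch of a drift check on a single feature: compare the training
# distribution with fresh production data. Synthetic stand-in data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_amounts = rng.lognormal(3.0, 1.0, 5_000)   # stand-in for training data
live_amounts = rng.lognormal(3.4, 1.0, 1_000)    # stand-in for fresh traffic

stat, p_value = ks_2samp(train_amounts, live_amounts)
if p_value < 0.01:
    print(f"Feature drift detected (KS = {stat:.3f}): consider retraining.")
else:
    print("No significant drift on this feature.")
```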

Base rate fallacy: This bias is characterized by our tendency to place too much importance on specific event characteristics rather than the base rates of the system. It becomes noticeable when performing error analysis, where teams may spend a lot of time and energy investigating specific cases, even if the overall system is working as expected. To mitigate the base rate fallacy, make sure that the people analyzing the system's results understand how relevant the cases they are evaluating are to the overall system performance. The worst that could happen is to end up holding a project back due to minor error types, or adding unnecessary complexity to the system to obtain just a marginal performance gain. Raise awareness among decision-makers and users about this bias, keep the system metrics available, and quantify the impact of specific cases.
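A simple habit that helps is to quantify each error type against the base rates before deep-diving into any of them. A sketch with hypothetical counts:

```python
# Sketch: before deep-diving an error type, size it against the overall
# error volume and the total prediction volume. Counts are hypothetical.
errors_by_type = {
    "duplicate invoice": 12,
    "currency mismatch": 3,
    "late-night purchase": 1,
}
total_predictions = 50_000
total_errors = sum(errors_by_type.values())

for error_type, n in sorted(errors_by_type.items(), key=lambda kv: -kv[1]):
    print(f"{error_type}: {n} cases, "
          f"{n / total_errors:.1%} of errors, "
          f"{n / total_predictions:.3%} of all predictions")
```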

Neglect of probability: AI systems typically output probabilities, but the decisions to be made are binary or discrete. This makes it common for decision-makers to disregard the underlying probability once a decision is made. The same happens in daily life: for some decisions your confidence is extremely high, while for others it drops, yet a decision must still be made. Models are no different: for some samples the output probability is high, for others it is lower, but even those outputs can help you make better-informed decisions in the long term. Deployed models are not perfect and will make errors or deviate from optimal answers to some degree. This is acceptable as long as the overall performance of the system creates value or solves scalability issues for your business. Keep an eye on high-level metrics and embrace that no model is perfect, but some will bring immense value.
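As a minimal sketch, the binary decision can be derived from the probability without throwing the probability away (the threshold and scores below are hypothetical):

```python
# Sketch: turn a model score into a yes/no action while keeping the
# probability, which carries the confidence the binary answer discards.
def decide(probability: float, threshold: float = 0.5) -> dict:
    """Map a score to an action; preserve the score for monitoring."""
    return {
        "action": "review" if probability >= threshold else "approve",
        "confidence": probability,
    }

for p in (0.97, 0.55, 0.08):      # hypothetical model outputs
    print(decide(p))
```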

Conclusion

Cognitive biases in AI development can have significant consequences if ignored: considerable wasted resources, frustration between teams, or even projects set up for failure. Biases are tricky to detect, and sometimes it is not even possible to eliminate them. However, with proper planning, awareness, and specific strategies, it is possible to control their negative impact and maximize the chances of success of data projects in your organization.

Without being exhaustive in the coverage of biases, and at the risk of oversimplifying the stages of AI development, I would be happy if this article helps you and your organization increase awareness and boost the chances of bringing your projects to life! If you find this article interesting and would like to learn more about biases in general, I would recommend the following books, and if you have additional material that would complement this article, I would love to read it.


