Vicious cycle of Bias in AI systems

Let’s start with bias itself. The first question to ask is: what exactly is bias?

As per the Oxford dictionary, bias is defined as “a strong feeling in favor of or against one group of people, or one side in an argument, often not based on fair judgment.”

The important point to note here is that bias is inherently connected to humans. However, with the fast-evolving technological paradigm, the term is now used for machines as well.

(It would take another article to unpack the conundrum of the definition of bias itself being biased towards humans.)

The big question, then, is: how can machines (or algorithms) be biased? It seems like such a human thing to do.

The answer lies very subtly in the question itself and that is what gave birth to this article.

Before we move any further in this article, we have to agree on one important point: human beings are naturally biased (either consciously or subconsciously) towards many things.

A lot of the time we cannot explain or identify the bias, but we certainly cannot deny this fact.


A few examples:

“I like strawberry ice cream and hate vanilla.”

“I think platinum rings look more beautiful than gold.”

“I prefer coffee in the morning and tea in the evening.”


You get the idea. I am not talking about the obvious biases here, like race and gender, but the more subtle ones. These kinds of biases act as agents of a butterfly effect (they look small but are capable of creating significant impact), amplifying the bias. This is what we like to call “real-world bias”. At a granular level, it is practically unidentifiable and impossible to mitigate. You can read more about real-world bias in this great article by Rupa Singh.


This real-world bias propagates into the datasets that accumulate over time and turns into data bias. Data bias is a reflection of real-world bias: the data that is available is biased in some way or another. Note that the bias is still hidden under a layer of billions of data points. Unless we put in extra effort to identify the bias in the dataset, it remains obscured from the human eye.


But unlike humans, machines are capable of processing huge amounts of data. This is what machine learning algorithms and models do. They are designed to serve a specific purpose or solve a business problem, and to do so they need data. They process the data to find patterns and learn how to predict outcomes. This is where the bias in the data is exposed by the algorithms and eventually starts influencing the outcomes of the AI system. (I will provide a few real-world examples in the further reading at the end.)


Now, let’s take an example to understand how algorithmic bias fuels business bias. Consider an AI system that is designed to select candidates for army service based on their height. The underlying algorithm is trained on data that is biased toward male candidates in the Netherlands, making the eligible height range 1.75 m to 1.85 m. The reality, however, is different: the average height of males in the Netherlands is 1.84 m, and of females 1.70 m (as per World Data Info). When used, the output of this AI system will be completely biased: it will select only male candidates, and female candidates will be rejected.
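To make the numbers concrete, here is a minimal sketch of that hypothetical selector. The 1.75–1.85 m eligible range and the average heights come from the example above; the candidate list and names are made up purely for illustration.

```python
# A minimal illustration of the hypothetical height-based selector described
# above. The 1.75-1.85 m eligible range and the average heights (1.84 m male,
# 1.70 m female) come from the example; the candidate list is made up.

ELIGIBLE_RANGE = (1.75, 1.85)  # range learned from male-dominated training data

candidates = [
    {"name": "A", "gender": "male", "height_m": 1.84},
    {"name": "B", "gender": "female", "height_m": 1.70},
    {"name": "C", "gender": "female", "height_m": 1.73},
    {"name": "D", "gender": "male", "height_m": 1.80},
]

def is_selected(candidate: dict) -> bool:
    """Selection rule distilled from the biased training data."""
    low, high = ELIGIBLE_RANGE
    return low <= candidate["height_m"] <= high

selected = [c["name"] for c in candidates if is_selected(c)]
print(selected)  # ['A', 'D'] -- only the male candidates clear the biased cutoff
```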


If left unchecked, this AI system will go on selecting only male candidates. The output of this system will also feed back in as input, reinforcing the same bias. This is business bias at its worst.
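Below is a toy simulation of that feedback loop, assuming (purely for illustration) a starting female share of 30% in the data and a system that selects female candidates only 5% of the time. The exact numbers are invented; the point is the direction of travel.

```python
# A toy simulation of the feedback loop: selected candidates become the next
# round's training data, so the female share of the data shrinks every round.
# The starting share (30%) and selection rates are assumed for illustration only.

female_share = 0.30           # assumed share of female candidates in the data
female_selection_rate = 0.05  # biased system rarely selects female candidates
male_selection_rate = 0.95

for round_number in range(1, 6):
    selected_female = female_share * female_selection_rate
    selected_male = (1 - female_share) * male_selection_rate
    female_share = selected_female / (selected_female + selected_male)
    print(f"round {round_number}: female share of new training data = {female_share:.2%}")

# The share collapses toward zero: the snake eats its tail.
```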


It may look outrageous, and we would probably file a petition to discard the AI system, but do you think the algorithm is the real culprit here? No. It is the data on which it was trained. The data that was collected in the real world. Data that mostly contains male candidates. Data that fed this bias into the algorithm.

(Note that I have oversimplified the example for understanding purposes. Generally, AI systems consider millions of data points.)


This is where the vicious cycle connects back to its source. Due to the bias in the decision system, female candidates are completely omitted from the selection process. This pattern feeds back into real-world bias, creating the illusion that female candidates are not fit for army service. True to our nature as human beings, we subconsciously created this bias; it was then reflected in machines, and the machines stopped considering female candidates for army service.


And the cycle repeats itself, like a snake eating its own tail.


One would ask: enough with the problem, what’s the solution? How do we break this never-ending cycle and free the system once and for all? No more bias in the world.


Before moving to possible solutions, let me clarify one thing. It is impossible to completely remove bias from any system without removing humans from the face of the Earth. We can reduce bias, identify it, and plan to mitigate it, but it is a long journey.


The solution is two-fold: identify the bias as early as possible, and plan to reduce it.


This can be done most effectively around the algorithm, specifically at its input and output. The first challenge is to identify the data bias before it is fed to the algorithm as training or test datasets. There is a plethora of tools and methods available for this task. However, these tools are generalized; you will need someone who understands the data domain and can write the rules against which the bias can be identified. In our example, you want a data quality specialist who can create rules around possible bias parameters (for example, gender, age, and geographical location).
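As a rough sketch of what such a rule could look like, the snippet below flags under-represented groups in a training dataset. The column name, the 30% threshold, and the data are illustrative assumptions for this example, not a prescribed standard.

```python
# A sketch of a rule-based bias check on a training dataset, assuming a pandas
# DataFrame with a "gender" column. The column name, the 30% threshold, and the
# data are illustrative assumptions chosen for this example.
import pandas as pd

def under_represented_groups(df: pd.DataFrame, column: str, min_share: float = 0.30) -> pd.Series:
    """Return the groups whose share of the dataset falls below min_share."""
    shares = df[column].value_counts(normalize=True)
    return shares[shares < min_share]

# Illustrative dataset skewed toward male candidates, as in the article's example.
df = pd.DataFrame({
    "gender": ["male"] * 90 + ["female"] * 10,
    "height_m": [1.80] * 90 + [1.70] * 10,
})

print(under_represented_groups(df, "gender"))
# female    0.1  -> the rule surfaces the imbalance before any training happens
```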


Let’s say you identified a show-stopper bias in your dataset. What now? There are various methods to address it, such as resampling the data or generating synthetic data.
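As a minimal sketch of the resampling idea, the snippet below randomly oversamples the under-represented group until the groups are the same size. Libraries such as imbalanced-learn offer more sophisticated resampling and synthetic-data methods; this only shows the basic idea, with made-up data.

```python
# A minimal sketch of random oversampling: each group is resampled (with
# replacement) up to the size of the largest group. The data and column name
# are illustrative.
import pandas as pd

def oversample_to_balance(df: pd.DataFrame, column: str, seed: int = 42) -> pd.DataFrame:
    """Resample every group up to the size of the largest group."""
    target_size = df[column].value_counts().max()
    parts = [
        group.sample(n=target_size, replace=True, random_state=seed)
        for _, group in df.groupby(column)
    ]
    return pd.concat(parts).reset_index(drop=True)

df = pd.DataFrame({"gender": ["male"] * 90 + ["female"] * 10})
balanced = oversample_to_balance(df, "gender")
print(balanced["gender"].value_counts())  # female: 90, male: 90
```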


These methods are easier said than done. Identifying a data bias and fixing it requires time, energy, and money. There is another way, slow and steady but equally effective.


This is the second part of the two-fold solution: create guardrails on the output of the algorithm. It means that we consciously accept (having identified it) that there is bias in the training/test data and then create business guardrails to prevent the potential harm done by the algorithm’s output. In our example, we make sure that our AI system comes with a warning that it was trained and tested on male-dominated data from the Netherlands and is suitable only for male candidates in the Netherlands or similar geographical areas. It should not be used for female candidates at all; other legacy methods should be used for their selection.
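A guardrail like that can also be enforced in code, not just in documentation. Below is a minimal sketch, assuming a scoring function and the scope limitation described above; the field names, the stand-in model, and the routing labels are illustrative, not a specific product’s API.

```python
# A minimal sketch of an output guardrail, assuming the scope limitation
# described above (validated only for male candidates in the Netherlands).
# The field names, the predict function, and the routing labels are illustrative.

VALIDATED_SCOPE = {"gender": {"male"}, "country": {"Netherlands"}}

def guarded_predict(candidate: dict, predict_fn) -> dict:
    """Return the model's decision only for candidates inside the validated
    population; route everyone else to the existing (non-AI) selection process."""
    for field, allowed_values in VALIDATED_SCOPE.items():
        if candidate.get(field) not in allowed_values:
            return {
                "decision": None,
                "route": "legacy_process",
                "reason": f"model not validated for {field}={candidate.get(field)!r}",
            }
    return {"decision": predict_fn(candidate), "route": "model"}

# Example usage with a stand-in model:
stand_in_model = lambda c: 1.75 <= c["height_m"] <= 1.85
print(guarded_predict({"gender": "female", "country": "Netherlands", "height_m": 1.72}, stand_in_model))
print(guarded_predict({"gender": "male", "country": "Netherlands", "height_m": 1.80}, stand_in_model))
```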


This can do wonders if done properly, and I will tell you why. With efficient guardrails in place, we start injecting less biased data (more female candidates) into the real world. This will eventually flow into our AI system and, voilà, our algorithm will start course-correcting. It will take time and require constant monitoring and nudging, but it is effective.


In conclusion, I want to emphasize one key ingredient for solving this problem: empathy. If we are conscious, determined, and actively working towards removing bias, then we will eventually be able to do it. All it takes is acceptance and follow-through.


Further reading:

