Vicious cycle of Bias in AI systems

Let’s start with bias itself. The first question to ask is: what exactly is bias?

As per the Oxford dictionary, bias is defined as “a strong feeling in favor of or against one group of people, or one side in an argument, often not based on fair judgment.”

The important point to note here is that bias is inherently connected to humans. However, with the fast-evolving technological paradigm, the term is now used for machines as well.

(It would take another article to unpack the conundrum of the definition of bias itself being biased towards humans.)

The big question, then, is: how can machines (or algorithms) be biased? It seems like such a human thing to do.

The answer lies very subtly in the question itself and that is what gave birth to this article.

Before we move any further in this article, we have to agree on one important point: human beings are naturally biased (either consciously or subconsciously) towards many things.

A lot of the time we cannot explain or identify the bias, but we certainly cannot deny this fact.


A few examples:

“I like strawberry ice cream and hate vanilla.”

“I think platinum rings look more beautiful than gold.”

“I prefer coffee in the morning and tea in the evening.”


You get the idea. I am not talking about the obvious biases here, like race and gender, but the more subtle ones. These kinds of biases act as agents of a butterfly effect (they look small but are capable of creating significant impact), amplifying the bias. This is what we like to call “real-world bias”. At a granular level, it is practically unidentifiable and impossible to mitigate. You can read more about real-world bias in this great article by Rupa Singh.


This real-world bias propagates into the datasets that accumulate over time and turns into data bias. Data bias is a reflection of real-world bias: the data that is available is biased in some way or another. Note that the bias is still hidden under a layer of billions of data points. Unless we put in extra effort to identify the bias in the dataset, it remains obscured from the human eye.


But unlike humans, machines are capable of processing huge amounts of data. This is what machine learning algorithms and models do. They are designed to serve a specific purpose or solve a business problem, and to do so they need data. They process the data to find patterns and learn how to predict outcomes. This is where the bias in the data is exposed by the algorithms and eventually starts influencing the outcomes of the AI system. (I will provide a few real-world examples in the further reading at the end.)


Now, let’s take an example to understand how algorithmic bias fuels business bias. Consider an AI system that is designed to select candidates for army service based on their height. The underlying algorithm is trained on data that is biased toward male candidates in the Netherlands, making the eligible height range 1.75 m to 1.85 m. The reality, however, is different: the average height of males in the Netherlands is 1.84 m, and of females 1.70 m (as per World Data Info). When used, the output of this AI system will be completely biased: it will select only male candidates, and female candidates will be rejected.
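To make the numbers concrete, here is a minimal sketch of that hypothetical selector. The 1.75–1.85 m eligible range and the average heights come from the example above; the candidate list and names are made up purely for illustration.

```python
# A minimal illustration of the hypothetical height-based selector described
# above. The 1.75-1.85 m eligible range and the average heights (1.84 m male,
# 1.70 m female) come from the example; the candidate list is made up.

ELIGIBLE_RANGE = (1.75, 1.85)  # range learned from male-dominated training data

candidates = [
    {"name": "A", "gender": "male", "height_m": 1.84},
    {"name": "B", "gender": "female", "height_m": 1.70},
    {"name": "C", "gender": "female", "height_m": 1.73},
    {"name": "D", "gender": "male", "height_m": 1.80},
]

def is_selected(candidate: dict) -> bool:
    """Selection rule distilled from the biased training data."""
    low, high = ELIGIBLE_RANGE
    return low <= candidate["height_m"] <= high

selected = [c["name"] for c in candidates if is_selected(c)]
print(selected)  # ['A', 'D'] -- only the male candidates clear the biased cutoff
```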


If left unchecked, this AI system will go on selecting only male candidates. The output of this system will also feed back in as input, reinforcing the same bias. This is business bias at its worst.
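Below is a toy simulation of that feedback loop, assuming (purely for illustration) a starting female share of 30% in the data and a system that selects female candidates only 5% of the time. The exact numbers are invented; the point is the direction of travel.

```python
# A toy simulation of the feedback loop: selected candidates become the next
# round's training data, so the female share of the data shrinks every round.
# The starting share (30%) and selection rates are assumed for illustration only.

female_share = 0.30           # assumed share of female candidates in the data
female_selection_rate = 0.05  # biased system rarely selects female candidates
male_selection_rate = 0.95

for round_number in range(1, 6):
    selected_female = female_share * female_selection_rate
    selected_male = (1 - female_share) * male_selection_rate
    female_share = selected_female / (selected_female + selected_male)
    print(f"round {round_number}: female share of new training data = {female_share:.2%}")

# The share collapses toward zero: the snake eats its tail.
```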


It may look outrageous, and we would probably file a petition to discard the AI system, but do you think the algorithm is the real culprit here? No. It is the data on which it was trained. The data that was collected in the real world. Data that mostly contains male candidates. Data that fed this bias into the algorithm.

(Note that I have oversimplified the example for understanding purposes. Generally, AI systems consider millions of data points.)


This is where the vicious cycle connects back to its source. Due to the bias in the decision system, female candidates are completely omitted from the selection process. This pattern feeds back into real-world bias, creating the illusion that female candidates are not fit for army service. True to our nature as human beings, we subconsciously created this bias; it was then reflected in machines, and the machines stopped considering female candidates for army service.


And the cycle repeats itself, like a snake eating its own tail.


One would ask: enough with the problem, what’s the solution? How do we break this never-ending cycle and free the system once and for all? No more bias in the world.


Before moving to possible solutions, let me clarify one thing. It is impossible to completely remove bias from any system without removing humans from the face of the Earth. We can reduce bias, identify it, and plan to mitigate it, but it is a long journey.


The solution is two-fold: identify the bias as early as possible, and plan to reduce it.


This can be done most effectively around the algorithm, specifically at its input and output. The first challenge is to identify the data bias before it is fed to the algorithm as training or test datasets. There is a plethora of tools and methods available for this task. However, these tools are generalized; you will need someone who understands the data domain and can write the rules against which the bias can be identified. In our example, you want a data quality specialist who can create rules around possible bias parameters (for example, gender, age, and geographical location).
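As a rough sketch of what such a rule could look like, the snippet below flags under-represented groups in a training dataset. The column name, the 30% threshold, and the data are illustrative assumptions for this example, not a prescribed standard.

```python
# A sketch of a rule-based bias check on a training dataset, assuming a pandas
# DataFrame with a "gender" column. The column name, the 30% threshold, and the
# data are illustrative assumptions chosen for this example.
import pandas as pd

def under_represented_groups(df: pd.DataFrame, column: str, min_share: float = 0.30) -> pd.Series:
    """Return the groups whose share of the dataset falls below min_share."""
    shares = df[column].value_counts(normalize=True)
    return shares[shares < min_share]

# Illustrative dataset skewed toward male candidates, as in the article's example.
df = pd.DataFrame({
    "gender": ["male"] * 90 + ["female"] * 10,
    "height_m": [1.80] * 90 + [1.70] * 10,
})

print(under_represented_groups(df, "gender"))
# female    0.1  -> the rule surfaces the imbalance before any training happens
```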


Let’s say you identified a show-stopper bias in your dataset. What now? There are various methods to address it, such as resampling the data or generating synthetic data.
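As a minimal sketch of the resampling idea, the snippet below randomly oversamples the under-represented group until the groups are the same size. Libraries such as imbalanced-learn offer more sophisticated resampling and synthetic-data methods; this only shows the basic idea, with made-up data.

```python
# A minimal sketch of random oversampling: each group is resampled (with
# replacement) up to the size of the largest group. The data and column name
# are illustrative.
import pandas as pd

def oversample_to_balance(df: pd.DataFrame, column: str, seed: int = 42) -> pd.DataFrame:
    """Resample every group up to the size of the largest group."""
    target_size = df[column].value_counts().max()
    parts = [
        group.sample(n=target_size, replace=True, random_state=seed)
        for _, group in df.groupby(column)
    ]
    return pd.concat(parts).reset_index(drop=True)

df = pd.DataFrame({"gender": ["male"] * 90 + ["female"] * 10})
balanced = oversample_to_balance(df, "gender")
print(balanced["gender"].value_counts())  # female: 90, male: 90
```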


These methods are easier said than done. Identifying a data bias and fixing it requires time, energy, and money. There is another way, slow and steady but equally effective.


This is the second part of the two-fold solution: create guardrails on the output of the algorithm. It means that we consciously accept (having identified it) that there is bias in the training/test data and then create business guardrails to prevent the potential harm done by the algorithm’s output. In our example, we make sure that our AI system comes with a warning that it was trained and tested on male-dominated data from the Netherlands and is suitable only for male candidates in the Netherlands or similar geographical areas. It should not be used for female candidates at all; other legacy methods should be used for their selection.
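A guardrail like that can also be enforced in code, not just in documentation. Below is a minimal sketch, assuming a scoring function and the scope limitation described above; the field names, the stand-in model, and the routing labels are illustrative, not a specific product’s API.

```python
# A minimal sketch of an output guardrail, assuming the scope limitation
# described above (validated only for male candidates in the Netherlands).
# The field names, the predict function, and the routing labels are illustrative.

VALIDATED_SCOPE = {"gender": {"male"}, "country": {"Netherlands"}}

def guarded_predict(candidate: dict, predict_fn) -> dict:
    """Return the model's decision only for candidates inside the validated
    population; route everyone else to the existing (non-AI) selection process."""
    for field, allowed_values in VALIDATED_SCOPE.items():
        if candidate.get(field) not in allowed_values:
            return {
                "decision": None,
                "route": "legacy_process",
                "reason": f"model not validated for {field}={candidate.get(field)!r}",
            }
    return {"decision": predict_fn(candidate), "route": "model"}

# Example usage with a stand-in model:
stand_in_model = lambda c: 1.75 <= c["height_m"] <= 1.85
print(guarded_predict({"gender": "female", "country": "Netherlands", "height_m": 1.72}, stand_in_model))
print(guarded_predict({"gender": "male", "country": "Netherlands", "height_m": 1.80}, stand_in_model))
```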


This can do wonders if done properly, and I will tell you why. With efficient guardrails in place, we start injecting less biased data (more female candidates) into the real world. This will eventually flow into our AI system and, voilà, our algorithm will start course-correcting. It will take time and require constant monitoring and nudging, but it is effective.


In conclusion, I want to emphasize one key ingredient for solving this problem: empathy. If we are conscious, determined, and actively working towards removing bias, then we will eventually be able to do it. All it takes is acceptance and follow-through.


Further reading:

