What Most People Get Wrong about AI & Bias
When I taught and created content for a course in the NYU SPS Human Capital Management program, I wanted to help graduate students prepare for what it means to understand, evaluate, and implement rapidly changing technology inside organizations. One of the most common misconceptions surrounding the topic concerns how AI algorithms encode and perpetuate cultural biases. Unsurprisingly, it is a very complex problem.
But first, what is Machine Learning?
Machine Learning (ML) is a field of Artificial Intelligence (AI) where computers learn to make predictions or decisions from data without being explicitly programmed, using algorithms and statistical models. It enables computers to improve their performance on tasks as they gain experience from the data they process.
A fallacy I often hear in companies is, “AI or ML models are biased.”
Well, not exactly. Headlines often refer to bias in AI in a vague and uncertain way. Consider the famous case of Google: in 2015, software engineer Jacky Alciné pointed out that the image recognition algorithm in Google Photos was classifying his Black friends as “gorillas.” To summarize this occurrence, the algorithm Google created did not know how to identify people or animals correctly, resulting in a devastating outcome. (Should Google have tested for this? Yes. Was the algorithm itself biased? No. We’ll explain.)
First, you need to understand how ML models learn and how data scientists or engineers “train” them. The typical lifecycle runs from gathering and cleaning data, to splitting it into training, validation, and test sets, to choosing a learning approach, training the model, evaluating it, and monitoring it once it is deployed.
As you can see, there are many phases where things can go awry . . .
Think of it like teaching a computer to recognize pictures of cats. You gather cat pictures, make sure they're clear, split them into learning, practice, and testing groups, pick a good way for the computer to learn, let it study, and see how well it can identify cats in new pictures. If it's good, you can use it to find cats in photos. And you need to check if it's still good at it as time goes on.
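To make that concrete, here is a minimal sketch of the lifecycle in Python, using scikit-learn and one of its bundled example datasets as a stand-in for the cat photos. The dataset and model choices here are illustrative, not a prescription:

```python
# A minimal sketch of the ML training lifecycle described above, using
# scikit-learn's bundled breast cancer dataset as a stand-in for cat photos.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 1. Gather the data (here, a ready-made example dataset).
X, y = load_breast_cancer(return_X_y=True)

# 2. Split into learning (train), practice (validation), and testing groups.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# 3. Pick a learning approach and let the model study the training data.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# 4. Check how well it does on data it has never seen.
print("validation accuracy:", model.score(X_val, y_val))
print("test accuracy:", model.score(X_test, y_test))
```

Every numbered step in that sketch is a place where bias can creep in, most often through the data that gets gathered and how it is labeled.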
Let's review another famous case where things went terribly wrong. Amazon had invested in building its own recruiting AI tool, only to find out that the algorithm had learned that women often did not get the job in the male-dominated technology industry. It therefore started to give female applicants lower scores, which often removed them from the process altogether.
People often confuse the teams who built a model, or the algorithm itself, with the biased patterns the algorithm begins to display in its outcomes and predictions. Machine Learning and AI algorithms are not in and of themselves biased; they learn from data, and many times the patterns they surface reveal a bias in the underlying data or a mistake in the calculation.
In the case of Amazon, the model learned a biased pattern: in the historical data, female applicants were statistically unlikely to receive a job offer. In the case of Google, the algorithm incorrectly categorized human attributes as animal features, which was disturbing, but this was not a pattern it was deliberately trained on; the machine made generalizations the engineers were unaware of until it was tested.
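To see how a model can “learn” bias from history without anyone programming it in, consider this fully synthetic illustration. The data, features, and numbers below are invented purely for the example:

```python
# Hypothetical illustration: a model trained on biased historical decisions
# reproduces that bias. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Two applicant features: a skill score and a gender flag (1 = female).
skill = rng.normal(size=n)
is_female = rng.integers(0, 2, size=n)

# Simulated biased history: equally skilled women were hired less often.
hire_prob = 1 / (1 + np.exp(-(skill - 1.0 * is_female)))
hired = rng.random(n) < hire_prob

X = np.column_stack([skill, is_female])
model = LogisticRegression().fit(X, hired)

# The model faithfully learns the historical penalty on the gender flag,
# even though nothing in the algorithm itself is "biased".
print("coefficient on skill:     %+.2f" % model.coef_[0][0])
print("coefficient on is_female: %+.2f" % model.coef_[0][1])  # negative
```

The negative weight on the gender flag is not the algorithm being malicious; it is the algorithm faithfully reproducing the pattern in the historical decisions it was shown.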
But looking at these two examples, it's important to understand how AI and ML models actually work. Let's get more specific, because understanding the anatomy of these algorithms is essential to critiquing where AI or ML can present bias and ethical dilemmas:
Two of the most commonly used model “types” today are:
(1) Neural network models: these algorithms perform best with unstructured data, e.g., image recognition and classifying images into categories, such as taking pictures of handwritten digits and sorting them into the correct categories.
Neural networks are a class of machine learning algorithms inspired by the structure and functioning of the human brain's neural networks. They consist of interconnected nodes, also known as artificial neurons or units, organized in layers, where the behavior of the neurons is controlled by many parameters. Each connection between nodes has an associated weight, and these weights are adjusted during the training process to optimize the network's performance. Neural networks are good at handling complex and non-linear relationships in data. They excel in tasks such as image classification, speech recognition, and natural language processing.
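Here is a minimal sketch of the handwritten-digit example above, using scikit-learn's small bundled digits dataset and a deliberately tiny network. A real system would use a deep learning framework and far more data:

```python
# A minimal sketch of a neural network classifier for handwritten digits,
# using scikit-learn's bundled digits dataset (8x8 grayscale images, 0-9).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# One hidden layer of 64 artificial neurons; the connection weights are
# adjusted during training to minimize prediction error.
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=42)
net.fit(X_train, y_train)

print("test accuracy:", net.score(X_test, y_test))
```

Note that even this toy network learns several thousand individual weights, which hints at why its decisions are hard to trace.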
The complexity with neural networks is that they will come to a decision about something, such as which category an image belongs to, but scientists and engineers do not always know why that decision was made. While the scientists and mathematicians who create these algorithms understand the underlying calculations, once a network makes a decision about an image, for example, it is very difficult to know when, where, and why it decided as it did. Hence, neural networks are more prone to making decisions and recognizing patterns that can result in bias, making them more complex and thus potentially more dangerous.
(2) Decision Tree models: this type uses structured data. A common example would be predicting home prices by combining labeled attributes, such as location, square footage, and age of the home, to recognize the patterns that drive price.
For example, XGBoost (Extreme Gradient Boosting) is a powerful and widely used machine learning algorithm designed to solve various types of problems, particularly those involving structured, tabular data. XGBoost builds a strong predictive model by combining the predictions of multiple weaker models, typically decision trees.
XGBoost is particularly effective with structured data, where features have well-defined meanings and relationships. This makes it a popular choice for many Kaggle competitions and real-world business problems involving tabular data.
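A minimal sketch of the home-price example follows, assuming the xgboost Python package is installed. The feature values and prices are synthetic and purely illustrative:

```python
# A minimal sketch of a gradient-boosted tree model for the home-price
# example above. All feature values and prices are synthetic.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(42)
n = 1_000

# Structured, labeled attributes: square footage, age of home, location tier.
sqft = rng.uniform(500, 4_000, n)
age = rng.uniform(0, 100, n)
location = rng.integers(0, 5, n)  # e.g., neighborhood quality tier

# Synthetic price: driven by the attributes plus noise.
price = 150 * sqft - 1_000 * age + 50_000 * location + rng.normal(0, 20_000, n)

X = np.column_stack([sqft, age, location])
model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, price)

sample = np.array([[2_000, 15, 3]])  # 2,000 sq ft, 15 years old, tier-3 area
print("predicted price: $%.0f" % model.predict(sample)[0])
```

Unlike the neural network, each tree in this ensemble makes splits on named features, which makes tree-based models somewhat easier to inspect and audit.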
Looking back at the examples of Google and Amazon, the Google Photos case was most likely using a neural network algorithm, and Amazon was most likely using a tree-based algorithm such as XGBoost (just my guess). Both produced a devastating outcome, but for different reasons.
As business leaders, it is vital that we know enough detail to diagnose, understand, and prevent these outcomes from impacting humans, their careers, and hiring decisions.
How to prevent bias in your organization’s algorithms:
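One concrete starting point is to audit a model's outcomes across demographic groups. The sketch below computes a simple disparate impact ratio, often checked against the “four-fifths rule” used in HR compliance; the screening results here are hypothetical:

```python
# A simple bias audit sketch: the disparate impact ratio, often checked
# against the "four-fifths rule" in HR settings. All data is hypothetical.
import numpy as np

# Model screening decisions (1 = advanced to interview) and a group flag.
passed = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
group = np.array(["F", "F", "M", "M", "F", "M", "F", "F", "M", "M", "F", "M"])

rate_f = passed[group == "F"].mean()
rate_m = passed[group == "M"].mean()
ratio = rate_f / rate_m

print(f"selection rate (F): {rate_f:.2f}")
print(f"selection rate (M): {rate_m:.2f}")
print(f"disparate impact ratio: {ratio:.2f}")  # < 0.80 warrants investigation
```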
Additional Resources on AI and Ethics:
A special thank you to Anna A. Tavis, PhD, for many conversations, support, and mentoring over the years as I've taught courses in the program. An additional thank you to Beverly Tarulli, Ph.D., for her curriculum review and mentorship.
*This article only represents my personal thoughts and perspective.