Bias In Artificial Intelligence

Carlos Lara

Senior Software Engineer | AWS | Go | Certified Kubernetes Administrator (CKA)

Published: February 7, 2020

You may have heard the term "bias" in artificial intelligence. It usually refers to machine learning algorithms that make biased predictions.

Biased predictions are a sign of underperforming machine learning models that were not trained with the proper datasets.

Most people know that the performance of a machine learning model depends heavily on the quantity and quality of the dataset used to train it.

Quantity of data is the intuitive part. All else being equal, the more representative data we have, the better the models tend to perform, because they learn from more examples. This helps machine learning models generalize well once deployed in production.

The quality of the dataset refers to:

1. How representative the dataset is of the real-life situations the ML model will encounter.
2. In the case of classification datasets, whether binary or multi-class, whether the categories are evenly distributed, with equal (or nearly equal) numbers of examples of each.
3. Having enough variation among examples, including edge cases (these are extremely common in production).
4. In the practical case of supervised learning, making sure the data has been labeled correctly (human labelers bring their own biases).
5. Having all the features necessary to address the business problem you are trying to solve, while eliminating redundant or unnecessary features (this is part of feature engineering).
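Two of the quality factors above, class balance and labeling correctness, can be checked with very little code before any training happens. The following is a minimal sketch, not a complete data-quality tool; the function name `audit_labels` and the toy labels are made up for illustration.

```python
from collections import Counter


def audit_labels(labels, allowed_labels):
    """Summarize basic label-quality signals for a classification dataset."""
    present = [y for y in labels if y is not None]
    counts = Counter(present)
    total = len(present)
    return {
        # Share of each class among the labeled examples (class balance).
        "class_share": {cls: n / total for cls, n in counts.items()} if total else {},
        # Examples that carry no label at all (labeling quality).
        "missing": len(labels) - total,
        # Labels outside the expected set, e.g. typos or inconsistent casing.
        "unexpected": sorted(set(counts) - set(allowed_labels)),
    }


if __name__ == "__main__":
    # Hypothetical toy labels: one is missing and one uses the wrong casing.
    print(audit_labels(["a", "a", "b", None, "A"], allowed_labels={"a", "b"}))
```

A report like this will not catch every problem (it says nothing about representativeness or edge cases), but it surfaces skewed classes and sloppy labels early.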

You also want to consider the freshness and temporal relevance of the data. If you are using historical data to make predictions about the future, make sure the historical factors/features are still relevant today and for the coming months.

If either the business problem or dataset changes over time, you need to account for this within your machine learning process to make sure the models maintain acceptable performance over time.
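One way to notice that a dataset has changed over time is to compare the distribution of a feature in the training data with its distribution in recent production data. The sketch below uses the Population Stability Index (PSI), a drift measure chosen here for illustration rather than one named in this article. The function name, the equal-width binning, and the `1e-6` floor for empty bins are all assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; fall back to 1.0 if all values are equal.
    width = (hi - lo) / bins or 1.0

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor the shares so that empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    training = list(range(100))            # hypothetical historical feature values
    recent = [x + 50 for x in range(100)]  # the same feature after a shift
    print(population_stability_index(training, training))  # identical samples give 0.0
    print(population_stability_index(training, recent))    # a shifted sample gives a larger value
```

Larger values mean the recent data looks less like the training data. Where to set the alert threshold is a judgment call that depends on the feature and the business problem.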

All of these quality factors of a given dataset greatly influence the performance of machine learning models.

In practice, bias in artificial intelligence originates from low-quality training datasets, and it usually involves issues with one or more of the factors mentioned above.

The most common factor leading to biased machine learning models is the second one in the list above: class imbalance. The different categories within a training dataset are simply not evenly distributed, and the machine learning model learns this skewed distribution.

For example, suppose you have a dataset of 1,000 images labeled as either dog or cat. 900 images are of dogs and 100 images are of cats.

During training, the machine learning model will learn the characteristics/features of dogs a lot more than those of cats. Therefore, in its own little universe, the model will be biased towards identifying everything as a dog, including cats, because it's seen a lot more of that category than the other. This extends to multi-class classification, as well.
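To see why this matters, take the extreme case of a model that has collapsed into predicting "dog" for every image. The sketch below scores such a model on the 900/100 split described above. The helper functions are written out by hand for illustration, and the always-dog predictions are an assumption standing in for a badly biased model, not the output of a real one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of the true examples of `cls` that were predicted as `cls`.

    Assumes `cls` appears at least once in y_true.
    """
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)


if __name__ == "__main__":
    y_true = ["dog"] * 900 + ["cat"] * 100
    y_pred = ["dog"] * 1000  # assumed degenerate model: everything is a dog

    print(accuracy(y_true, y_pred))       # 0.9
    print(recall(y_true, y_pred, "dog"))  # 1.0
    print(recall(y_true, y_pred, "cat"))  # 0.0
```

The overall accuracy of 90% looks respectable, yet the model never identifies a single cat. This is why per-class metrics are worth checking whenever the categories are unevenly distributed.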

Simply put:

Uneven distribution of categories in the training dataset = biased ML model
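When collecting more examples of the rare category is not immediately possible, one common mitigation (not one this article prescribes) is to weight each class inversely to its frequency during training. The sketch below computes such weights with the formula n_samples / (n_classes * class_count); the function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


if __name__ == "__main__":
    labels = ["dog"] * 900 + ["cat"] * 100
    weights = balanced_class_weights(labels)
    print(weights["cat"])  # 5.0
    print(weights["dog"])  # roughly 0.56
```

With these weights, the 100 cat images carry the same total weight in the loss as the 900 dog images. Weighting does not add new information about cats, so it complements rather than replaces gathering more representative data.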

Here are some examples of biased machine learning models in practice:

Self-driving vehicles being biased towards identifying certain demographics vs others.
HR/recruitment algorithms being biased towards 'selecting' certain applications vs others.
Speech recognition algorithms being biased towards identifying certain accents vs others.
Custom computer vision models being biased towards identifying certain objects (in certain locations) vs others.

Whether you are an AI/ML product manager or VP of AI/ML Products, make sure you and your team are on the lookout for sources of bias within datasets. A healthy level of paranoia is helpful because these models will (hopefully) end up in production, affecting users, customers, clients, internal stakeholders, and the overall business. Over time, they may also affect your overall industry/sector.

Make sure you address bias and dataset quality upfront before getting too deep into AI/ML product development. Iterate and test constantly with production data to squeeze out the performance blind spots in your models.

When you identify the scenarios where your model is not performing well, incorporate more of those examples within your training dataset. Again, iterate, test, and improve until you hit your target KPIs.
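Finding the scenarios where a model underperforms usually means breaking evaluation results down by slice (for example by accent, location, or object type) instead of looking at a single overall number. The following is a minimal sketch under assumed names: `accuracy_by_slice`, `underperforming`, the slice labels, and the 0.8 target are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """records: iterable of (slice_name, true_label, predicted_label) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for slice_name, y_true, y_pred in records:
        totals[slice_name] += 1
        hits[slice_name] += int(y_true == y_pred)
    return {name: hits[name] / totals[name] for name in totals}


def underperforming(slice_accuracy, target):
    """Names of the slices whose accuracy falls below the target KPI."""
    return sorted(name for name, acc in slice_accuracy.items() if acc < target)


if __name__ == "__main__":
    # Hypothetical evaluation records from production traffic.
    records = [
        ("accent_a", "yes", "yes"),
        ("accent_a", "no", "no"),
        ("accent_b", "yes", "no"),
        ("accent_b", "no", "no"),
    ]
    per_slice = accuracy_by_slice(records)
    print(per_slice)                               # {'accent_a': 1.0, 'accent_b': 0.5}
    print(underperforming(per_slice, target=0.8))  # ['accent_b']
```

The slices flagged this way are the natural candidates for collecting and labeling more training examples before the next iteration.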

If you need help to accelerate your company's machine learning efforts, or if you need help getting started with enterprise AI adoption, send me a LinkedIn message or email me at [email protected] and I will be happy to help you.

Subscribe to my blog to get the latest tactics and strategies to thrive in this new era of artificial intelligence.

Subscribe to my YouTube channel for business AI video tutorials and technical hands-on tutorials.

Client case studies and testimonials: https://carloslaraai.com/enterprise-case-studies/

Follow me for more content: linkedin.com/in/CarlosLaraAI

#ai #career #artificialintelligence #machinelearning #deeplearning #datascience #business #enterprise #leadership #careers #aicareer #aiadoption #projectmanagement #productmanagement

Bianca Minnaar

Senior Software engineer

5 years ago

Absolutely love this piece. Thank you for a well-educated and researched article. Viewing my experience in parallel with a sober and insightful narrative seems surreal. I knew from the start that models were the building blocks for AI. We can almost refer to this as modular behavior. AI will exhibit additional behavior sets as soon as model selection algorithms mature. Source input will always be a determining factor, and focusing on "good in, good out" from a selection point of view is crucial. Data analysis and edge-case engineering in design phases will be the cornerstones for decision making. Understanding the implementation environment requires both broad and detailed insight. Business rules algorithms can be taught to facilitate complex business environments, and as these environments gather additional input the AI can improve as decision models update in real time with analysis algorithms to influence outcomes.

Carlos Lara

Senior Software Engineer | AWS | Go | Certified Kubernetes Administrator (CKA)

5 years ago

Ankit Aggarwal, based on our discussion about AI/ML bias, this article may be a helpful reference!

Carlos Lara

Senior Software Engineer | AWS | Go | Certified Kubernetes Administrator (CKA)

5 years ago

Link to the article: https://www.dhirubhai.net/pulse/bias-artificial-intelligence-carlos-lara
