Secret to Dealing with Missing Data

Tahir Raza

AI | Machine Learning Engineer | Data Scientist | Drive innovation through advanced AI solutions

Published: July 4, 2023

Have you ever faced the problem of missing data in your data science or machine learning projects?

I’m sure you have, because it’s a very common issue.

And you know what?

It can really mess up your results if you don’t deal with it properly.

That’s why I want to tell you about a cool technique called Predictive Mean Matching (PMM).

It’s a way of filling in the gaps in your data without introducing too much bias.

PMM is like a smart guesser. It looks at the other data you have and tries to figure out what the missing value should be.

But instead of using its own guess, it uses a real value that’s already in the data and that’s close to its guess.

This way, the filled-in values make sense and the shape of your data doesn’t change.
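To make the idea concrete, here is a minimal sketch of PMM for a single column. It is a simplified version written for illustration: the function name pmm_impute, the parameter k, and the sample numbers are all assumptions, and a full PMM implementation would also add randomness to the regression coefficients, which this sketch skips.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def pmm_impute(X, y, k=3, seed=0):
    """Simplified PMM sketch (hypothetical helper, not a library function).

    Fills NaNs in y with observed y values borrowed from rows whose
    regression predictions are closest to the missing row's prediction.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    observed = ~missing

    # Step 1: fit a model on the rows where y is known.
    model = LinearRegression().fit(X[observed], y[observed])

    # Step 2: predict y for both the observed and the missing rows.
    pred_obs = model.predict(X[observed])
    pred_mis = model.predict(X[missing])

    # Step 3: for each missing row, find the k observed rows with the
    # closest predictions and copy the real value of one of them.
    y_obs = y[observed]
    k = min(k, len(y_obs))
    filled = y.copy()
    for row, guess in zip(np.flatnonzero(missing), pred_mis):
        nearest = np.argsort(np.abs(pred_obs - guess))[:k]
        filled[row] = y_obs[rng.choice(nearest)]
    return filled

# Made-up example data: Age is fully observed, Income has two gaps.
age = np.array([[25], [32], [41], [48], [55], [60]])
income = np.array([30_000, 42_000, np.nan, 61_000, np.nan, 80_000])
imputed = pmm_impute(age, income)
print(imputed)
```

Every filled-in value here is copied from a row that really exists, which is why PMM never produces values outside the observed range.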

Example: Missing Citizen Data

In this dataset, we have three variables: Age, Income, and Education, but some of the values are missing.

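So, we’ll use a tool from sklearn.impute called IterativeImputer. It’s a smart tool that fills in each missing value by modeling that column as a function of the other columns with an estimator. We’ll use LinearRegression as our estimator. Keep in mind that IterativeImputer is still marked experimental in scikit-learn, and that it writes in the model’s predictions directly, so it covers the prediction step of PMM but not the matching step.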
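A minimal sketch of this step might look like the following. The values in the DataFrame are made up for illustration, and Education is assumed to be encoded as years of schooling. The enable_iterative_imputer import is needed because scikit-learn ships IterativeImputer as an experimental feature.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to unlock IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# Made-up citizen data with one gap in each column.
df = pd.DataFrame({
    "Age": [25, 32, np.nan, 48, 55, 60],
    "Income": [30_000, 42_000, 50_000, np.nan, 72_000, 80_000],
    "Education": [12, 16, 14, 16, np.nan, 18],  # assumed: years of schooling
})

# Each column with gaps is regressed on the other columns, round after round.
imputer = IterativeImputer(estimator=LinearRegression(), max_iter=10, random_state=0)
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(df_imputed.round(1))
```

On a dataset this small, scikit-learn may warn that the imputer stopped before converging; the values are still filled in.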

Look at the output. Do you see how the missing values in the Age, Income, and Education columns are gone?

They have been replaced with new values that make sense. These values are not just the column average; they are predictions based on the other columns.

This is the idea behind PMM: it uses the connections between variables to make better guesses, and then swaps each guess for a nearby real value.

Key Ideas

Let me tell you some important things about missing data and how to deal with it.

Missing data is a big problem in data science and machine learning. It can make your results wrong or misleading. So, you need to fix it the right way.

One way to fix it is to use PMM. It’s a technique that guesses the missing values by looking at the other data you have. But it doesn’t use its own guesses. It uses real values that are similar to its guesses.

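This is good when your data is not normally distributed, like when it is skewed or has outliers. Because PMM only reuses values that were actually observed, the filled-in values stay within the observed range and the shape of your data is largely preserved.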

But you need to understand what PMM does and how it affects your analysis. You need to look at the new values and how they compare to the old values.
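One simple way to compare the new values with the old ones is to put summary statistics side by side. The sketch below assumes you kept a copy of the column before imputation; compare_distributions is a hypothetical helper, and both Series contain made-up numbers.

```python
import numpy as np
import pandas as pd

def compare_distributions(before: pd.Series, after: pd.Series) -> pd.DataFrame:
    """Summary statistics of the observed values vs. the imputed column."""
    stats = ["mean", "std", "min", "max", "skew"]
    return pd.DataFrame({
        "observed_only": before.dropna().agg(stats),
        "after_imputation": after.agg(stats),
    })

# Made-up Income column before and after imputation.
income_before = pd.Series([30_000, 42_000, np.nan, 61_000, np.nan, 80_000])
income_after = pd.Series([30_000, 42_000, 42_000, 61_000, 61_000, 80_000])  # hypothetical imputed result

report = compare_distributions(income_before, income_after)
print(report.round(2))
```

If the two columns of the report drift far apart, that is a sign the imputation has changed the shape of your data and deserves a closer look.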

PMM can handle these situations better than other methods, like dropping rows or using the average.

But don’t get me wrong, PMM is not magic.

You still need to understand your data and why some values are missing.

And you need to pay attention to the output of PMM and how it affects your analysis.

So, what do you think?

Are you interested in trying out PMM for your next project?
