Solving Class Imbalance: Techniques and Strategies
Hey everyone! In my previous post, Unleashing the Power of Data, I talked about how data imbalance can severely impact the accuracy of our models. In this article, I'll dive into the technical side of things and share how I tackled this issue.
So, as we saw in the last post, the heavy-plastic class had four times as many images as the no-image class. When I trained my model on this set, it was biased towards the majority class, resulting in a measly 50% accuracy. But I didn't give up!
After doing some research, I came across a combo of techniques that proved to be super effective: SubsetRandomSampler and class weights. Let me break it down for you.
SubsetRandomSampler is a PyTorch utility that draws samples from a given list of indices in random order, without replacement. So if we hand it only a balanced subset of our indices, the loader sees exactly those examples, reshuffled every epoch. That makes it easy to undersample the majority class in our imbalanced data.
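Here's a minimal sketch of what that looks like. The labels tensor below is just a stand-in for your dataset's actual labels (the counts mirror the 4:1 imbalance from my dataset):

```python
import torch
from torch.utils.data import SubsetRandomSampler

# Stand-in labels: e.g. 400 heavy-plastic (class 0) vs. 100 no-image (class 1).
labels = torch.tensor([0] * 400 + [1] * 100)

# Undersample every class down to the size of the smallest one.
min_count = torch.bincount(labels).min().item()
balanced_indices = []
for cls in labels.unique():
    cls_indices = (labels == cls).nonzero(as_tuple=True)[0]
    # Shuffle within the class, then keep only min_count of its indices.
    perm = cls_indices[torch.randperm(len(cls_indices))]
    balanced_indices.extend(perm[:min_count].tolist())

# The sampler yields these indices in a fresh random order each epoch.
sampler = SubsetRandomSampler(balanced_indices)
```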
Next up, we calculate the class weights for our dataset. How? By counting the number of examples in each class and then computing the inverse frequency of each class. These weights are then passed to the loss function using the 'weight' argument.
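In code, that's only a few lines. Again, just a sketch, reusing the stand-in labels tensor from above:

```python
import torch
import torch.nn as nn

# Count the number of examples in each class.
class_counts = torch.bincount(labels).float()

# Inverse frequency: the rarer the class, the larger its weight.
class_weights = class_counts.sum() / class_counts
class_weights = class_weights / class_weights.mean()  # normalize to mean 1

# Hand the weights to the loss function via its 'weight' argument.
criterion = nn.CrossEntropyLoss(weight=class_weights)
```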
Now, we can pass our SubsetRandomSampler to the PyTorch DataLoader, which loads the data in batches during training.
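Wiring it all together looks roughly like this. Note that train_dataset, model, and optimizer are placeholders for your own setup, and that DataLoader's sampler argument can't be combined with shuffle=True, since the sampler already randomizes the order:

```python
from torch.utils.data import DataLoader

# 'train_dataset', 'model', and 'optimizer' are placeholders for your own setup.
# 'sampler' is mutually exclusive with shuffle=True in DataLoader.
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)

for images, targets in loader:
    optimizer.zero_grad()
    outputs = model(images)             # forward pass on a balanced batch
    loss = criterion(outputs, targets)  # class-weighted loss from above
    loss.backward()
    optimizer.step()
```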
With both pieces in place, the class weights make the model pay more attention to the minority class, while the SubsetRandomSampler makes sure the model sees a balanced variety of examples during training.
So, what did all of this do for my model? Well, my accuracy skyrocketed from 50% to 80%! And it even generalized well on unseen data. Pretty cool, huh?
Alright, that's all for now, folks. Stay tuned for my next post, where I'll be sharing more on model selection and building. Cheers!