Main Challenges to Machine Learning

Our main task in machine learning is to select a learning algorithm and train it on some data. The two things that can go wrong, therefore, are a "bad algorithm" and/or "bad data".

BAD DATA

Insufficient quantity of training data

Machine learning algorithms typically require thousands of examples even for fairly simple problems, and for complex problems such as image or speech recognition we may require millions of training examples.
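A quick way to see this effect is to train the same model on growing slices of a dataset and watch test accuracy improve. The sketch below uses scikit-learn's digits dataset and logistic regression (both my choices for illustration, not taken from the text):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Train on progressively larger subsets of the training data
scores = {}
for n in (50, 200, 1000):
    model = LogisticRegression(max_iter=2000)
    model.fit(X_train[:n], y_train[:n])
    scores[n] = model.score(X_test, y_test)
    print(n, round(scores[n], 3))
```

Accuracy should climb noticeably as the training set grows, even though the algorithm itself never changes.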

Non-representative training data

It is important to use a training set that is representative of the cases we want to generalize to. If the sample is too small, the data may be non-representative purely by chance (called sampling noise), and even very large samples can be non-representative if the sampling method is flawed (called sampling bias).

It is also crucial to watch out for nonresponse bias (which occurs when the individuals willing to take part in a study differ systematically from those who are unwilling or unable to) during sampling.
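One practical guard against sampling bias is stratified sampling: split the data so that each subset preserves the class proportions of the whole. A minimal sketch with scikit-learn's `train_test_split` and made-up 90/10 class labels (illustrative numbers, not from the text):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 90% class 0, 10% class 1
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the 90/10 class ratio in both splits,
# instead of leaving it to the luck of a purely random draw
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(y_test.mean())  # proportion of class 1 in the test split
```

With 20 test instances and a 10% minority class, the stratified split places exactly 2 minority examples in the test set.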

Poor quality data

If the training data is full of errors, missing values, outliers, and noise, it will be harder for the system to detect the underlying patterns, and it is less likely to perform well. It is often well worth the effort to spend time cleaning up the training data.
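In practice this cleanup often means filling missing values and discarding implausible outliers before training. A small sketch with pandas on hypothetical data (the column names and thresholds are my own, purely illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one missing age and one obvious outlier (400)
df = pd.DataFrame({"age": [25, 32, np.nan, 29, 400],
                   "income": [40000, 52000, 48000, 51000, 47000]})

# Fill the missing value with the column median
df["age"] = df["age"].fillna(df["age"].median())

# Drop rows with an implausible age (simple rule-based outlier removal)
df = df[df["age"] < 120]

print(df)
```

Whether to fill, drop, or flag bad values depends on the problem; the point is to make these decisions deliberately rather than feeding raw, noisy rows to the algorithm.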

Irrelevant features

A machine learning system can only learn if the training data contains enough relevant features and not too many irrelevant ones. Coming up with a good set of features to train on is called feature engineering, and it involves:

  • Feature selection (choosing the most useful existing features)
  • Feature extraction (combining existing features to produce more useful ones)
  • Creating new features (e.g., by gathering new data)
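The first two steps above can be sketched in a few lines of scikit-learn, using the iris dataset as an example of my own choosing: `SelectKBest` keeps the features most associated with the target, while `PCA` combines correlated features into a smaller set of components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Feature selection: keep the 2 features most correlated with the target
X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: combine the 4 features into 2 principal components
X_extracted = PCA(n_components=2).fit_transform(X)

print(X_selected.shape, X_extracted.shape)
```

Both produce a 2-column matrix, but by different routes: selection discards columns outright, while extraction builds new columns from combinations of the originals.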

BAD ALGORITHM

Overfitting the training data

Overfitting means that the model performs well on the training data but does not generalize well to new instances.

Overfitting happens when the model is too complex relative to the amount and noisiness of the training data, so it ends up learning patterns in the noise itself.

Possible solutions:

  • Simplify the model (select a model with fewer parameters, reduce the number of attributes in the training data, or constrain the model).
  • Gather more training data.
  • Reduce the noise in the training data (fix errors, handle missing values, remove outliers, etc.).

Constraining a model to make it simpler and reduce the risk of overfitting is called regularization. We need to find the right balance between fitting the training data perfectly and keeping the model simple enough to ensure that it generalizes well.
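A concrete way to see regularization at work is to fit the same high-degree polynomial model twice: once with plain least squares and once with Ridge regression, whose `alpha` penalty shrinks the learned weights. Everything below (the synthetic quadratic data, degree 15, `alpha=10`) is my own illustrative setup, not from the text:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 30)).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(0, 1, 30)  # noisy quadratic data

# Same degree-15 polynomial features for both models; only the
# regularization differs
features = [PolynomialFeatures(15, include_bias=False), StandardScaler()]
overfit = make_pipeline(*features, LinearRegression()).fit(X, y)
regularized = make_pipeline(*features, Ridge(alpha=10.0)).fit(X, y)

w_overfit = overfit.named_steps["linearregression"].coef_
w_ridge = regularized.named_steps["ridge"].coef_
print(np.linalg.norm(w_overfit), np.linalg.norm(w_ridge))
```

The Ridge penalty keeps the weight vector small, which is exactly the "constraining the model" described above: the regularized model trades a slightly worse fit on the training points for a smoother, better-generalizing curve.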

Underfitting the training data

Underfitting is the opposite of overfitting. It means that our model is too simple to learn the underlying patterns in the data.

Possible solutions:

  • Select a more powerful model (more parameters).
  • Feed better features to the learning algorithm.
  • Reduce the constraints on the model.
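The first two remedies can be seen together in a tiny example: a straight line is too simple for quadratic data, but feeding the model a better feature (the squared term, via `PolynomialFeatures`) fixes the underfit. The synthetic data below is my own illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.1, 100)  # quadratic data, mild noise

# A straight line underfits symmetric quadratic data badly;
# adding the squared feature gives the model the capacity it needs
linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(2), LinearRegression()).fit(X, y)

print(round(linear.score(X, y), 3), round(poly.score(X, y), 3))
```

The linear model's R² on this data is close to zero, while the polynomial model fits almost perfectly: the cure for underfitting here was more model capacity, not more data.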


Reference Book - https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/