Naive Bayes Algorithm - Explained

Naive Bayes is a probabilistic algorithm that's typically used for classification problems. It uses conditional probability, which is a measure of the probability of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) occurred. It is simple, intuitive, and yet performs surprisingly well in many cases. It is based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
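For reference, the conditional probability of an event A given an event B is defined as P(A|B) = P(A and B) / P(B) (for P(B) > 0), i.e. the probability of A occurring given that B has occurred.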


Assumptions made by Naive Bayes

The fundamental Naive Bayes assumption is that each feature makes an:

- Independent

- Equal

contribution to the outcome.

For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why it is known as 'Naive'.

Note - The Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes can outperform even highly sophisticated classification methods in many cases.

Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c). Look at the equation below:

P(c|x) = [P(x|c) × P(c)] / P(x)

Above,

- P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).

- P(c) is the prior probability of the class.

- P(x|c) is the likelihood, which is the probability of the predictor given the class.

- P(x) is the prior probability of the predictor.

This is a rather simple transformation, but it bridges the gap between what we want to do and what we can do. We can't get P(C|X) directly, but we can get P(X|C) and P(C) from the training data. Here's an example:

[Table: 14 weather records with the attributes Outlook, Temperature, Humidity, Windy and the target Play]

In this case, X = (Outlook, Temperature, Humidity, Windy) and Y = Play. P(X|Y) and P(Y) can be calculated:

[Tables: P(X|Y) and P(Y) estimated by counting over the records; the full joint P(X|Y) needs an entry for every combination of Outlook, Temperature, Humidity and Windy values]

Having this many parameters in the model is impractical. To solve this problem, a naive assumption is made: we pretend all features are independent. What does this mean?

P(X|Y) = P(x1|Y) × P(x2|Y) × … × P(xn|Y), where x1, …, xn are the individual features.

Now, with the help of this naive assumption (naive because features are rarely independent), we can make classifications with far fewer parameters:

P(Y|X) ∝ P(Y) × P(x1|Y) × P(x2|Y) × … × P(xn|Y), and the predicted class is the value of Y that makes this product largest.

This is a big deal. We changed the number of parameters from exponential to linear. This means that Naive Bayes can deal with high-dimensional data well.
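To make the counting concrete, here is a minimal sketch of a categorical Naive Bayes classifier in Python. It is not from the original article; the function names and toy data are invented for illustration. Notice that it only stores per-feature, per-class counts, which is why the number of parameters stays linear in the number of features:

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-feature value counts within each class."""
    class_counts = Counter(labels)
    # feature_counts[(feature_index, value, label)] -> number of occurrences
    feature_counts = defaultdict(int)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts

def predict(row, class_counts, feature_counts):
    """Pick the class maximizing P(c) * product of P(x_i | c)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # P(c)
        for i, value in enumerate(row):
            score *= feature_counts[(i, value, label)] / count  # P(x_i | c)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data in the spirit of the weather example (invented values)
X = [("Sunny", "Hot"), ("Overcast", "Mild"), ("Rainy", "Cool"), ("Sunny", "Mild")]
y = ["No", "Yes", "Yes", "Yes"]
cc, fc = train(X, y)
print(predict(("Sunny", "Mild"), cc, fc))  # -> 'Yes'
```

Note that an unseen (feature value, class) pair gives a zero count here, which is exactly the zero-frequency problem discussed later.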

Another Example with Mathematics -

[Frequency table of Outlook vs. Play: Sunny appears 5 times out of 14 records, 3 of them with Play = Yes]

Problem: Players will play if the weather is sunny. Is this statement correct?

We can solve it using the posterior probability method discussed above.

- P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)

- Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64

- Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is higher than P(No | Sunny), so the prediction is Yes (see the short Python check below).
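As a quick check (a small sketch, not part of the original article), the same arithmetic in Python:

```python
# Counts quoted in the example above
p_sunny_given_yes = 3 / 9   # P(Sunny | Yes)
p_yes = 9 / 14              # P(Yes)
p_sunny = 5 / 14            # P(Sunny)

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6
```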

Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and in problems with multiple classes.

The Naive Bayes classifier assumes that all the features are unrelated to each other. The presence or absence of one feature does not influence the presence or absence of any other feature.

In real-world datasets, we test a hypothesis given multiple pieces of evidence (features), so the calculations become quite complicated. To simplify the work, the feature-independence assumption is used to uncouple the pieces of evidence and treat each one as independent.

The zero-frequency problem

One of the disadvantages of Naive Bayes is that if a class label and a certain attribute value never occur together in the training data, then the frequency-based probability estimate for that combination will be zero, and the whole product becomes zero when all the probabilities are multiplied.


Solution - An approach to overcome this 'zero-frequency problem' in a Bayesian setting is to add one to the count for every attribute value-class combination (Laplace smoothing), so that no estimated probability is ever exactly zero.
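A small sketch of what add-one (Laplace) smoothing looks like for a single attribute; the counts and names below are made up purely for illustration:

```python
def smoothed_likelihood(value_and_class_count, class_count, n_values, alpha=1):
    """Add-one (Laplace) estimate of P(value | class).

    value_and_class_count: times this attribute value occurs with this class
    class_count:           times this class occurs in the training data
    n_values:              number of distinct values the attribute can take
    """
    return (value_and_class_count + alpha) / (class_count + alpha * n_values)

# Without smoothing the estimate would be 0/9 = 0 and wipe out the whole
# product; with alpha = 1 and, say, 3 possible values it becomes 1/12.
print(smoothed_likelihood(0, 9, 3))  # 0.0833...
```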


There are three types of Naive Bayes model in the scikit-learn library:

Gaussian: It is used for classification and assumes that the features follow a normal distribution.

Multinomial: It is used for discrete counts. For example, in a text classification problem, instead of recording only whether a word occurs in a document (a Bernoulli trial), we count how often the word occurs in the document; you can think of each count as "the number of times outcome x_i is observed over n trials".

Bernoulli: The binomial model is useful if your feature vectors are binary (i.e. zeros and ones). One application would be text classification with a 'bag of words' model where the 1s and 0s mean "word occurs in the document" and "word does not occur in the document" respectively.
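A short example of the three scikit-learn variants; the tiny arrays below are invented only to show the API shape:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# GaussianNB: continuous features, assumed normally distributed per class
X_cont = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.1]])
print(GaussianNB().fit(X_cont, y).predict([[5.0, 3.4]]))

# MultinomialNB: discrete counts, e.g. word counts per document
X_counts = np.array([[2, 0, 1], [1, 1, 0], [0, 3, 2], [0, 2, 1]])
print(MultinomialNB().fit(X_counts, y).predict([[1, 0, 0]]))

# BernoulliNB: binary presence/absence features
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 0]]))
```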

What are the Pros and Cons of Naive Bayes?

Pros:

  • It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
  • When the assumption of independence holds, a Naive Bayes classifier performs better compared to other models like logistic regression, and you need less training data.
  • It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).

Cons:

  • If a categorical variable has a category in the test data set that was not observed in the training data set, then the model will assign a 0 (zero) probability and will be unable to make a prediction. This is often known as "zero frequency". To solve this, we can use a smoothing technique; one of the simplest is Laplace estimation.
  • On the other hand, Naive Bayes is also known to be a bad estimator, so the probability outputs from predict_proba are not to be taken too seriously.
  • Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.

Tips to improve the power of the Naive Bayes Model

Here are some tips for improving the power of the Naive Bayes Model:

  • If continuous features do not have a normal distribution, we should use a transformation or other methods to convert them into a normal distribution.
  • If the test data set has a zero-frequency issue, apply a smoothing technique such as Laplace correction to predict the class of the test data set.
  • Remove correlated features, as highly correlated features are effectively voted twice in the model, which can lead to over-inflating their importance.
  • Naive Bayes classifiers have limited options for parameter tuning, such as alpha=1 for smoothing and fit_prior=[True|False] to learn class prior probabilities or not, along with a few other options. I would recommend focusing on pre-processing of data and feature selection (a short tuning sketch follows this list).
  • You might think of applying classifier combination techniques like ensembling, bagging, and boosting, but these methods would not help: their purpose is to reduce variance, and Naive Bayes has no variance to minimize.
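For what little tuning there is, a hedged sketch with scikit-learn's GridSearchCV over alpha and fit_prior (the toy data is random and only illustrates the API):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(60, 10))   # invented count features
y = rng.integers(0, 2, size=60)         # invented binary labels

# The few knobs Naive Bayes exposes can still be searched over;
# everything else comes down to pre-processing and feature selection.
param_grid = {"alpha": [0.01, 0.1, 0.5, 1.0], "fit_prior": [True, False]}
search = GridSearchCV(MultinomialNB(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```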

Applications of Naive Bayes Algorithms

Real-time Prediction: Naive Bayes is an eager learning classifier, and it is very fast. Thus, it can be used for making predictions in real time.

Multi-class Prediction: This algorithm is also well known for its multi-class prediction capability. Here we can predict the probabilities of multiple classes of the target variable.

Text classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are mostly used in text classification (due to better results in multi-class problems and the independence assumption) and have a higher success rate compared to many other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiment).

Recommendation System: A Naive Bayes classifier and collaborative filtering together can build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not.
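As a small illustration of the text-classification use case above, here is a minimal bag-of-words pipeline with MultinomialNB (the tiny corpus is invented; this is a sketch, not the author's code):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus, just to show the shape of the pipeline
texts = ["win a free prize now", "meeting at noon tomorrow",
         "free offer claim your prize", "project report attached"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["claim your free prize"]))  # expected: ['spam']
```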

Thanks for reading. Like, comment, and share if you found it useful.
