Naive Bayes Algorithm - Explained
Simranjeet Singh
Naive Bayes is a probabilistic algorithm that's typically used for classification problems. It uses conditional probability, which is a measure of the probability of an event occurring given that another event has (by assumption, presumption, assertion, or evidence) occurred. It is simple, intuitive, and yet performs surprisingly well in many cases. It is based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Assumptions made by Naive Bayes
The fundamental Naïve Bayes assumption is that each feature makes an:
- Independent
- Equal
contribution to the outcome.
For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of the other features, all of these properties independently contribute to the probability that this fruit is an apple, and that is why it is known as 'Naive'.
Note - The Naive Bayes model is easy to build and particularly useful for very large data sets. Along with simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods in many cases.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c). Look at the equation below:

P(c|x) = P(x|c) * P(c) / P(x)
In this equation,
- P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
- P(c) is the prior probability of the class.
- P(x|c) is the likelihood, which is the probability of the predictor given the class.
- P(x) is the prior probability of the predictor.
This is a rather simple transformation, but it bridges the gap between what we want to do and what we can do. We can't get P(C|X) directly, but we can get P(X|C) and P(C) from the training data. Here's an example:
In this case, X = (Outlook, Temperature, Humidity, Windy) and Y = Play. P(Y) can be estimated directly from class frequencies, but estimating P(X|Y) in full means estimating a probability for every combination of the four feature values with each class.
Having this many parameters in the model is impractical. To solve this problem, a naive assumption is made: we pretend all the features are independent. What does this mean?
Now, with the help of this naive assumption (naive because features are rarely truly independent), we can classify with far fewer parameters:

P(X|Y) = P(x1|Y) * P(x2|Y) * ... * P(xn|Y), so P(Y|X) is proportional to P(Y) * P(x1|Y) * ... * P(xn|Y).
This is a big deal: we changed the number of parameters from exponential in the number of features to linear. This means that Naive Bayes deals well with high-dimensional data.
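To make the parameter saving concrete, here is a minimal count-based sketch of this factorization in Python. The tiny weather-style dataset is hypothetical, made up just for illustration; the point is that we keep one small count table per (feature, class) pair instead of one entry per combination of feature values.

```python
from collections import Counter, defaultdict

# Hypothetical toy data standing in for the weather table referenced above.
data = [
    ({"Outlook": "Sunny", "Windy": False}, "Yes"),
    ({"Outlook": "Sunny", "Windy": True},  "No"),
    ({"Outlook": "Rainy", "Windy": False}, "Yes"),
    ({"Outlook": "Rainy", "Windy": True},  "No"),
]

# P(Y): class priors from frequency counts.
class_counts = Counter(label for _, label in data)
total = sum(class_counts.values())

# P(x_i | Y): one small count table per (feature, class) pair --
# linear in the number of features, not exponential.
feature_counts = defaultdict(Counter)
for features, label in data:
    for name, value in features.items():
        feature_counts[(name, label)][value] += 1

def posterior(features):
    """Score each class by P(Y) * prod_i P(x_i | Y), then normalize."""
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(Y)
        for name, value in features.items():
            score *= feature_counts[(name, label)][value] / count  # P(x_i | Y)
        scores[label] = score
    z = sum(scores.values()) or 1.0
    return {label: s / z for label, s in scores.items()}

print(posterior({"Outlook": "Sunny", "Windy": False}))  # {'Yes': 1.0, 'No': 0.0}
```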
Another Example with Mathematics
Problem: Players will play if the weather is sunny. Is this statement correct?
We can solve it using the method of posterior probability discussed above.
- P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
- Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.
- Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is higher than P(No | Sunny) = 0.40, so we predict that the players will play.
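The same arithmetic takes only a few lines of Python, with the counts taken straight from the example above:

```python
# Counts from the 14-row weather example above.
p_sunny_given_yes = 3 / 9   # P(Sunny | Yes)
p_yes = 9 / 14              # P(Yes)
p_sunny = 5 / 14            # P(Sunny)

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # 0.6
```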
Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and in problems with multiple classes.
The Naïve Bayes classifier assumes that all the features are unrelated to each other: the presence or absence of one feature does not influence the presence or absence of any other feature.
In real-world datasets, we test a hypothesis given multiple pieces of evidence (the features), so the calculations become quite complicated. To simplify the work, the feature-independence assumption is used to uncouple the pieces of evidence and treat each one as independent.
The zero-frequency problem
One of the disadvantages of Naïve Bayes is that if a class label and a certain attribute value never occur together in the training data, the frequency-based probability estimate for that pair will be zero, and the whole product becomes zero when the probabilities are multiplied.
Solution - An approach to overcoming this 'zero-frequency problem' in a Bayesian setting is to add one to the count of every attribute value-class combination, so that no count is ever zero (Laplace smoothing).
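Here is a small sketch of that add-one (Laplace) scheme; the function and its argument names are illustrative, not from any library:

```python
def smoothed_likelihood(count, class_total, n_values):
    """Add-one (Laplace) estimate of P(attribute value | class).

    count       -- times this attribute value occurred with this class
    class_total -- number of training rows with this class
    n_values    -- number of distinct values this attribute can take
    """
    return (count + 1) / (class_total + n_values)

# A value never seen with the class (count = 0) still gets a small,
# non-zero probability instead of zeroing out the whole product:
print(smoothed_likelihood(0, 9, 3))  # 1/12 ~= 0.083
print(smoothed_likelihood(3, 9, 3))  # 4/12 ~= 0.333
```

In scikit-learn, MultinomialNB and BernoulliNB expose this through the alpha parameter; alpha=1.0 corresponds to Laplace smoothing.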
There are three types of Naive Bayes model in the scikit-learn library:
Gaussian: It is used for classification with continuous features and assumes that the features follow a normal distribution.
Multinomial: It is used for discrete counts. In a text-classification problem, for example, it goes one step beyond Bernoulli trials: instead of asking "does this word occur in the document?", we count how often the word occurs; you can think of it as "the number of times outcome number x_i is observed over the n trials."
Bernoulli: The binomial model is useful if your feature vectors are binary (i.e. zeros and ones). One application would be text classification with a 'bag of words' model where the 1s and 0s are "word occurs in the document" and "word does not occur in the document" respectively.
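A quick sketch showing the three variants side by side; the random arrays are placeholders for real data, shaped the way each model expects:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)               # two classes

X_cont = rng.normal(size=(100, 4))             # continuous features
X_counts = rng.integers(0, 10, size=(100, 4))  # word-count features
X_bin = rng.integers(0, 2, size=(100, 4))      # binary presence/absence

print(GaussianNB().fit(X_cont, y).score(X_cont, y))
print(MultinomialNB().fit(X_counts, y).score(X_counts, y))
print(BernoulliNB().fit(X_bin, y).score(X_bin, y))
```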
What are the Pros and Cons of Naive Bayes?
Pros:
- It is easy and fast to train and to use for prediction, and it performs well on multi-class problems.
- When the independence assumption holds, it needs less training data and can outperform more sophisticated models.
- Because its parameters grow linearly with the number of features, it handles high-dimensional data such as text well.
Cons:
- The independence assumption rarely holds in real data, so the predicted probabilities should not be taken too literally.
- An attribute value never seen with a class during training yields a zero probability estimate (the zero-frequency problem) unless smoothing is applied.
Tips to improve the power of the Naive Bayes Model
Here are some tips for improving the power of the Naive Bayes model (a small sketch of the first tip follows the list):
- If continuous features are not normally distributed, transform them (for example with a log transform) before using the Gaussian model.
- Apply Laplace smoothing (the alpha parameter in scikit-learn) to avoid the zero-frequency problem.
- Remove highly correlated features; they are effectively counted twice, which over-weights their evidence.
- Naive Bayes has few hyperparameters to tune, so effort is better spent on preprocessing and feature selection.
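A minimal sketch of the first tip, assuming a hypothetical right-skewed feature; log1p makes it look more Gaussian before GaussianNB is fitted:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
X_skewed = rng.exponential(scale=2.0, size=(200, 1))  # right-skewed feature
y = (X_skewed[:, 0] > 2.0).astype(int)                # toy labels, illustrative only

X_logged = np.log1p(X_skewed)  # log(1 + x) keeps zeros well-defined

model = GaussianNB().fit(X_logged, y)
print(model.score(X_logged, y))
```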
Applications of Naive Bayes Algorithms
Real-time Prediction: Naive Bayes is an eager-learning classifier, and it is fast, so it can be used to make predictions in real time.
Multi-class Prediction: This algorithm is also well suited to multi-class prediction; we can predict the probability of each of several target classes.
Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are mostly used in text classification (thanks to good results on multi-class problems and the independence assumption), where they often achieve a higher success rate than other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social-media analysis, identifying positive and negative customer sentiment); a minimal spam-filter sketch follows this list.
Recommendation Systems: A Naive Bayes classifier combined with collaborative filtering can build a recommendation system that uses machine learning and data-mining techniques to filter unseen information and predict whether a user would like a given resource.
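As promised above, here is a minimal bag-of-words spam-filter sketch with scikit-learn; the four messages and their labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "win a free prize now",
    "free money click here",
    "meeting at noon tomorrow",
    "project update attached",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns text into word counts; MultinomialNB models them.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(messages, labels)
print(clf.predict(["free prize inside"]))  # expected: ['spam']
```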
Thanks for reading! Like, comment, and share if you found it useful.