Maths behind Naive Bayes

Ever thought of the world before computers? Statisticians calculated probabilities by hand to predict classes and outcomes! Computer programming has made our lives much easier: we now have pre-built machine learning libraries. If you are a Python developer, you can implement Naive Bayes easily using scikit-learn, and there are tools like Orange with drag-and-drop functionality where the algorithm can be tested with a single click!

Let's see how Naive Bayes works and the maths behind it with a simple problem statement. Assume Mr X is working on a notification project and has to send news headlines to the right subscribers. He has two sets of subscribers: those who subscribed to political news and those who subscribed to entertainment news. Mr X now has to classify each and every news feed he receives and send the notifications accordingly.

Let's consider the headlines below, which Mr X has already classified:

  1. Rahul Gandhi thanks government for changing FDI norms after his warning
  2. DMK Allies with ruling AIADMK combine in Tamil Nadu rural civic polls
  3. Congress says government is doing injustice to retailers
  4. Amitabh Bachchan shares throwback photo from Sholay premiere, says "How pretty Jaya looks"
  5. IPL postponed further as Indian government extends lockdown
  6. Siddharth Malhotra reacts to "Masakali" controversy.

Please note these are just imaginary headlines for learning purposes and may or may not be associated with any news feeds or current affairs. The first 3 headlines are clearly political and the last 3 are entertainment-related news feeds. Now, based on this data, can we predict or classify a new news headline?
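As a minimal sketch, this toy training set can be held in Python as labelled pairs (punctuation stripped for simplicity), from which the class priors fall out directly:

```python
training = [
    ("Rahul Gandhi thanks government for changing FDI norms after his warning", "Politics"),
    ("DMK Allies with ruling AIADMK combine in Tamil Nadu rural civic polls", "Politics"),
    ("Congress says government is doing injustice to retailers", "Politics"),
    ("Amitabh Bachchan shares throwback photo from Sholay premiere says How pretty Jaya looks", "Entertainment"),
    ("IPL postponed further as Indian government extends lockdown", "Entertainment"),
    ("Siddharth Malhotra reacts to Masakali controversy", "Entertainment"),
]

# Class prior: the fraction of training headlines in each category.
for label in ("Politics", "Entertainment"):
    prior = sum(1 for _, lbl in training if lbl == label) / len(training)
    print(label, prior)  # 0.5 for each class (3 headlines out of 6)
```

With three headlines per class, both priors are 0.5, which is the P(Politics) and P(Entertainment) used in the calculation below.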

Which category does the feed below belong to, based on the above dataset?

"Those who have no work criticise government" Says Mamata

Before jumping into solving the problem, let's understand what Naive Bayes is: a supervised learning algorithm based on Bayes' theorem with strong (naive) independence assumptions between the features. Bayes' theorem states the following relationship, given a class variable y and a dependent feature vector x1 through xn:

P(y | x1, …, xn) = P(y) × P(x1, …, xn | y) / P(x1, …, xn)

This can be decomposed as,

posterior = (prior × likelihood) / evidence, where posterior = P(y | x1, …, xn), prior = P(y), likelihood = P(x1, …, xn | y), and evidence = P(x1, …, xn).
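As a quick numeric sanity check of the theorem, here is a sketch with made-up numbers (these probabilities are hypothetical, not derived from the dataset above): suppose half of all feeds are political, the word "government" appears in 60% of political feeds, and in 40% of all feeds overall.

```python
prior = 0.5        # P(Politics)
likelihood = 0.6   # P("government" | Politics)
evidence = 0.4     # P("government")

# Bayes' theorem: posterior = prior * likelihood / evidence
posterior = prior * likelihood / evidence  # P(Politics | "government")
print(posterior)  # 0.75
```

Seeing the word "government" raises the probability of the political class from 0.5 to 0.75.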

Let's go back to the above dataset, to classify the new news feed, we need to find the best probability comparing both categories. That is we need to calculate

P(Politics | "Those who have no work criticise government" Says Mamata) and

P(Entertainment | "Those who have no work criticise government" Says Mamata)

We can use Bayes theorem to find out the above probabilities.

P(Politics | new feed) = P(new feed | Politics) × P(Politics) / P(new feed), and similarly for Entertainment.

Let's apply the naive assumption to Bayes' theorem: independence among the features. We can then split the evidence into independent parts.

Now, if any two events A and B are independent, then,

P(A ∩ B) = P(A) × P(B). Applying this to the new news feed,

P(Those Who Have No Work Criticise Government Says Mamata) = P(Those) × P(Who) × P(Have) × P(No) × P(Work) × P(Criticise) × P(Government) × P(Says) × P(Mamata)
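Under this assumption, the joint probability of the sentence is just a product of per-word probabilities. A minimal sketch (the per-word probabilities here are placeholders for illustration, not values from the dataset):

```python
from functools import reduce
from operator import mul

# Hypothetical per-word probabilities, for illustration only.
word_probs = {"those": 0.01, "who": 0.02, "have": 0.02, "no": 0.01,
              "work": 0.01, "criticise": 0.005, "government": 0.03,
              "says": 0.02, "mamata": 0.005}

# P(w1 w2 ... wn) = P(w1) * P(w2) * ... * P(wn) under independence.
joint = reduce(mul, word_probs.values(), 1.0)
print(joint)
```

Notice how quickly the product shrinks: nine small factors already give a number around 10^-18, which is why the final scores below are so tiny.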

and,

P(Those Who Have No Work Criticise Government Says Mamata | Politics) = P(Those | Politics) × P(Who | Politics) × P(Have | Politics) × P(No | Politics) × P(Work | Politics) × P(Criticise | Politics) × P(Government | Politics) × P(Says | Politics) × P(Mamata | Politics)

Let's calculate the probability of the word "Those" in the training data labelled as politics. "Those" is not present in our training data, so this probability is zero, and a single zero factor wipes out the entire product: the model can no longer make a sensible prediction. To avoid this kind of issue, we use smoothing. In statistics, additive smoothing, also called Laplace smoothing or Lidstone smoothing, is a technique used to smooth categorical data. To ensure that our posterior probabilities are never zero, we add 1 to the numerator and k to the denominator. So, when "Those" does not appear in the training set, its probability comes out to 1 / (N + k) instead of zero.

P(word | class) = (count of word in class + 1) / (total words in class + k), where k is the number of unique words in the training set.

Total number of words in training set labelled as political = 31

Total number of words in training set labelled as Entertainment = 28

Total unique words in the above training set = 55
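Using these counts, the whole calculation can be scripted in a few lines. This sketch plugs in the word counts from the training set: of the feed's words, only "government" and "says" appear in it (twice and once in the political headlines, once each in the entertainment headlines). Exact figures depend on how punctuation and case are tokenized, so the results may differ slightly from the numbers quoted below.

```python
N_POL, N_ENT, K = 31, 28, 55   # word totals per class, unique-word count
PRIOR = 3 / 6                  # three headlines in each class

feed = ["those", "who", "have", "no", "work",
        "criticise", "government", "says", "mamata"]

# Occurrences of each feed word in the training data, per class.
pol_counts = {"government": 2, "says": 1}   # every other feed word is unseen
ent_counts = {"government": 1, "says": 1}

def score(counts, n_class):
    """Class prior times the smoothed likelihood of the feed."""
    p = PRIOR
    for w in feed:
        # Laplace smoothing: (count + 1) / (class total + unique words)
        p *= (counts.get(w, 0) + 1) / (n_class + K)
    return p

p_pol = score(pol_counts, N_POL)
p_ent = score(ent_counts, N_ENT)
print(f"Politics: {p_pol:.3e}, Entertainment: {p_ent:.3e}")
print("Politics" if p_pol > p_ent else "Entertainment")
```

With these counts the political score comes out slightly higher, in line with the classification below.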


P(Those Who Have No Work Criticise Government Says Mamata | Politics) × P(Politics) = 1.149e-17

P(Those Who Have No Work Criticise Government Says Mamata | Entertainment) × P(Entertainment) = 1.04e-17

That is, P(Politics | New feed) > P(Entertainment | New feed )

From the above results it's clear that the new sentence will be classified as a political news feed.
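Since scikit-learn was mentioned earlier, here is a minimal sketch of the same classification using its MultinomialNB. Note that CountVectorizer's built-in tokenization (lowercasing, dropping out-of-vocabulary words) differs from the hand counts above, so the internal probabilities will not match exactly, though the prediction agrees.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

headlines = [
    "Rahul Gandhi thanks government for changing FDI norms after his warning",
    "DMK Allies with ruling AIADMK combine in Tamil Nadu rural civic polls",
    "Congress says government is doing injustice to retailers",
    "Amitabh Bachchan shares throwback photo from Sholay premiere says How pretty Jaya looks",
    "IPL postponed further as Indian government extends lockdown",
    "Siddharth Malhotra reacts to Masakali controversy",
]
labels = ["Politics"] * 3 + ["Entertainment"] * 3

vec = CountVectorizer()                 # bag-of-words counts
X = vec.fit_transform(headlines)
model = MultinomialNB(alpha=1.0)        # alpha=1.0 gives Laplace smoothing
model.fit(X, labels)

new_feed = ["Those who have no work criticise government says Mamata"]
print(model.predict(vec.transform(new_feed)))
```

The prediction comes out as Politics, matching the manual calculation.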

Please check out the following tutorials if you are interested in trying this out:

Text Classification using Naive Bayes

Classification using Orange

Thank you!


