Must-Know Mathematical Measures for Every Data Scientist
Sadhiq Nazar
There are a large number of mathematical measures that every data scientist needs to be aware of. This article outlines the must-know statistical measures in a concise manner.
Mean
- Sum all values.
- Divide the sum by the total number of observations.
Mode
Take the most frequently occurring value in the sample.
Median
- Sort the numbers in ascending order.
- Take the middle value (if there is an even number of observations, take the average of the two middle values).
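A minimal sketch of these three measures in plain Python, using only the standard library (the sample values are made up for illustration):

```python
from collections import Counter

data = [4, 1, 7, 4, 3, 9, 4]

# Mean: sum all values, divide by the number of observations.
mean = sum(data) / len(data)

# Mode: the most frequently occurring value.
mode = Counter(data).most_common(1)[0][0]

# Median: sort ascending, take the middle value
# (average the two middle values when the count is even).
s = sorted(data)
n = len(s)
median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

print(mean, mode, median)  # 4.571..., 4, 4
```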
Variance
- Calculate the mean.
- Take the difference between each value and the mean.
- Square this difference.
- Sum all the squared differences.
- Finally, divide by the total number of observations.
Variance gives us the dispersion of the values around the mean.
Standard Deviation
Square root of variance.
Standard deviation gives us the dispersion of the values around the mean in the same units as the values (instead of the squared units of variance).
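A minimal sketch of both measures, following the steps above. Note that it computes the population variance (dividing by n); a sample variance would divide by n - 1 instead:

```python
import math

data = [4, 1, 7, 4, 3, 9, 4]

# Variance: mean of the squared differences from the mean
# (population form, dividing by n).
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance,
# expressed in the same units as the data.
std_dev = math.sqrt(variance)

print(variance, std_dev)
```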
Covariance
Covariance is used to find the relationship between two variables:
- Calculate the mean of each variable.
- For each observation, take the difference between each variable's value and that variable's mean, then multiply the two differences together.
- Sum all the multiplied differences.
- Divide by the total number of observations.
Correlation
Measures the strength of the co-movement between two variables. It is the standardized covariance of the two variables.
Correlation is always between -1 and 1: -1 indicates that the variables are perfectly negatively correlated, +1 that they are perfectly positively correlated, and 0 that there is no linear correlation between them.
- Calculate the covariance of the two variables.
- Calculate the standard deviation of each variable.
- Multiply the two standard deviations.
- Divide the covariance by the multiplied standard deviations.
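A sketch of covariance and correlation together, since correlation is built directly from covariance (the paired values are made up for illustration):

```python
import math

x = [2.1, 2.5, 4.0, 3.6]
y = [8.0, 12.0, 14.0, 10.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance: average of the products of the paired deviations
# from each mean (population form, dividing by n).
covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

# Correlation: covariance divided by the product of the two
# standard deviations; always falls between -1 and 1.
std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
correlation = covariance / (std_x * std_y)

print(covariance, correlation)
```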
Explained Sum Of Squares
For a variable Y:
- Calculate the difference between each estimated value of Y and the mean of Y.
- Square the difference.
- Sum all of the squared differences.
Sum Of Squared Residuals
For a variable Y:
- Calculate the difference between each estimated value of Y and the actual value of Y.
- Square the difference.
- Sum all of the squared differences.
Residuals are also known as errors.
Total Sum of Squares
Explained Sum of Squares + Sum of Squared Residuals. It captures the total variation of Y around its mean, hence the name.
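A sketch of the three sums of squares for a toy set of actual values and model estimates (both made up for illustration). The TSS line follows this article's definition; for a least-squares fit with an intercept, ESS + SSR equals the sum of squared differences between each actual value and the mean:

```python
y_actual = [3.0, 5.0, 7.0, 9.0]
y_estimated = [2.8, 5.3, 6.9, 9.1]

mean_y = sum(y_actual) / len(y_actual)

# Explained Sum of Squares: squared gaps between the estimates and the mean of Y.
ess = sum((est - mean_y) ** 2 for est in y_estimated)

# Sum of Squared Residuals: squared gaps between the estimates and the actual values.
ssr = sum((est - act) ** 2 for est, act in zip(y_estimated, y_actual))

# Total Sum of Squares, following the definition above: ESS + SSR.
tss = ess + ssr

print(ess, ssr, tss)
```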
R-Squared
Measures the explained variation over the total variation. R squared is also known as the coefficient of determination, and it measures the quality of fit.
Formula to calculate R squared is:
- R squared = 1 - (Sum of Squared Residuals / Total Sum of Squares)
Adjusted R-Squared
R squared by itself is not good enough, as it does not consider the number of independent variables that produced that degree of determination. As a result, adjusted R squared is calculated:
Adjusted R squared = 1 - [(n - 1) / (n - k - 1)] x (1 - R squared)
- n = number of observations
- k = number of independent variables
It is adjusted for the number of predictors in the model.
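A sketch of both formulas with made-up values for the sums of squares, observation count, and predictor count:

```python
ssr = 0.15   # Sum of Squared Residuals (made-up value)
tss = 20.0   # Total Sum of Squares (made-up value)
n = 50       # number of observations
k = 3        # number of independent variables

# R squared: share of the total variation that the model explains.
r_squared = 1 - ssr / tss

# Adjusted R squared: penalises the fit for each extra predictor.
adj_r_squared = 1 - ((n - 1) / (n - k - 1)) * (1 - r_squared)

print(r_squared, adj_r_squared)
```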
Standard Error Of Regression
Measures the variability between the actual and estimated values of Y. It is the standard deviation of the residuals. It is calculated as:
SquareRoot(Sum of Squared Residuals / (n - k - 1))
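A sketch of this calculation, reusing the made-up sums-of-squares values from the previous example:

```python
import math

ssr = 0.15  # Sum of Squared Residuals (made-up value)
n = 50      # number of observations
k = 3       # number of independent variables

# Standard error of the regression: the residual standard deviation,
# adjusted for the degrees of freedom used up by the model.
se_regression = math.sqrt(ssr / (n - k - 1))
print(se_regression)
```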
Mean Absolute Error
- Calculate the absolute difference between each prediction and the actual observation.
- Sum the absolute differences.
- Divide the sum by the total number of observations.
Root Mean Squared Error
- Calculate the difference between each prediction and the actual observation.
- Square the difference.
- Sum the squared differences.
- Divide the sum of squared differences by the total number of observations.
- Calculate the square root of the result.
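A sketch of both error measures on the same toy predictions (made up for illustration); note that RMSE penalises large errors more heavily than MAE does:

```python
import math

y_actual = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.8, 5.3, 6.9, 9.1]
n = len(y_actual)

# Mean Absolute Error: average size of the errors, ignoring sign.
mae = sum(abs(p - a) for p, a in zip(y_predicted, y_actual)) / n

# Root Mean Squared Error: square root of the average squared error.
rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(y_predicted, y_actual)) / n)

print(mae, rmse)
```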
F1
Used to measure the performance of classification-based supervised machine learning algorithms. It is the harmonic mean of the precision and recall of a model. The results are between 0 and 1: results tending towards 1 are considered the best, whereas those tending towards 0 are treated as the worst. F1 is used in classification tests where true negatives do not matter as much.
Confusion Matrix
A confusion matrix is a results table that summarises the performance of a classification algorithm when the actual values are known.
There are several terms used:
- True Positive: When the actual result is true and predicted value is also true
- True Negative: When the actual result is false and predicted value is also false
- False Positive: When the actual result is false but the predicted value is true
- False Negative: When the actual result is true but the predicted value is false
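A sketch that tallies the four cells from made-up labels and then derives precision, recall, and F1 from them:

```python
actual    = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 1, 0, 0]

# Count each confusion-matrix cell by comparing paired labels.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn, precision, recall, f1)
```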
Euclidean Distance
Measures the distance between two points; the smaller the distance, the more similar they are:
- For each dimension, find the difference between the two values.
- Square the difference.
- Sum the squared differences.
- Take the square root of the sum.
Manhattan Distance
Measures the distance between two points:
- For each dimension, find the difference between the two values.
- Take the absolute value of the difference.
- Sum the absolute differences.
Minkowski Distance
A generalised metric form of the Euclidean and Manhattan distances.
Given a Minkowski power (a number) known as λ:
- For each dimension, find the difference between the two values.
- Take the absolute value of the difference.
- Raise the absolute difference to the power λ.
- Sum the powered differences.
- Take the λ-th root of the sum.
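A sketch of all three distances; setting λ = 1 in the Minkowski formula reproduces the Manhattan distance, and λ = 2 reproduces the Euclidean distance (the points are made up for illustration):

```python
import math

x = [1.0, 4.0, 2.0]
y = [3.0, 1.0, 5.0]

def euclidean(x, y):
    # Square root of the sum of squared differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # Sum of the absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, lam):
    # λ-th root of the sum of absolute differences raised to λ.
    return sum(abs(a - b) ** lam for a, b in zip(x, y)) ** (1 / lam)

print(euclidean(x, y), minkowski(x, y, 2))  # equal
print(manhattan(x, y), minkowski(x, y, 1))  # equal
```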
Cosine Similarity
Finds how similar two variables X and Y are:
- Multiply each corresponding value of X and Y.
- Sum the multiplied values (this is the dot product).
- Sum the squared values of X, and separately sum the squared values of Y.
- Take the square root of each sum and multiply the two roots together.
- Divide the dot product by this product.
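A sketch of the steps above; the result is the cosine of the angle between the two vectors (the values are made up for illustration):

```python
import math

x = [1.0, 4.0, 2.0]
y = [3.0, 1.0, 5.0]

# Dot product: sum of the pairwise products.
dot = sum(a * b for a, b in zip(x, y))

# Euclidean norm of each vector: square root of the sum of squares.
norm_x = math.sqrt(sum(a ** 2 for a in x))
norm_y = math.sqrt(sum(b ** 2 for b in y))

# Cosine similarity: dot product divided by the product of the norms.
cosine_similarity = dot / (norm_x * norm_y)
print(cosine_similarity)
```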