Statistics for Data Science — Basic Statistics

Md. Sawrab

Published: August 28, 2024

Statistics is a foundational component of data science, providing the tools and techniques for analyzing and interpreting data. Data scientists rely on statistical methods to extract meaningful insights from large, complex data sets and to identify patterns and trends that support informed business decisions. A solid grounding in statistics helps a data scientist understand how the data behaves.

In this newsletter series, we will cover everything from foundational theory to advanced analytical techniques and explore their real-world applications. The series is designed to help you build a strong statistical foundation for data science.

What is Statistics?

Statistics is the branch of applied mathematics that deals with the collection, organization, analysis, interpretation, and presentation of data.

Example:

Average (mean) marks of students in an exam.
Estimating the average height of all students in a school based on a sample of 100 students.

Some Key Definitions:

Data: Any facts or information collected for study can be treated as data. Example: age, weight, etc.

Population: A population is the complete collection of items or individuals of interest to a study. Example: all students in a class.

Types of populations: The population can be classified according to the number of individuals that make it up:

Finite population: A population whose members can be counted, which makes them easier to study. For example, the people enrolled in a course.
Infinite population: A population that is unbounded or so large that its members cannot practically be counted, so a study usually examines only a portion of it, i.e., a sample. For example, the grains of sand on a beach.

Sample: A sample is a subset of the population used to draw conclusions about the population. Example: some students in a class.

Parameter: A parameter is a number that describes a property of an entire population, such as the population mean (μ).

Statistic: A statistic is a number that describes a property of a sample, such as the sample mean (x̄).
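To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a hypothetical population of 1,000 student heights generated at random (not real measurements), computes the population mean (a parameter) from every member, and estimates it with the mean of a random sample of 100 (a statistic).

```python
import random
import statistics

# Hypothetical population: heights (cm) of 1,000 students, generated at random
# purely for illustration; these are not real measurements.
random.seed(42)
population = [random.gauss(165, 8) for _ in range(1000)]

# Parameter: computed from every member of the population.
mu = statistics.mean(population)

# Statistic: computed from a random sample of 100 students.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"Population mean (parameter): {mu:.2f} cm")
print(f"Sample mean (statistic):     {x_bar:.2f} cm")
```

In practice the population is rarely available in full, which is exactly why the sample mean is used to estimate it; a different random sample would give a slightly different statistic, while the parameter stays fixed.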

Variable: In statistics, a variable is a characteristic, number, or quantity that can be counted or measured.

Example: age, length, and height, which can change or vary from one individual to another.

Types of Variable: Depending on whether a variable takes numerical or non-numerical values, it can be classified into two categories:

Qualitative Variable
Quantitative Variable

Qualitative Variable: Qualitative variables, also known as categorical variables, describe qualities or characteristics.

Example: Color of a car, Gender of a patient, Size of an industry, etc.

Quantitative variable: Quantitative variables, also known as numerical variables, represent quantities or amounts.

Example: Number of children in a family, Weight of a man, etc.
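One rough way to tell the two kinds apart in code is to check whether every value in a column is numeric. The sketch below does that on a few hypothetical records; the `records` data and the `variable_type` helper are illustrative assumptions, not a standard API. The heuristic can mislead: numeric codes such as ZIP codes are qualitative even though they are stored as numbers.

```python
# Hypothetical records used only to illustrate the two variable types.
records = [
    {"car_color": "red",  "children": 2, "weight_kg": 72.5},
    {"car_color": "blue", "children": 0, "weight_kg": 81.0},
    {"car_color": "red",  "children": 3, "weight_kg": 65.2},
]

def variable_type(values):
    """Label a column quantitative if every value is a number, else qualitative.

    This is a heuristic, not a rule: the meaning of the values decides the type.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_number else "qualitative"

for column in records[0]:
    values = [row[column] for row in records]
    print(column, "->", variable_type(values))
```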

Scales of Measurement: There are four scales of measurement:

Nominal Scale
Ordinal Scale
Interval Scale
Ratio Scale

Nominal Scale: The nominal scale is the simplest form of measurement. It classifies and labels a qualitative variable into distinct categories or groups that have no inherent order.

Examples:

Gender: Male, Female
Blood Type: A, B, AB, O
Marital Status: Single, Married, Divorced
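Because nominal categories have no order, the only meaningful operations are checking equality and counting. A minimal Python sketch, using hypothetical blood-type data, shows the kind of summary that does make sense for a nominal variable: frequencies and the mode.

```python
from collections import Counter

# Hypothetical blood types for ten patients (illustrative data only).
blood_types = ["A", "O", "B", "O", "AB", "A", "O", "B", "O", "A"]

# Nominal data supports only counting and equality checks: there is no order,
# so "A < B" or an "average blood type" would be meaningless.
counts = Counter(blood_types)
mode, mode_count = counts.most_common(1)[0]

print(counts)
print("Mode:", mode)  # Mode: O
```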

Ordinal Scale: The ordinal scale is a type of measurement where data is organized into a specific order or ranking. However, while you can tell which item is higher or lower in the order, the exact difference between the ranks isn’t consistent or precisely measurable.

Examples:

Education Level: High School, Bachelor’s Degree, Master’s Degree, PhD
Customer Satisfaction: Very Unsatisfied, Unsatisfied, Neutral, Satisfied, Very Satisfied
Economic Status: Low, Middle, High
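For ordinal data, the order of the categories has to be supplied explicitly, and order-based summaries such as the median are valid. The Python sketch below assumes hypothetical survey responses on the customer-satisfaction scale above; the `RANK` mapping is an illustrative encoding, not part of any library.

```python
import statistics

# The order of the categories is domain knowledge we supply ourselves;
# the spacing between consecutive ranks carries no meaning.
SATISFACTION_LEVELS = ["Very Unsatisfied", "Unsatisfied", "Neutral",
                       "Satisfied", "Very Satisfied"]
RANK = {level: i for i, level in enumerate(SATISFACTION_LEVELS)}

# Hypothetical survey responses (illustrative only).
responses = ["Satisfied", "Neutral", "Very Satisfied", "Satisfied", "Unsatisfied"]

# Ordering is meaningful, so sorting and the median are valid.
# median_low always returns an actual rank, so it maps back to a category.
ranks = sorted(RANK[r] for r in responses)
median_level = SATISFACTION_LEVELS[statistics.median_low(ranks)]
print("Median response:", median_level)  # Median response: Satisfied

# A mean of the ranks would assume equal gaps between levels,
# which the ordinal scale does not guarantee.
```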

Interval Scale: The interval scale not only allows for ordering of data but also provides meaningful and equal intervals between data points.

Examples:

Temperature: Celsius, Fahrenheit
Calendar Years: 2000, 2020, 2024
IQ Scores

Interval data supports addition and subtraction, but because its zero point is arbitrary rather than absolute (0°C does not mean an absence of temperature), ratios are not meaningful. For instance, 20°C is not “twice as warm” as 10°C.
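A quick way to see why ratios fail on an interval scale is to express the same two temperatures in different units. In this minimal Python sketch, the apparent ratio changes with the unit, while differences convert consistently; Kelvin, which has a true zero, is included for comparison.

```python
def c_to_f(c):
    """Convert Celsius to Fahrenheit."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Convert Celsius to Kelvin (a ratio scale with a true zero)."""
    return c + 273.15

a, b = 10.0, 20.0  # temperatures in °C

# The "ratio" depends on the arbitrary zero of each scale...
print(b / a)                  # 2.0 in Celsius
print(c_to_f(b) / c_to_f(a))  # 68 / 50 = 1.36 in Fahrenheit
print(c_to_k(b) / c_to_k(a))  # about 1.035 in Kelvin

# ...while differences stay consistent: a 10 °C gap is always an 18 °F gap.
print(c_to_f(b) - c_to_f(a))  # 18.0
```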

Ratio Scale: The ratio scale is the most informative and robust scale of measurement. It has all the properties of the interval scale, but it also includes an absolute zero point, which allows for the calculation of ratios.

Examples:

Height: 150 cm, 180 cm
Weight: 50 kg, 100 kg
Age: 20 years, 40 years
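By contrast, a ratio on a ratio scale survives a change of units because both units share the same true zero. A minimal Python sketch using the heights above (the inch conversion factor 2.54 cm per inch is exact):

```python
CM_PER_INCH = 2.54

heights_cm = (150.0, 180.0)
heights_in = tuple(h / CM_PER_INCH for h in heights_cm)

# With a true zero, the ratio is the same whichever unit you use.
ratio_cm = heights_cm[1] / heights_cm[0]
ratio_in = heights_in[1] / heights_in[0]
print(f"{ratio_cm:.2f} vs {ratio_in:.2f}")  # 1.20 vs 1.20
```

So it is meaningful to say that a 180 cm person is 1.2 times as tall as a 150 cm person, something the interval scale does not allow.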

Types of Statistics: There are two main branches of statistics:

Descriptive Statistics
Inferential Statistics

Descriptive Statistics: Descriptive statistics describe and summarize data. They present data in a meaningful, manageable form, helping you understand what the data shows at a glance.

Key Components of Descriptive Statistics:

Measures of Central Tendency: These are the values that represent the center or typical value of the data set.

Mean (Average): The sum of all data points divided by the number of points.
Median: The middle value in a data set when it’s ordered from least to greatest.
Mode: The most frequently occurring value in the data set.
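Python's standard `statistics` module computes all three measures directly. A minimal sketch on hypothetical exam marks (illustrative data only):

```python
import statistics

# Hypothetical exam marks for seven students (illustrative only).
marks = [55, 60, 60, 70, 75, 80, 90]

print("Mean:  ", statistics.mean(marks))    # 490 / 7 = 70
print("Median:", statistics.median(marks))  # middle (4th) value = 70
print("Mode:  ", statistics.mode(marks))    # 60 appears twice
```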

Measures of Dispersion (Variability): These metrics show how spread out the data is.

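Range: The difference between the highest and lowest values.
Variance: The average of the squared differences between each data point and the mean, measuring how far the data spread out from the mean. For a sample, the sum of squared differences is usually divided by n − 1 instead of n.
Standard Deviation: The square root of the variance, expressed in the same units as the data, showing how much a typical data point deviates from the mean.

Frequency Distribution: This shows how often each value occurs in the data set. It can be represented through tables, histograms, or pie charts.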
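The measures of dispersion and a frequency distribution can be sketched with the standard library alone. The data below is hypothetical; note that `statistics` offers both population (`pvariance`, `pstdev`, divide by n) and sample (`variance`, `stdev`, divide by n − 1) versions, and which one fits depends on whether the data is the whole population or a sample.

```python
import statistics
from collections import Counter

# Hypothetical data: number of books read last month by 8 people.
books = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(books) - min(books)     # 9 - 2 = 7
pop_var = statistics.pvariance(books)    # divides by n: 32 / 8 = 4
sample_var = statistics.variance(books)  # divides by n - 1: 32 / 7
pop_sd = statistics.pstdev(books)        # square root of the population variance

print("Range:", data_range)
print("Population variance:", pop_var)
print("Sample variance:", round(sample_var, 3))
print("Population std dev:", pop_sd)

# Frequency distribution as a simple text histogram.
for value, freq in sorted(Counter(books).items()):
    print(f"{value}: {'*' * freq}")
```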

Inferential Statistics: Inferential statistics draws conclusions and makes predictions about a population based on a sample of data.

Key Components of Inferential Statistics:

Hypothesis Testing: This involves making an assumption (the hypothesis) about a population parameter and then using sample data to assess whether there is enough evidence to reject it.

Null Hypothesis (H0): The hypothesis that there is no effect or difference.

Alternative Hypothesis (H1): The hypothesis that there is an effect or difference.

Confidence Intervals: These are ranges of values used to estimate a population parameter. For example, a 95% confidence interval means that if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true population parameter.
Regression Analysis: This technique assesses the relationship between variables, allowing you to predict the value of one variable based on the value of another.
t-tests, chi-square tests, ANOVA (Analysis of Variance): These are different types of statistical tests used to compare groups and see if the differences between them are statistically significant.

Thanks for reading.

“Your network is your net worth.” (Tim Sanders)

Connect on LinkedIn: https://www.dhirubhai.net/in/md-sawrab/

Github: https://github.com/md-sawrab

Data_Tales

319 followers
