Quick Revision: Essential Statistical Concepts

Statistics is the science of collecting, analyzing, and interpreting data. This guide serves as a quick revision of key concepts, focusing on definitions and essential points.


1. Statistics Basics

Statistics Types

Descriptive Statistics: Summarizes data using measures such as mean, median, and standard deviation.

Inferential Statistics: Draws conclusions or makes predictions about a population from sample data.

Population and Sample

Population: The entire group of people or things being investigated.

Sample: A subset of the population selected for analysis.

Types of Sampling

Random Sampling: Each element has an equal probability of selection.

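Stratified Sampling: The population is divided into subgroups (strata), and a sample is drawn from each.

Cluster Sampling: The population is divided into groups (clusters), and whole clusters are chosen at random.

Systematic Sampling: Selects every nth element from an ordered list.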
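
The four sampling methods above can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-grade implementation: the population of 100 people, the `region` field used as the stratum, the cluster size of 10, and the fixed seed are all assumptions made for the example.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 people, each tagged with a region (the stratum).
population = [{"id": i, "region": "north" if i < 60 else "south"} for i in range(100)]

# Simple random sampling: every element has the same chance of selection.
simple = random.sample(population, 10)

# Stratified sampling: draw from each subgroup in proportion to its size.
strata = {}
for person in population:
    strata.setdefault(person["region"], []).append(person)
stratified = []
for members in strata.values():
    k = round(10 * len(members) / len(population))
    stratified.extend(random.sample(members, k))

# Cluster sampling: split into groups of 10, then keep whole groups chosen at random.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [p for cluster in random.sample(clusters, 2) for p in cluster]

# Systematic sampling: random starting point, then every 10th element.
step = 10
start = random.randrange(step)
systematic = population[start::step]
```

Note that the stratified sample keeps the 60/40 split of the population, which simple random sampling only achieves on average.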

2. Data Types & Measurement Scales

Types of Data

Qualitative (Categorical): Non-numerical information (e.g., colors, gender).

Quantitative (Numerical):

Discrete: Values that can be counted (e.g., number of students).

Continuous: Values that can be measured (e.g., height, weight).

Scales of Measurement

Nominal: Categories with no inherent order (e.g., blood types).

Ordinal: Ordered categories (e.g., survey ratings).

Interval: Numeric data with equal distances between values but no true zero (e.g., temperature in Celsius).

Ratio: Numeric data with a true zero, so ratios are meaningful (e.g., age, income).


3. Descriptive Statistics

Measures of Central Tendency

Mean: Arithmetic average of the values.

Median: Middle value of the ordered dataset.

Mode: Value with the highest frequency.
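
As a quick check of the three measures, here is a small sketch using Python's built-in `statistics` module. The `scores` list is made-up data chosen so that the mean, median, and mode all differ.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # illustrative data

mean = statistics.mean(scores)      # sum / count = 30 / 6
median = statistics.median(scores)  # average of the two middle values, 3 and 5
mode = statistics.mode(scores)      # most frequent value
```

Here the single large value (10) pulls the mean above the median, which is why the median is often preferred for skewed data.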

Measures of Dispersion

Range: Difference between the maximum and minimum values.

Variance: Average squared deviation of the data points from the mean.

Standard Deviation: Square root of the variance, expressing dispersion in the same units as the data.

Why is Sample Variance Divided by (n-1)?

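Dividing by n-1 instead of n removes the downward bias that arises when the sample mean stands in for the unknown population mean while estimating the population variance (Bessel's correction).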
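
The difference between the two divisors can be seen directly with the `statistics` module, where `pvariance` divides by n and `variance` divides by n-1. The `data` list is an illustrative sample, not taken from the article.

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # illustrative sample, mean 5.5

data_range = max(data) - min(data)     # 8 - 3
pop_var = statistics.pvariance(data)   # sum of squared deviations / n
samp_var = statistics.variance(data)   # sum of squared deviations / (n - 1)
samp_sd = statistics.stdev(data)       # square root of the sample variance
```

The sum of squared deviations is 17.5 here, so the n-1 version (17.5 / 5) is always somewhat larger than the n version (17.5 / 6), and the gap shrinks as n grows.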

Percentiles & Quartiles

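Percentile: The value below which a given percentage of the data falls (e.g., about 90% of values lie at or below the 90th percentile).

Quartiles: The three cut points (Q1, Q2, Q3) that split ordered data into four sections of equal size.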
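
A short sketch of both ideas using `statistics.quantiles`. Several interpolation conventions exist for percentiles; the `method="inclusive"` choice below is one of them and is an assumption of this example, so other tools may report slightly different values.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative, already ordered

# n=4 returns the three quartile cut points Q1, Q2 (the median), Q3.
quartiles = statistics.quantiles(data, n=4, method="inclusive")

# n=100 returns 99 percentile cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p90 = percentiles[89]
```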

Correlation vs. Covariance

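Correlation: Measures the strength and direction of the linear relationship between two variables on a unit-free scale from -1 to 1.

Covariance: Shows the direction of the relationship; its magnitude depends on the units of the variables, so it does not indicate strength.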
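
The unit dependence of covariance can be illustrated with a small hand-written sketch. The helper functions `covariance` and `correlation` and the `hours`/`score` data are hypothetical names and values introduced only for this example; they use the sample (n-1) definitions.

```python
import math

def covariance(x, y):
    """Sample covariance: average product of deviations, divided by n - 1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
score_scaled = [s * 10 for s in score]  # same data in different units

cov_original = covariance(hours, score)
cov_scaled = covariance(hours, score_scaled)
corr_original = correlation(hours, score)
corr_scaled = correlation(hours, score_scaled)
```

Rescaling one variable multiplies the covariance by ten but leaves the correlation unchanged, which is why correlation is the usual choice for comparing strength.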


4. Probability Distributions

Key Distributions

Bernoulli: Binary outcomes (success/failure).

Binomial: Counts successes in a fixed number of independent trials with the same success probability.

Poisson: Counts events in a fixed interval (e.g., calls per hour).

Normal (Gaussian): Symmetrical bell-shaped curve.

Standard Normal: Normal distribution with mean 0 and standard deviation 1.

Uniform: Equal probability for all values in its range.

Log-Normal: The logarithm of the data follows a normal distribution.

Power Law: Few large values, many small values.

Pareto: A power-law distribution associated with the 80/20 principle (roughly 80% of effects come from 20% of causes).

Key Probability Functions

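PDF (Probability Density Function): For continuous data; gives a density, and probabilities are areas under the curve.

PMF (Probability Mass Function): For discrete data; gives the probability of each exact value.

CDF (Cumulative Distribution Function): Gives the cumulative probability up to and including a value.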
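
The three functions can be compared side by side in a small sketch. The `binomial_pmf` helper is written here for illustration, and the choice of 10 fair coin flips and the standard normal distribution are assumptions of the example.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# PMF: discrete probabilities for 10 fair coin flips; they sum to 1.
pmf_values = [binomial_pmf(k, 10, 0.5) for k in range(11)]
p_exactly_5 = pmf_values[5]         # 252 / 1024
p_at_most_5 = sum(pmf_values[:6])   # discrete CDF evaluated at 5

# PDF and CDF for the standard normal distribution.
z = NormalDist(mu=0, sigma=1)
density_at_0 = z.pdf(0)        # a density (about 0.3989), not a probability
prob_below_196 = z.cdf(1.96)   # about 0.975
```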

Central Limit Theorem (CLT)

With a sufficiently large sample, the sampling distribution of the mean is approximately normal, regardless of the shape of the population distribution (provided its variance is finite).
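
A small simulation makes the theorem concrete. This sketch assumes a skewed exponential population with mean 1 and standard deviation 1, a sample size of 50, and 2000 repetitions; all of these numbers, and the helper `sample_mean`, are choices made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, sd 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 50
sample_means = [sample_mean(n) for _ in range(2000)]

center = statistics.fmean(sample_means)  # expected to be near the population mean, 1
spread = statistics.stdev(sample_means)  # expected to be near 1 / sqrt(50)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this is about 95%.
within = sum(abs(m - 1) <= 1.96 / n**0.5 for m in sample_means) / len(sample_means)
```

Even though individual draws are strongly skewed, the sample means cluster symmetrically around 1 with a spread close to σ/√n.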


5. Hypothesis Testing

What is Hypothesis Testing?

A procedure to make inferences about a population based on sample data. It consists of:

Null Hypothesis (H₀): No effect or no difference.

Alternative Hypothesis (H₁): There is an effect or difference.

Test Statistic: A value computed from the sample and used to decide whether to reject H₀.

P-value: Probability of obtaining data at least as extreme as those observed, assuming H₀ is true.

Significance Level (α): The cutoff (usually 0.05) below which the p-value leads to rejecting H₀.

P-Value and Hypothesis Testing

P-value < α (e.g., 0.05): Reject H₀ (the data support H₁).

P-value ≥ α: Fail to reject H₀ (not enough evidence against it).

Z-Test and Hypothesis Testing

Tests a hypothesis about a mean using the standard normal distribution. Applies when the population variance is known, typically for large samples (n > 30).
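
A worked two-sided z-test ties the test statistic, p-value, and significance level together. The numbers below (claimed mean 100, known standard deviation 15, a sample of 36 with mean 106) are hypothetical values picked for the example.

```python
import math
from statistics import NormalDist

# Hypothetical setup: the null hypothesis says the population mean is 100.
mu0, sigma, n, sample_mean = 100, 15, 36, 106
alpha = 0.05

# Test statistic: distance of the sample mean from mu0 in standard errors.
z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # 6 / 2.5 = 2.4

# Two-sided p-value: probability of a result at least this extreme under the null.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
```

Here the p-value is roughly 0.016, which is below 0.05, so the null hypothesis is rejected at the 5% level.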

Student's T-Distribution

Probability distribution used for small sample sizes (n < 30) when the population variance is unknown.

T-Statistics and T-Test in Hypothesis Testing

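T-Test: Compares a sample mean with a reference value, or the means of two groups, using the t-distribution.

T-Statistic: Quantifies how far the sample mean lies from the hypothesized population mean, measured in units of the estimated standard error.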

Z-Test vs. T-Test

Z-Test: Applied when the sample size is large, and population variance is known.

T-Test: Applied for small samples or when population variance is unknown.
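
For contrast with the z-test, here is a sketch of a one-sample t-statistic, where the standard deviation is estimated from the sample. The `sample` values and the hypothesized mean `mu0` are made up. Python's standard library has no t-distribution, so this example compares against the two-sided 5% critical value for 5 degrees of freedom (2.571) taken from a standard t-table rather than computing a p-value.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]  # illustrative small sample
mu0 = 5.0                                # hypothesized population mean

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)             # sample sd, uses n - 1
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1                               # degrees of freedom

t_critical = 2.571                       # t-table value, two-sided, alpha 0.05, df = 5
reject_null = abs(t_stat) > t_critical
```

The statistic is about 2.45, which would exceed the z cutoff of 1.96 but not the t cutoff of 2.571: the heavier tails of the t-distribution demand stronger evidence from small samples.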

Bayes' Theorem

A rule of probability for updating the probability of an event in light of new evidence: P(A|B) = P(B|A) · P(A) / P(B).
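
A classic way to see the update at work is a diagnostic test. The `posterior` function and the rates below (1% prevalence, 95% sensitivity, 5% false positive rate) are hypothetical values chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive test: true positives plus false positives.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

p_condition_given_positive = posterior(
    prior=0.01,                # P(condition) before testing
    likelihood=0.95,           # P(positive | condition)
    false_positive_rate=0.05,  # P(positive | no condition)
)
```

Despite the accurate test, the posterior is only about 16%, because the condition is rare and false positives outnumber true positives.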


6. Confidence Intervals and Margin of Error

Confidence Interval (CI)

A range of values that is likely to contain the population parameter at a stated confidence level.

Formula:

$$\text{CI} = \bar{x} \pm Z \cdot \frac{\sigma}{\sqrt{n}}$$

where:

  • x̄ = sample mean
  • Z = critical value from the Z-table
  • σ = standard deviation
  • n = sample size

Margin of Error (MoE)

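The amount of random sampling error in an estimate, equal to the half-width of the confidence interval:

$$\text{MoE} = Z \cdot \frac{\sigma}{\sqrt{n}}$$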

A 95% CI means that if we repeated the experiment many times, about 95% of the resulting intervals would contain the true parameter.
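
The interval and its margin of error can be computed in a few lines, assuming the standard z-interval x̄ ± Z · σ/√n with a known standard deviation. The sample mean of 50, standard deviation of 8, and sample size of 64 are hypothetical values for the example.

```python
import math
from statistics import NormalDist

x_bar, sigma, n = 50.0, 8.0, 64  # hypothetical sample mean, known sd, sample size
confidence = 0.95

# Critical value: leaves (1 - confidence) / 2 in each tail; about 1.96 for 95%.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * sigma / math.sqrt(n)      # margin of error, the half-width
ci = (x_bar - margin, x_bar + margin)  # confidence interval
```

With these numbers the standard error is exactly 1, so the margin of error is about 1.96 and the interval runs from roughly 48.04 to 51.96. Quadrupling the sample size would halve the margin.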

7. Estimates in Statistics

Point and interval estimates assist in making inferences about population parameters from sample data.

Point Estimate: A single value used to estimate a population parameter (e.g., sample mean).

Interval Estimate: A range of values likely to contain the parameter (e.g., confidence intervals).


Conclusion

This guide provides a concise overview of essential statistical concepts, including types of statistics, sampling methods, data types, measures of central tendency and dispersion, probability distributions, hypothesis testing, and confidence intervals. Understanding these foundational principles equips individuals to effectively analyze and interpret data, facilitating informed decision-making across various fields. Embracing these concepts is vital for harnessing the power of data in research and practical applications.


By Yokeswaran S