Quick Revision: Essential Statistical Concepts
Yokeswaran S
Software Engineer @ Tata Communications | Building the AI Product | Sharing Machine Learning Fundamentals | AI Enthusiast
Statistics is the science of collecting, analyzing, and interpreting data. This guide serves as a quick revision of key concepts, focusing on definitions and essential points.
1. Statistics Basics
Statistics Types
Descriptive Statistics: Summarizes data using measures such as the mean, median, and standard deviation.
Inferential Statistics: Draws conclusions about a population from sample data.
Population and Sample
Population: The entire group of people or things being investigated.
Sample: A subset of the population selected for analysis.
Types of Sampling
Random Sampling: Each element has an equal probability of selection.
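Stratified Sampling: The population is divided into subgroups (strata), and samples are drawn from each.
Cluster Sampling: The population is divided into groups (clusters), and whole groups are chosen at random.
Systematic Sampling: Selects every nth element from an ordered list.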
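The four sampling methods above can be sketched with Python's standard library. This is a minimal illustration, not a survey-grade implementation: the population, the "region" attribute, and all sample sizes are made-up assumptions.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 people, each tagged with a region.
population = [{"id": i, "region": "north" if i < 60 else "south"} for i in range(100)]

# Random sampling: every element has the same chance of selection.
simple = random.sample(population, 10)

# Stratified sampling: split by region, then sample from every stratum.
strata = {}
for person in population:
    strata.setdefault(person["region"], []).append(person)
stratified = [p for group in strata.values() for p in random.sample(group, 5)]

# Cluster sampling: form groups of 10, then pick whole groups at random.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [p for cluster in random.sample(clusters, 2) for p in cluster]

# Systematic sampling: random starting point, then every nth element.
n = 10
start = random.randrange(n)
systematic = population[start::n]
```

Here each stratum contributes the same number of people; drawing in proportion to stratum size is an equally common choice.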
2. Data Types & Measurement Scales
Types of Data
Qualitative (Categorical): Non-numerical information (e.g., colors, gender).
Quantitative (Numerical):
Discrete: Values that can be counted (e.g., number of students).
Continuous: Values that can be measured (e.g., height, weight).
Scales of Measurement
Nominal: Orderless categories (e.g., blood types).
Ordinal: Ordered categories (e.g., survey ratings).
Interval: Numeric data with equal intervals between values but no true zero (e.g., temperature in Celsius).
Ratio: Numeric data with equal intervals and a true zero (e.g., age, income).
3. Descriptive Statistics
Measures of Central Tendency
Mean: Average value.
Median: Middle value in ordered dataset.
Mode: Value with the highest frequency.
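As a quick illustration, the three measures can be computed with Python's built-in statistics module. The scores list is made-up data.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # made-up data

mean = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(scores)  # average of the two middle values, 3 and 5
mode = statistics.mode(scores)      # 3 occurs most often

print(mean, median, mode)
```

The mean (5) sits above the median (4) here because the single large value 10 pulls it upward, which is why the median is often preferred for skewed data.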
Measures of Dispersion
Range: Difference between the maximum and minimum values.
Variance: Quantifies spread of data points from the mean.
Standard Deviation: Variance's square root, representing dispersion.
Why is Sample Variance Divided by (n-1)?
To remove bias while approximating population variance using a sample (Bessel's correction).
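A small sketch of the dispersion measures, contrasting the population formulas (divide by n) with the sample formulas (divide by n-1). The data list is made up and has a mean of 5.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, mean = 5

data_range = max(data) - min(data)            # 9 - 2 = 7

# Population formulas divide the sum of squared deviations by n.
pop_variance = statistics.pvariance(data)     # 32 / 8 = 4
pop_std = statistics.pstdev(data)             # sqrt(4) = 2

# Sample formulas divide by n - 1 (Bessel's correction).
sample_variance = statistics.variance(data)   # 32 / 7
sample_std = statistics.stdev(data)

print(data_range, pop_variance, sample_variance)
```

The sample variance is always slightly larger than the population formula applied to the same numbers; the gap shrinks as n grows.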
Percentiles & Quartiles
Percentile: The value below which a given percentage of observations fall (e.g., 90% of the data lies below the 90th percentile).
Quartiles: Split data into four sections of equal size.
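Quartiles and percentiles can be sketched with statistics.quantiles. The data (the integers 1 to 100) is made up, and the exact cut points depend on the interpolation method; this sketch uses the function's default, so other tools may report slightly different values.

```python
import statistics

values = list(range(1, 101))  # made-up data: 1, 2, ..., 100

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)

# n=100 returns 99 cut points; index 89 holds the 90th percentile.
percentiles = statistics.quantiles(values, n=100)
p90 = percentiles[89]

iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data
print(q1, q2, q3, p90, iqr)
```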
Correlation vs. Covariance
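Correlation: Measures the strength and direction of the linear relationship between two variables on a fixed scale (-1 to 1).
Covariance: Shows the direction of the relationship; its magnitude depends on the units of the variables, so it does not directly indicate strength.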
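The difference is easy to see in code. The sketch below computes both by hand; the helper names sample_covariance and pearson_correlation and the data are illustrative assumptions.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # y = 2x, a perfect positive linear relationship

def sample_covariance(a, b):
    """Average product of deviations from the means, using n - 1."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def pearson_correlation(a, b):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(a, b) / math.sqrt(
        sample_covariance(a, a) * sample_covariance(b, b)
    )

cov_xy = sample_covariance(x, y)
corr_xy = pearson_correlation(x, y)

# Changing the units of y rescales the covariance but not the correlation.
y_scaled = [100 * v for v in y]
cov_scaled = sample_covariance(x, y_scaled)
corr_scaled = pearson_correlation(x, y_scaled)

print(cov_xy, corr_xy, cov_scaled, corr_scaled)
```

Multiplying y by 100 multiplies the covariance by 100 while the correlation stays at 1, which is why correlation is the one used to compare strength across variable pairs.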
4. Probability Distributions
Key Distributions
Bernoulli: Binary outcomes (success/failure).
Binomial: Counts successes in fixed trials.
Poisson: Counts events in a fixed interval (e.g., calls per hour).
Normal (Gaussian): Symmetrical bell-shaped curve.
Standard Normal: Normal distribution with mean 0 and standard deviation 1.
Uniform: Equal probability for all values.
Log-Normal: Data follows a normal distribution after logarithm transformation.
Power Law: Few large values, many small values.
Pareto: 80/20 principle (small causes, large effects).
Key Probability Functions
PDF (Probability Density Function): For continuous data.
PMF (Probability Mass Function): For discrete data.
CDF (Cumulative Distribution Function): Provides cumulative probability up to a value.
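To make the three functions concrete, here is a minimal sketch using the binomial and Poisson distributions (discrete, so PMF) and the standard normal (continuous, so PDF). The helper names binomial_pmf and poisson_pmf and the parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with average rate lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# PMF: probability of exactly 3 heads in 10 fair coin flips (120 / 1024).
p_3_heads = binomial_pmf(3, 10, 0.5)

# CDF for a discrete variable: add up the PMF for every value up to 3.
p_at_most_3 = sum(binomial_pmf(k, 10, 0.5) for k in range(4))

# PMF: probability of exactly 2 calls in an hour when the average is 4 per hour.
p_2_calls = poisson_pmf(2, 4)

# PDF and CDF for the standard normal distribution.
std_normal = NormalDist(mu=0, sigma=1)
density_at_0 = std_normal.pdf(0)        # height of the curve, not a probability
prob_below_196 = std_normal.cdf(1.96)   # about 0.975

print(p_3_heads, p_at_most_3, p_2_calls, density_at_0, prob_below_196)
```

Note that a PDF value is a density: probabilities for continuous data come from areas under the curve, which is what the CDF reports.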
Central Limit Theorem (CLT)
With a sufficiently large sample size, the sampling distribution of the mean is approximately normal, regardless of the shape of the population distribution (provided its variance is finite).
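A small simulation can illustrate the theorem. The sketch below draws from an exponential distribution, which is strongly skewed, and looks at the means of many samples. The sample size (50), the number of repetitions (2000), and the seed are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from a skewed exponential distribution (mean 1, std 1)."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

# Repeat the experiment many times and collect the sample means.
means = [sample_mean(50) for _ in range(2000)]

center = statistics.mean(means)   # expected to be close to the population mean, 1
spread = statistics.stdev(means)  # expected to be close to 1 / sqrt(50), about 0.141

print(center, spread)
```

Plotting a histogram of means would show a roughly bell-shaped curve even though the individual draws are far from normal.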
5. Hypothesis Testing
What is Hypothesis Testing?
A procedure to make inferences about a population based on sample data. It consists of:
Null Hypothesis (H₀): No effect or no difference.
Alternative Hypothesis (H₁): There is an effect or difference.
Test Statistic: A value computed from the sample, used to decide whether to reject H₀.
P-value: Probability of obtaining data at least as extreme as those observed if H₀ is true.
Significance Level (α): The cutoff (usually 0.05) below which the p-value leads to rejecting H₀.
P-Value and Hypothesis Testing
P-value < α (e.g., 0.05): Reject H₀ (evidence supports H₁).
P-value ≥ α: Don't reject H₀ (not enough evidence).
Z-Test and Hypothesis Testing
Applies when the population variance is known, for large samples (n > 30).
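A one-sample, two-sided z-test ties the pieces above together: compute the test statistic, convert it to a p-value, and compare with α. This is a minimal sketch; the function name one_sample_z_test and the numbers in the example are made-up assumptions.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_std, n, alpha=0.05):
    """Two-sided z-test for a mean when the population std is known."""
    standard_error = pop_std / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a result at least this extreme in either tail under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical example: H0 says the mean is 100, known std is 15,
# and a sample of 100 observations has a mean of 103.
z, p_value, reject_h0 = one_sample_z_test(
    sample_mean=103, pop_mean=100, pop_std=15, n=100
)
print(z, p_value, reject_h0)
```

Here z is 2.0 and the p-value is about 0.046, just under 0.05, so the null hypothesis would be rejected at that level but not at α = 0.01.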
Student's T-Distribution
Probability distribution used for small sample sizes (n < 30) when the population variance is unknown.
T-Statistics and T-Test in Hypothesis Testing
T-Test: Compares means of small sample groups.
T-Statistic: Quantifies how far the sample mean is from the hypothesized population mean, measured in units of the estimated standard error.
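The one-sample t-statistic can be computed by hand as follows. The sample values and the hypothesized mean are made up. Turning the statistic into a p-value requires the t-distribution's CDF, which is not in Python's standard library, so this sketch stops at the statistic and its degrees of freedom.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5, 5.3]  # made-up measurements
hypothesized_mean = 5.0                            # value claimed by H0

n = len(sample)
sample_mean = statistics.mean(sample)
sample_std = statistics.stdev(sample)       # uses n - 1 (population variance unknown)
standard_error = sample_std / math.sqrt(n)

t_statistic = (sample_mean - hypothesized_mean) / standard_error
degrees_of_freedom = n - 1

print(t_statistic, degrees_of_freedom)
```

The statistic is then compared against a t-distribution with n - 1 degrees of freedom, typically via a t-table or a statistics library.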
Z-Test vs. T-Test
Z-Test: Applied when the sample size is large, and population variance is known.
T-Test: Applied for small samples or when population variance is unknown.
Bayes' Theorem
A theorem of probability that updates the probability of an event in light of new evidence: P(A|B) = P(B|A) × P(A) / P(B).
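A classic illustration is a diagnostic test for a rare condition. The sketch below uses made-up numbers (1% prevalence, 95% sensitivity, 5% false positive rate), and the function name posterior is an illustrative assumption.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive test: true positives plus false positives.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% of people have the condition, the test detects it
# 95% of the time, and 5% of healthy people test positive anyway.
p_condition_given_positive = posterior(
    prior=0.01, likelihood=0.95, false_positive_rate=0.05
)
print(p_condition_given_positive)
```

Even with a positive result, the probability of having the condition is only about 16%, because false positives from the large healthy group outnumber true positives from the small affected group.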
6. Confidence Intervals and Margin of Error
Confidence Interval (CI)
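A range of values that is likely to contain the population parameter.
Formula (standard form for a population mean): $\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
where: $\bar{x}$ is the sample mean, $z_{\alpha/2}$ is the critical value for the chosen confidence level (about 1.96 for 95%), $\sigma$ is the standard deviation, and $n$ is the sample size.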
Margin of Error (MoE)
The amount of random sampling error in an estimate; for a symmetric confidence interval, it is half the interval's width.
A 95% CI means that if the experiment were repeated many times, about 95% of the resulting intervals would contain the true parameter.
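A confidence interval for a mean and its margin of error can be sketched as below. This assumes the normal (z-based) form, which suits large samples or a known standard deviation; the function name mean_confidence_interval and the example numbers are made up.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, std, n, confidence=0.95):
    """z-based confidence interval for a mean; returns (low, high, margin)."""
    # Critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std / math.sqrt(n)  # margin of error
    return sample_mean - margin, sample_mean + margin, margin

# Hypothetical example: sample mean 50, standard deviation 10, 100 observations.
low, high, moe = mean_confidence_interval(sample_mean=50, std=10, n=100)
print(low, high, moe)
```

The interval comes out at roughly 48.04 to 51.96, a margin of error of about 1.96. Quadrupling the sample size halves the margin, since it shrinks with the square root of n.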
7. Estimates in Statistics
Point and interval estimates assist in making inferences about population parameters from sample data.
Point Estimate: A single value used to estimate a population parameter (e.g., the sample mean).
Interval Estimate: A range of values likely to contain the parameter (e.g., a confidence interval).
Conclusion
This guide provides a concise overview of essential statistical concepts, including types of statistics, sampling methods, data types, measures of central tendency and dispersion, probability distributions, hypothesis testing, and confidence intervals. Understanding these foundational principles equips individuals to effectively analyze and interpret data, facilitating informed decision-making across various fields. Embracing these concepts is vital for harnessing the power of data in research and practical applications.