Understanding the Central Limit Theorem

Hassan Abbas

Software Design Engineer | LangChain | FastAPI | Flask | Vue | Quarkus | AI/ML

Published: December 25, 2022

Central Limit Theorem

The Central Limit Theorem (CLT) states that if we repeatedly draw truly random samples of a sufficiently large size from a population and take the mean of each sample, the distribution of those sample means will be approximately normal, provided the population has a finite variance.
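Written out a little more formally, under the classical assumptions of independent, identically distributed draws with finite variance (the symbols below are notation introduced for this sketch, not taken from the article):

```latex
\[
X_1, \dots, X_n \ \text{i.i.d.}, \quad
\mathbb{E}[X_i] = \mu, \quad
\operatorname{Var}(X_i) = \sigma^2 < \infty, \quad
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i
\]
\[
\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty,
\qquad \text{so for large } n: \quad
\bar{X}_n \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right)
\]
```

In words: the sample mean is centered on the population mean, and its spread shrinks in proportion to the square root of the sample size.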

Understanding the CLT (Central Limit Theorem) is important in statistics: because the sample means are approximately normally distributed regardless of the shape of the population the samples came from, we can apply statistical tests that rely on the normality of the sample mean without knowing the population's distribution.
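The claim that the population's shape does not matter much can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, the seed, and the skewness helper are all assumptions made here, not part of the original article.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility

# Hypothetical, strongly right-skewed population: exponential with mean 1.
population = rng.exponential(scale=1.0, size=100_000)


def skewness(values):
    """Sample skewness: close to 0 for a symmetric distribution such as the normal."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return np.mean(centered**3) / np.mean(centered**2) ** 1.5


# Draw 5,000 samples of size 50 (with replacement) and record each sample's mean.
sample_size = 50
samples = rng.choice(population, size=(5_000, sample_size), replace=True)
sample_means = samples.mean(axis=1)

print(f"skewness of population:   {skewness(population):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

The population is heavily skewed, while the sample means are much closer to symmetric, which is what the theorem leads us to expect.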

Why use Central Limit Theorem?

The CLT is useful when analyzing large datasets: it justifies using a sufficiently large random sample to make inferences about the whole dataset (i.e. the population).

Uses Of CLT

To find out whether a certain demographic group will like a new project, we cannot ask the whole population, as that would be time-consuming and expensive, so we analyze a sufficiently large sample instead.
In finance, the CLT is useful when analyzing a large collection of securities to estimate portfolio distributions and traits for returns, risk, and correlation.
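The survey use case can be sketched roughly as follows. Everything in this example is hypothetical: the population of one million people, the 62% approval rate, and the sample size of 1,000 are invented for illustration, and the interval uses the usual normal approximation that the CLT justifies.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical population of 1,000,000 people; 1 = likes the project, 0 = does not.
# The 62% approval rate is invented for this illustration.
population = (rng.random(1_000_000) < 0.62).astype(int)

# Survey a random sample of 1,000 people instead of the whole population.
sample = rng.choice(population, size=1_000, replace=False)

# Point estimate of the approval share and its standard error.
p_hat = sample.mean()
std_error = np.sqrt(p_hat * (1 - p_hat) / len(sample))

# By the CLT, p_hat is approximately normal, so +/- 1.96 standard errors
# gives an approximate 95% confidence interval.
lower, upper = p_hat - 1.96 * std_error, p_hat + 1.96 * std_error

print(f"estimated share: {p_hat:.3f}")
print(f"approx. 95% CI:  ({lower:.3f}, {upper:.3f})")
```

A sample of 1,000 already pins the share down to within a few percentage points, which is why asking everyone is unnecessary.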

Example

Let us take the example of rolling a die. We have a Series of the numbers 1 to 6 called die.

import pandas as pd
import numpy as np

die = pd.Series([1, 2, 3, 4, 5, 6])

To simulate rolling a die 8 times, we use pd.Series.sample() with replace=True, which samples with replacement.


# Rolling die 8 times
samples = die.sample(8, replace=True)

Now, take the mean of the samples.

np.mean(samples)
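For comparison, the value that this sample mean fluctuates around can be computed directly. The sketch below assumes a fair die (each face equally likely); the variable names are illustrative and not from the original.

```python
import numpy as np

faces = np.arange(1, 7)   # faces of a fair six-sided die
n_rolls = 8               # sample size used in the example

pop_mean = faces.mean()                  # expected value of a single roll: 3.5
pop_std = faces.std()                    # population standard deviation (ddof=0)
std_error = pop_std / np.sqrt(n_rolls)   # typical spread of the mean of 8 rolls

print(f"population mean:            {pop_mean}")
print(f"population std:             {pop_std:.3f}")
print(f"standard error for 8 rolls: {std_error:.3f}")
```

A mean of 8 rolls will therefore usually land within roughly 0.6 of 3.5, so individual runs can differ noticeably from one another.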

Now, repeat the rolling process 10 times.


sample_means = []
for i in range(10):
  samples = die.sample(8, replace=True)
  sample_means.append(np.mean(samples))
print(sample_means)

This gives us a list of 10 values (i.e. sample means).

Visualizing Distributions

To visualize the sampling distribution of the sample mean, let us plot a histogram.

Figure: Sampling distribution of 10 sample means


import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(sample_means)
plt.show()

Figure: Sampling distributions of the sample means for 100 to 100,000 repetitions


fig, ax = plt.subplots(1, 4, figsize=(40, 10))
fig.suptitle("Sampling Distribution of Sample Means")
no_of_rolls = [100, 1000, 10000, 100000]


# One histogram per number of repetitions; each sample consists of 20 rolls.
# rolling_n_times is defined under "Functions Used".
for i in range(len(no_of_rolls)):
    data = rolling_n_times(no_of_rolls[i], die, 20, True)
    ax[i].hist(data, histtype='bar')
    ax[i].set_xlabel(f'Means of {no_of_rolls[i]} samples of 20 rolls')
plt.show()
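Besides looking at the histograms, the spread of the sample means can be compared with the value the CLT predicts, namely the population standard deviation divided by the square root of the sample size. The sketch below is an independent check written with NumPy only; the sample sizes, the 10,000 repetitions, and the seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
sigma = np.arange(1, 7).std()   # population std of a fair die, about 1.708

results = {}
for rolls in (2, 8, 20, 50):
    # 10,000 repetitions of `rolls` die rolls; one row per repetition.
    throws = rng.integers(1, 7, size=(10_000, rolls))  # upper bound is exclusive
    means = throws.mean(axis=1)
    observed = means.std()
    predicted = sigma / np.sqrt(rolls)
    results[rolls] = (observed, predicted)
    print(f"rolls={rolls:>2}  observed={observed:.3f}  predicted={predicted:.3f}")
```

The observed and predicted values should agree closely, and both shrink as the number of rolls per sample grows.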

Functions Used


# Take a sample and return its mean
def sample_with_mean(die, rolls, replace=True):
    """

    :param die: pandas.core.series.Series
        Pandas series to take sample from
    :param rolls: int
        No of samples needed
    :param replace: bool
        Sampling with replacement (True) or Sampling Without Replacement (False)
    :return:
        :type: float
            The mean of the resultant samples/rolls

    """

    samples = die.sample(rolls, replace=replace)
    return np.mean(samples)




# repeat rolling n times
def rolling_n_times(n, die, rolls, replace=True):
    """

    :param n: int
        No of times sampling is performed
    :param die: pandas.core.series.Series
        Pandas series to take sample from
    :param rolls: int
        No of samples needed
    :param replace: bool
        Sampling with replacement (True) or Sampling Without Replacement (False)
    :return:
        :type: list
            The list of means of the resultant samples/rolls

    """

    sample_means = []
    for i in range(n):
        sample_means.append(sample_with_mean(die, rolls, replace))
    return sample_means
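For large n, such as the 100,000 repetitions used in the plots, the Python loop can be slow. One possible alternative, sketched here under the assumption that sampling is always done with replacement, draws all values in a single NumPy call. The function name rolling_n_times_vectorized and the seed parameter are hypothetical additions, not part of the original code.

```python
import numpy as np
import pandas as pd


def rolling_n_times_vectorized(n, die, rolls, seed=None):
    """Hypothetical vectorized variant: return n sample means as a NumPy array."""
    rng = np.random.default_rng(seed)
    # Draw all n * rolls values at once (with replacement), one row per sample.
    draws = rng.choice(die.to_numpy(), size=(n, rolls), replace=True)
    return draws.mean(axis=1)


die = pd.Series([1, 2, 3, 4, 5, 6])
means = rolling_n_times_vectorized(100_000, die, 20, seed=0)
print(means.shape, round(means.mean(), 2))
```

Unlike the loop version, this returns an array rather than a list, and it uses NumPy's random generator rather than pandas sampling, so the individual values will differ even though the distribution is the same.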

