Understanding the Central Limit Theorem

Central Limit Theorem

The Central Limit Theorem formally states that if we repeatedly draw truly random samples of a sufficiently large size from a population and compute the mean of each sample, those sample means will be approximately normally distributed.
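In symbols, one common way to write this is shown below. It assumes independent draws X_1 to X_n from a population with mean μ and finite variance σ²; the notation is ours, not the article's.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty,
\qquad \text{so} \qquad
\bar{X}_n \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right) \text{ for large } n.
```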

Understanding the CLT (Central Limit Theorem) is important in statistics: because the sample means are approximately normally distributed whatever the shape of the population they came from (as long as its variance is finite), we can apply statistical tests based on the sample mean without knowing the population's distribution.
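To see the "whatever the shape of the population" part in action, here is a minimal sketch that is not from the original article. It assumes an exponential population (scale 1), which is strongly right-skewed, and compares its skewness with that of 5,000 sample means of size 50. The `skewness` helper and the sample sizes are our own illustrative choices.

```python
import numpy as np

# Assumed setup: exponential population with scale=1 (mean 1, standard deviation 1)
rng = np.random.default_rng(seed=0)


def skewness(x):
    """Sample skewness: the third standardized moment (0 for a symmetric distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.std(x) ** 3


population = rng.exponential(scale=1.0, size=100_000)

# 5,000 random samples of size 50; each row is one sample, and we keep its mean
sample_size = 50
sample_means = rng.exponential(scale=1.0, size=(5_000, sample_size)).mean(axis=1)

print(f"population skewness:  {skewness(population):.2f}")    # theory: 2
print(f"sample-mean skewness: {skewness(sample_means):.2f}")  # theory: 2/sqrt(50), about 0.28
print(f"std of sample means:  {sample_means.std():.3f} "
      f"(CLT prediction sigma/sqrt(n) = {1 / np.sqrt(sample_size):.3f})")
```

The population is far from normal, yet the distribution of the sample means is nearly symmetric and its spread matches σ/√n.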

Why Use the Central Limit Theorem?

The CLT is useful when analyzing large datasets: it lets us use a sufficiently large sample to analyze the data and make inferences about the whole dataset (i.e. the population).

Uses of the CLT

  • For example, to find out whether a certain demographic group will like a new product, we cannot ask the whole population, as that would be time-consuming and expensive; instead, we analyze a sufficiently large sample.
  • The CLT is useful in finance when analyzing a large collection of securities to estimate portfolio distributions and traits such as returns, risk, and correlation.
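The survey case can be sketched as follows. The answers are simulated (1 = the respondent likes it, 0 = does not), and the 60% true share and the 1,000 respondents are assumptions made only for illustration. Because a proportion is just the mean of 0/1 answers, the CLT lets us attach an approximate 95% confidence interval to the sample estimate.

```python
import numpy as np

# Hypothetical survey: the true share of 0.6 is an assumption used only to simulate answers
rng = np.random.default_rng(seed=1)
n = 1_000
responses = rng.binomial(1, 0.6, size=n)

p_hat = responses.mean()  # sample proportion = sample mean of the 0/1 answers

# CLT: p_hat is approximately normal with standard error sqrt(p(1 - p) / n)
std_error = np.sqrt(p_hat * (1 - p_hat) / n)
z = 1.96  # approximate 97.5th percentile of the standard normal distribution
ci_low, ci_high = p_hat - z * std_error, p_hat + z * std_error

print(f"estimated share: {p_hat:.3f}, 95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

This normal approximation works best when both n·p and n·(1 − p) are reasonably large, which holds comfortably here.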

Example

Let us take the example of rolling a die. We create a pandas Series of the numbers 1 to 6 called die.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

die = pd.Series([1, 2, 3, 4, 5, 6])
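Before rolling, it helps to know what the CLT predicts for this population. A short sketch (the names mu, sigma2 and se_8 are ours, not the article's) computes the population mean and variance of a fair die and the expected spread of the mean of 8 rolls.

```python
import numpy as np
import pandas as pd

die = pd.Series([1, 2, 3, 4, 5, 6])

# Population parameters of a fair die (every face equally likely)
mu = die.mean()           # 3.5
sigma2 = die.var(ddof=0)  # population variance: 35/12, about 2.917 (ddof=0, not the sample variance)

# CLT prediction: the mean of 8 rolls has standard deviation sqrt(sigma^2 / 8)
se_8 = np.sqrt(sigma2 / 8)  # about 0.604

print(mu, round(sigma2, 3), round(se_8, 3))
```

So the sample means computed below should cluster around 3.5, typically within about ±1.2 (two standard errors).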

To simulate rolling a die 8 times, we use pd.Series.sample() with replace=True so that we sample with replacement.


# Rolling die 8 times
samples = die.sample(8, replace=True)        

Now, take the mean of the samples.


np.mean(samples)        
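Each call to sample() returns a different set of rolls, so your numbers will differ from the article's. If you want repeatable results while experimenting, pandas' sample() accepts a random_state seed; this sketch shows two identically seeded calls returning the same 8 rolls (the seed 42 is arbitrary).

```python
import numpy as np
import pandas as pd

die = pd.Series([1, 2, 3, 4, 5, 6])

# random_state fixes the pseudo-random draw, so the same 8 rolls come back each time
first = die.sample(8, replace=True, random_state=42)
second = die.sample(8, replace=True, random_state=42)

print(first.tolist() == second.tolist())  # True
print(np.mean(first))                     # mean of these 8 rolls
```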

Now, repeat the rolling process 10 times, storing the mean of each set of 8 rolls.


sample_means = []
for _ in range(10):
    samples = die.sample(8, replace=True)
    sample_means.append(np.mean(samples))
print(sample_means)
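The same experiment can also be written without a Python loop. This is a swapped-in technique, not the article's method: it uses NumPy's random Generator instead of pd.Series.sample(), drawing all 10 × 8 rolls at once and averaging each row.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 10 experiments (rows) x 8 rolls (columns); the upper bound 7 is exclusive, so faces are 1..6
rolls = rng.integers(1, 7, size=(10, 8))

sample_means = rolls.mean(axis=1)  # one mean per experiment
print(sample_means)
```

For large numbers of repetitions this vectorized form is much faster than calling sample() once per experiment.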

This gives us a list of 10 values (i.e. 10 sample means).

Figure: List of 10 sample means

Visualizing Distributions

To visualize the sampling distribution of the sample mean, let us plot a histogram.

Figure: Sampling distribution of 10 sample means

sns.histplot(sample_means)

plt.show()        
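Ten means are too few to reveal the shape of the distribution. Next, we repeat the experiment 100, 1,000, 10,000 and 100,000 times (each experiment being 20 rolls of the die) and plot a histogram of the sample means for each case.

Figure: Sampling distribution of sample means for 100 to 100,000 repetitions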

fig, ax = plt.subplots(1, 4, figsize=(40, 10))
fig.suptitle("Sampling Distribution of Sample Means")
# Number of times the 20-roll experiment is repeated in each panel
no_of_experiments = [100, 1000, 10000, 100000]

# rolling_n_times() is defined under "Functions Used" below
for i, n in enumerate(no_of_experiments):
    data = rolling_n_times(n, die, 20, True)
    ax[i].hist(data, histtype='bar')
    ax[i].set_xlabel(f'Means of 20 rolls, repeated {n} times')
plt.show()
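The loop above keeps the sample size fixed at 20 rolls and varies the number of repetitions, which mainly makes the histogram smoother. The CLT's σ/√n rule is about the sample size itself. A minimal sketch (our own addition, using NumPy's Generator rather than the article's helpers; the sizes 5, 20 and 80 and the 10,000 repetitions are arbitrary choices) checks that the spread of the sample means shrinks as predicted.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sigma = np.sqrt(35 / 12)  # population standard deviation of a fair die
repeats = 10_000          # simulated experiments per sample size

results = {}
for rolls in [5, 20, 80]:
    # Each row is one experiment of `rolls` die rolls; average each row
    means = rng.integers(1, 7, size=(repeats, rolls)).mean(axis=1)
    results[rolls] = means.std()
    print(f"{rolls:>3} rolls: std of means = {means.std():.3f}, "
          f"sigma/sqrt(n) = {sigma / np.sqrt(rolls):.3f}")
```

Quadrupling the number of rolls per sample roughly halves the spread of the sample means, as σ/√n predicts.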
        

Functions Used


# Take a sample of rolls and return its mean
def sample_with_mean(die, rolls, replace=True):
    """
    :param die: pandas.Series
        Pandas series to take the sample from
    :param rolls: int
        Number of rolls (values) in the sample
    :param replace: bool
        Sample with replacement (True) or without replacement (False)
    :return: float
        The mean of the sampled rolls
    """
    samples = die.sample(rolls, replace=replace)
    return np.mean(samples)




# Repeat the rolling experiment n times
def rolling_n_times(n, die, rolls, replace=True):
    """
    :param n: int
        Number of times the sampling is performed
    :param die: pandas.Series
        Pandas series to take samples from
    :param rolls: int
        Number of rolls (values) in each sample
    :param replace: bool
        Sample with replacement (True) or without replacement (False)
    :return: list of float
        The means of the n samples
    """
    sample_means = []
    for _ in range(n):
        sample_means.append(sample_with_mean(die, rolls, replace))
    return sample_means
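A short usage sketch of the two helpers follows. They are repeated here in condensed form (docstrings dropped) so the block runs on its own. The second call illustrates the replace flag: drawing 6 rolls without replacement uses every face exactly once, so every mean is exactly 3.5.

```python
import numpy as np
import pandas as pd


def sample_with_mean(die, rolls, replace=True):
    # Condensed copy of the article's helper: mean of one sample of rolls
    return np.mean(die.sample(rolls, replace=replace))


def rolling_n_times(n, die, rolls, replace=True):
    # Condensed copy of the article's helper: means of n independent samples
    return [sample_with_mean(die, rolls, replace) for _ in range(n)]


die = pd.Series([1, 2, 3, 4, 5, 6])

means = rolling_n_times(1_000, die, 20)
print(len(means), round(float(np.mean(means)), 2))  # 1000, and a value close to 3.5

# Without replacement, 6 rolls are a permutation of 1..6, so each mean is 21 / 6
print({float(m) for m in rolling_n_times(5, die, 6, replace=False)})  # {3.5}
```

With replace=False, rolls cannot exceed the 6 faces; pandas raises a ValueError in that case.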


Author: Hassan Abbas
