Understanding the Central Limit Theorem
Hassan Abbas
Software Design Engineer | LangChain | FastAPI | Flask | Vue | Quarkus | AI/ML
Central Limit Theorem
The Central Limit Theorem (CLT) states that if we repeatedly draw truly random samples of a sufficiently large size from a population with a finite mean and variance, and take the mean of each sample, the distribution of those sample means will be approximately normal, whatever the shape of the population's own distribution.
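Stated a little more formally (this is the standard textbook form, not something taken from the original write-up): for independent, identically distributed draws $X_1, \dots, X_n$ with mean $\mu$ and finite variance $\sigma^2$, the sample mean $\bar{X}_n$ behaves as follows.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{d} \mathcal{N}(0, \sigma^2)
\quad \text{as } n \to \infty .
```

In practice this means $\bar{X}_n$ is approximately $\mathcal{N}(\mu, \sigma^2 / n)$ for large $n$. For a fair six-sided die, $\mu = 3.5$ and $\sigma^2 = 35/12 \approx 2.92$.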
Understanding the CLT (Central Limit Theorem) is important in statistics: because the sample means are approximately normally distributed when the samples are large enough, we do not need to know the exact distribution of the population the samples came from, and we can apply statistical tests that rely on the sample mean being approximately normal.
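To see that the population itself does not need to be normal, here is a minimal sketch that samples from a strongly skewed (exponential) population and measures how lopsided the sample means are. The exponential population, the seed, the sample sizes, and the helper names `skewness` and `sample_means` are all assumptions made for illustration; only the Python standard library is used so the snippet runs without pandas.

```python
import random
import statistics


def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def sample_means(sample_size, repetitions, rng):
    """Draw `repetitions` samples from an exponential population; return their means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]


rng = random.Random(42)  # fixed seed, chosen arbitrarily for reproducibility

skew_by_size = {}
for sample_size in (1, 5, 50):
    means = sample_means(sample_size, 5_000, rng)
    skew_by_size[sample_size] = skewness(means)
    print(f"sample size {sample_size:>2}: skewness = {skew_by_size[sample_size]:.2f}")
```

The skewness should shrink toward 0 as the sample size grows, which is the CLT at work: the means of larger samples look more and more symmetric and bell-shaped even though the population is not.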
Why use Central Limit Theorem?
The CLT is useful when analyzing large datasets: rather than processing every record, we can draw a sufficiently large random sample and use its mean to make inferences about the whole dataset (i.e. the population).
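As a rough illustration of that kind of inference, the sketch below estimates the mean of a large dataset from a sample of 400 values and attaches a 95% confidence interval based on the normal approximation. The dataset is synthetic, and the sample size, seed, and variable names are assumptions made for this example only.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed, an arbitrary choice for reproducibility

# Hypothetical "large dataset": 100,000 skewed values with a mean near 20
population = [rng.expovariate(1 / 20) for _ in range(100_000)]

# Analyze a random sample instead of the whole dataset
sample = rng.sample(population, 400)
sample_mean = statistics.fmean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# 95% interval from the normal approximation that the CLT justifies
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"sample mean: {sample_mean:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
print(f"population mean (normally unknown): {statistics.fmean(population):.2f}")
```

An interval built this way is expected to cover the true population mean in about 95% of repeated samples, so a single run can still miss it.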
Uses Of CLT
Example
Let us take the example of rolling a die. We have a Series of the numbers 1 to 6 called dice.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# The six faces of a fair die
dice = pd.Series([1, 2, 3, 4, 5, 6])
To simulate rolling a die 8 times, we use pd.Series.sample() and set replace=True to sample with replacement.
# Roll the die 8 times
samples = dice.sample(8, replace=True)
Now, take the mean of the samples.
np.mean(samples)
Now, repeat the rolling process 10 times.
sample_means = []
for i in range(10):
    samples = dice.sample(8, replace=True)
    sample_means.append(np.mean(samples))
print(sample_means)
This gives us a list of 10 values (i.e. the sample means).
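With more repetitions, the sample means can be compared against what theory predicts for 8 rolls: a mean of 3.5 and a standard error equal to the die's standard deviation divided by the square root of 8. The following is a minimal sketch under a few assumptions: it uses only the standard library (`random.choices` in place of `pd.Series.sample`), a fixed seed, 10,000 repetitions, and a helper named `mean_of_rolls` that does not appear in the original code.

```python
import random
import statistics

FACES = [1, 2, 3, 4, 5, 6]


def mean_of_rolls(rolls, rng):
    """Roll a fair die `rolls` times (with replacement) and return the average."""
    return statistics.fmean(rng.choices(FACES, k=rolls))


# Exact population values for a fair die
population_mean = statistics.fmean(FACES)   # 3.5
population_sd = statistics.pstdev(FACES)    # sqrt(35/12), about 1.708

rolls = 8
expected_se = population_sd / rolls ** 0.5  # standard error of the mean

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
sample_means = [mean_of_rolls(rolls, rng) for _ in range(10_000)]

print(f"mean of sample means: {statistics.fmean(sample_means):.3f} "
      f"(theory: {population_mean})")
print(f"sd of sample means:   {statistics.stdev(sample_means):.3f} "
      f"(theory: {expected_se:.3f})")
```

Both simulated values should land close to the theoretical ones, and the gap narrows as the number of repetitions grows.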
Visualizing Distributions
To visualize the sampling distribution of the sample mean, let us plot a histogram.
sns.histplot(sample_means)
plt.show()
# Compare the sampling distribution for an increasing number of repetitions,
# where each repetition is the mean of 20 rolls
fig, ax = plt.subplots(1, 4, figsize=(40, 10))
fig.suptitle("Sampling Distribution of Sample Means")
no_of_rolls = [100, 1000, 10000, 100000]
for i in range(len(no_of_rolls)):
    data = rolling_n_times(no_of_rolls[i], dice, 20, True)
    ax[i].hist(data, histtype='bar')
    ax[i].set_xlabel(f'Means of {no_of_rolls[i]} samples (20 rolls each)')
plt.show()
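A histogram shows the bell shape by eye. A simple numeric check is to count how many sample means fall within 1.96 standard errors of 3.5, which would be about 95% for a perfectly normal distribution. The sketch below assumes 20 rolls per sample (as in the plots), 10,000 repetitions, and a fixed seed, and uses only the standard library rather than the pandas helpers.

```python
import random
import statistics

FACES = [1, 2, 3, 4, 5, 6]
ROLLS = 20            # rolls per sample, as in the plots
REPETITIONS = 10_000  # number of sample means to simulate (an assumption)

rng = random.Random(1)  # fixed seed, chosen arbitrarily for reproducibility
means = [statistics.fmean(rng.choices(FACES, k=ROLLS)) for _ in range(REPETITIONS)]

mu = statistics.fmean(FACES)                  # population mean, 3.5
se = statistics.pstdev(FACES) / ROLLS ** 0.5  # standard error for 20 rolls

# Share of sample means inside the interval mu +/- 1.96 * se
within = sum(abs(m - mu) <= 1.96 * se for m in means) / REPETITIONS
print(f"share of sample means within 1.96 standard errors: {within:.3f}")
```

Because the mean of 20 die rolls can only take a limited set of values, the share comes out close to, rather than exactly, 95%.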
Functions Used
# Take a sample and return its mean
def sample_with_mean(die, rolls, replace=True):
    """
    :param die: pandas.core.series.Series
        Pandas series to take the sample from
    :param rolls: int
        Number of samples needed
    :param replace: bool
        Sampling with replacement (True) or without replacement (False)
    :return:
        :type: float
        The mean of the resultant samples/rolls
    """
    samples = die.sample(rolls, replace=replace)
    return np.mean(samples)
# Repeat rolling n times
def rolling_n_times(n, die, rolls, replace=True):
    """
    :param n: int
        Number of times sampling is performed
    :param die: pandas.core.series.Series
        Pandas series to take the sample from
    :param rolls: int
        Number of samples needed
    :param replace: bool
        Sampling with replacement (True) or without replacement (False)
    :return:
        :type: list
        The list of means of the resultant samples/rolls
    """
    sample_means = []
    for i in range(n):
        sample_means.append(sample_with_mean(die, rolls, replace))
    return sample_means