Concise Basic Stats - Part IV: Central Limit Theorem and The Law of Large Numbers
Hello everyone, hope you are doing well and keeping up with the Concise Stats Series! I am certainly excited, because today we are going to explore two of the main results in all of probability theory: the Central Limit Theorem (CLT) and the Law of Large Numbers. But what do these results mean, and can we get some basic intuition behind them? Let's find out.
Central Limit Theorem
The first version of the Central Limit Theorem was proposed by the French mathematician Abraham de Moivre in 1733. De Moivre published an article in which he used a normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin (we will see this result in more depth when we look at the normal approximation to the Binomial). The finding was nearly forgotten until another French mathematician, Pierre-Simon Laplace, expanded it in the 19th century as part of his tremendous work, Théorie Analytique des Probabilités.
To begin, let us draw from the formal representation of the central limit theorem:
X̄_n = (X_1 + X_2 + ... + X_n) / n ≈ N(μ, σ²/n), where X_1, X_2, ... are independent observations of a random variable X with mean μ and finite variance σ², and the approximation improves as n goes to infinity.
Let's try to unpack the result above. On the left side of the statement we have the sum of the values of some variable divided by the number of elements, that is, its mean. But not just any mean: the sample mean, the average of each sample. In its basic form, the CLT states that if we take sufficiently large random samples from a population, the sample means will be approximately normally distributed, with the parameters above, regardless of the distribution from which we are sampling. The italicized text is not just for aesthetics; it is highlighted because it describes what makes this theorem so powerful. No matter the underlying distribution the data comes from, if we take enough samples, the distribution of their sample means will be approximately normal.
Ok, let's work through an example to understand this better. We first generate a population of random numbers following an exponential distribution (note that we are sampling from a non-normal distribution). Then, from this population we take a sample of size 35, calculate its mean, and save that mean into a list. We keep repeating this experiment, taking another sample of size 35 and computing its mean, until we have done so for 4000 samples, ending up with 4000 mean values. Now, we plot these mean values as a histogram and voila! Sure enough, you get something that very closely resembles a bell-shaped distribution. Remember that we are always assuming the variance is finite: even if you draw millions of points from a Cauchy distribution, you won't get the same result, because the CLT does not apply to the Cauchy distribution, which does not have finite variance. But don't worry too much, because in practice real data sets are finite!
import numpy as np
import matplotlib.pyplot as plt

# Generate a non-normal population: exponential with rate lambda = 1
lambda_val = 1
population_size = 10000
population = np.random.exponential(scale=1/lambda_val, size=population_size)

# Sampling from the population
sample_size = 35    # sample size
num_samples = 4000  # number of samples
sample_means = []   # list to store sample means

for i in range(num_samples):
    sample = np.random.choice(population, size=sample_size, replace=False)
    sample_mean = np.mean(sample)
    sample_means.append(sample_mean)

# Plot histogram of sample means
plt.hist(sample_means, bins=30, density=True, alpha=0.6, color="b", label="Sample Means")
plt.xlabel('Sample Mean')
plt.ylabel('Probability Density')
plt.title('Central Limit Theorem')
plt.legend(loc='best')
plt.show()
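To see the finite-variance caveat in action, here is a minimal sketch of a counterexample. It reuses sample_size and num_samples from the block above, and the histogram range is clipped purely for readability; both choices are mine, not from the original experiment. Repeating the same procedure with a standard Cauchy distribution, the sample means stay wildly spread out and heavy-tailed instead of settling into a bell shape:

# Counterexample: the standard Cauchy distribution has no finite variance (or mean),
# so the CLT does not apply and the sample means never tighten into a bell curve.
cauchy_means = []
for i in range(num_samples):
    cauchy_sample = np.random.standard_cauchy(sample_size)
    cauchy_means.append(np.mean(cauchy_sample))

# Clip the view to (-20, 20); individual sample means can land arbitrarily far out
plt.hist(cauchy_means, bins=100, range=(-20, 20), density=True, color="r", label="Cauchy Sample Means")
plt.xlabel('Sample Mean')
plt.ylabel('Probability Density')
plt.title('Sample means from a Cauchy distribution (CLT does not apply)')
plt.legend(loc='best')
plt.show()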
We can look at another example, now drawing samples from a Uniform distribution:
import ipywidgets as widgets
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

def viz(n_samples: int):

    # Generate a Uniform r.v. between 0 and 100.
    rv = stats.uniform(0, 100)

    # Take samples of size 5, compute the mean and save it in a list
    s_size = 5
    value_mean_list = []
    for sample_index in range(n_samples):
        sample_values = rv.rvs(s_size)
        value_mean = pd.Series(sample_values).mean()
        value_mean_list.append(value_mean)

    # Display the means in a histogram
    sns.displot(
        pd.Series(value_mean_list),
        kde=True,
        color=list(plt.rcParams['axes.prop_cycle'])[1]['color']
    )
    plt.title(f'Number of Samples of size {s_size}: {n_samples};')
    plt.xlabel('Sample Means')
    plt.xlim(0, 100)
    plt.ylim(0, 80)

widgets.interact(
    viz,
    n_samples=widgets.IntSlider(value=10, step=20, max=600),
);
The point here is that one can use the normal distribution to make inferences about the mean of the original population, even if the distribution of the population data is unknown. That is, we can analyze data even with incomplete information. Through the CLT, we can make use of well-developed statistical inference procedures that are based on the normal distribution, even if we are sampling from a population that is not normal, provided that we have a large enough sample size (and finite variance). This is a fundamental concept in statistics that allows us to make probabilistic inferences about population parameters based on sample statistics.
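To make this concrete, here is a small sketch of one such procedure: a normal-based 95% confidence interval for the population mean. It reuses the population, sample_size and lambda_val variables from the exponential example above, and the 1.96 factor is the usual 95% quantile of the standard normal; treat it as an illustration of the idea rather than a full inference recipe.

# Draw a single sample of size 35 from the (non-normal) exponential population
sample = np.random.choice(population, size=sample_size, replace=False)

# CLT-based 95% confidence interval: sample mean +/- 1.96 * standard error
x_bar = np.mean(sample)
std_err = np.std(sample, ddof=1) / np.sqrt(sample_size)
lower, upper = x_bar - 1.96 * std_err, x_bar + 1.96 * std_err

print(f"Sample mean: {x_bar:.3f}")
print(f"95% CI for the population mean: ({lower:.3f}, {upper:.3f})")
print(f"True population mean (1/lambda): {1/lambda_val}")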
Law of Large Numbers
The Law of Large Numbers is another cornerstone of probability theory. It was first proven by the Swiss mathematician Jacob Bernoulli in 1713. Bernoulli and his contemporaries were developing a formal probability theory with a view toward analyzing games of chance.
It states that after repeating an experiment many times, the empirical result will align with the analytical, expected result. Again, let's use the toss of a fair coin as an illustration. We know that, if it is a fair coin, the probability of getting heads is 1/2. Now, if we toss the coin 10 times and record the outcome each time, do you think you will get exactly 5 heads out of 10? Unless the stars are aligned in your favor that day, probably not. But what if this experiment were repeated many more times, say a thousand times over? We could simulate that on a computer, and we would see that the proportion of heads (a.k.a. the relative frequency) gets much closer to the expected 1/2. Now, what if we really push things to the extreme and run the experiment, say... a million times? By then the proportion of heads will be really, really close to the expected 1/2, illustrating the Law of Large Numbers in action.
import random

# Function to simulate tossing a fair coin (1 = heads, 0 = tails)
def coin_toss():
    return random.randint(0, 1)

# Number of times to toss the coin
num_tosses = 1000000

# List to store the toss results
results = []

# Toss the coin and store the results
for _ in range(num_tosses):
    toss = coin_toss()
    results.append(toss)

# Calculate the proportion of heads (the average of the 0/1 results)
average = sum(results) / num_tosses

# Print the average
print(f"Average of {num_tosses} 'Head' coin tosses: {average}")
Output:
Average of 1000000 'Head' coin tosses: 0.500292
As we increase the number of times we run the experiment, the actual relative frequency gets closer to the expected probability, and the more stable its value becomes. Here is the statement, formulated as a mathematical limit: for every ε > 0, lim (n→∞) P(|X̄_n − μ| > ε) = 0, where X̄_n is the average of the first n outcomes and μ is the expected value.
Let's look at another example in Python using the same idea, but with dice. We increase the number of rolls and track the observed probability of getting a particular value, for instance 4.
The Law of Large Numbers shows that as we increase the number of rolls, the observed probability (the relative frequency in each sample) tends to the "true" probability of getting a particular value in a die roll, which is 1/6 ≈ 0.167.
import random
import numpy as np
import matplotlib.pyplot as plt

probabilities = []
number_of_rolls = []
target = [4]

for n_rolls in range(1, 2000):
    n_success = 0
    n_fail = 0
    for _ in range(n_rolls):
        if random.randint(1, 6) in target:
            n_success += 1
        else:
            n_fail += 1
    probability_of_target = n_success / (n_success + n_fail)
    probabilities.append(probability_of_target)
    number_of_rolls.append(n_rolls)

# Top panel: histogram of the observed probabilities, with their mean marked
plt.subplot(2, 1, 1)
plt.hist(probabilities, 100, label=f'Probability of rolling {target}')
plt.axvline(np.array(probabilities).mean(), color='yellow')
plt.legend()

# Bottom panel: observed probability as the number of rolls grows
plt.subplot(2, 1, 2)
plt.plot(number_of_rolls, probabilities)
plt.xlabel('Number of Dice Rolls')
plt.ylabel(f'Probability of {target}')
plt.grid(True)
plt.show()
The yellow line in the above histogram is the mean of all the observed probabilities. See how it is pretty much equal to the value we are expecting (1/6).
import numpy as np
np.array(probabilities).mean()
# Out
0.16698811250041096
This law exists in two forms: weak and strong. The weak formulation describes how the sequence of sample means converges in probability to the expected value. The strong formulation describes how the sequence of random variables behaves in the limit, guaranteeing "almost sure" convergence. All these results have formal mathematical proofs, but they are beyond the scope of this article, as they require more advanced concepts.
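For reference, and stated without proof: the weak form is exactly the limit written earlier, for every ε > 0, lim (n→∞) P(|X̄_n − μ| > ε) = 0 (convergence in probability), while the strong form strengthens it to P(lim (n→∞) X̄_n = μ) = 1, meaning the sequence of sample means converges to μ with probability one (almost surely).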
Recap
So, we just explored two very important theorems in the realm of statistics. Let’s recap some of the things we’ve seen:
The CLT says that as the sample size tends to infinity, the distribution of the sample mean approaches a normal distribution. This is a statement about the shape of the distribution: a normal distribution is bell-shaped, so the distribution of the sample means begins to look bell-shaped as the sample size increases.
The CLT requires extra assumptions (notably finite variance) on top of those needed for the LLN, so you can have the LLN without the CLT, but not the other way around.
Thanks for following all the way through this post; I hope you had a chance to discover something new, or at least to solidify some concepts. If you liked it, please share and give it a reaction. Constructive criticism is also always appreciated. Hope to see you in the next post. Godspeed!