Maximum Likelihood for the Normal Distribution

Let's start with the equation for the normal distribution, or normal curve:

f(x | μ, σ) = [1/(σ√(2π))] e^( -(x - μ)²/(2σ²) )


It has two parameters. The first parameter, the Greek character μ (mu), determines the location of the normal distribution's mean.

a) A smaller value for μ moves the mean of the distribution to the left.

b) A larger value for μ moves the mean of the distribution to the right.


The second parameter, the Greek character σ (sigma), is the standard deviation and determines the normal distribution's width.

a) A larger value for σ makes the normal curve shorter and wider.

b) A smaller value for σ makes the normal curve taller and narrower.
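
To make the role of each parameter concrete, here's a minimal Python sketch of the density written above; the function name normal_pdf and the example values are just for illustration:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Changing mu slides the curve left or right; changing sigma stretches or squeezes it.
print(normal_pdf(0, mu=0, sigma=1))   # peak height of a standard normal, ~0.3989
print(normal_pdf(0, mu=0, sigma=2))   # doubling sigma widens the curve, so the peak drops to ~0.1995
```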

We're going to use the likelihood of the normal distribution to find the optimal values for its two parameters, μ (the mean) and σ (the standard deviation), given some data x.

Let’s start with the simplest data set of all: a single measurement.

The goal of this super simple example is to convey the basic concepts of how to find the maximum likelihood estimates for μ and σ.

Here we've measured a light bulb, and it weighs 32 grams.

Now just to see what happens…

We can overlay a normal distribution with μ = 28 and σ = 2 onto the data


and then plug the numbers into this equation:

L(μ = 28, σ = 2 | x = 32) = [1/(2√(2π))] e^( -(32 - 28)²/(2·2²) ) ≈ 0.03

The likelihood of the curve with μ = 28 and σ = 2, given the data, is 0.03.

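As a quick check on that number, here's a tiny Python sketch using scipy.stats.norm; for a single measurement, the likelihood is just the density evaluated at that point:

```python
from scipy.stats import norm

x = 32                                # the single light-bulb weight, in grams
print(norm.pdf(x, loc=28, scale=2))   # ~0.027, which rounds to the 0.03 above
```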

Now we can shift the distribution a little bit to the right by setting μ = 30 and then calculate the likelihood

Again we just plug the numbers into the likelihood function:

L(μ = 30, σ = 2 | x = 32) = [1/(2√(2π))] e^( -(32 - 30)²/(2·2²) ) ≈ 0.12

If we decide to fix σ = 2, so that it is a given just like the data, then we can plug in a whole bunch of values for μ and see which one gives the maximum likelihood.


For example, if we start with the mean of the distribution over on the left, at 20 grams,


and we get a very, very small likelihood, equal to about 0.000000003 (roughly 3 × 10⁻⁹).


If we keep calculating likelihoods for different values of μ and plot them, the maximum likelihood is at the peak of the curve, where the slope equals zero, and in this case the slope equals zero when μ = 32.

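Here's a minimal sketch of that sweep in Python, with σ fixed at 2 and the single 32-gram measurement; the grid of candidate μ values is just for illustration:

```python
import numpy as np
from scipy.stats import norm

x = 32                                   # the single measurement, in grams
sigma = 2                                # fixed, treated as a given
mus = np.linspace(20, 40, 201)           # candidate values for mu

likelihoods = norm.pdf(x, loc=mus, scale=sigma)
print(mus[np.argmax(likelihoods)])       # 32.0 -- the likelihood peaks when mu equals the measurement
```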


Now we can fix μ = 32 and treat it like a given just like the data.

And we can plug in different values for σ to find the one that gives the maximum likelihood.

Note: you actually need more than one measurement to find the optimal value for σ; with μ fixed at the single measurement, the likelihood just keeps growing as σ shrinks toward zero, so there is no peak to find.


If we had more data, then we could plot the likelihoods for different values of σ, and the maximum likelihood estimate for σ would be at the peak, where the slope of the curve equals zero.

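To see both points in code, here's a small sketch; the three-measurement data set is made up purely for illustration:

```python
import numpy as np
from scipy.stats import norm

mu = 32                                   # fixed, treated as a given
sigmas = np.linspace(0.1, 5, 50)          # candidate values for sigma

# One measurement: the likelihood just keeps growing as sigma shrinks, so there is no peak.
single = norm.pdf(32, loc=mu, scale=sigmas)
print(sigmas[np.argmax(single)])          # 0.1 -- the smallest sigma on the grid

# Three hypothetical measurements: now there is a genuine peak.
data = np.array([30.0, 32.0, 34.0])
joint = [np.prod(norm.pdf(data, loc=mu, scale=s)) for s in sigmas]
print(sigmas[np.argmax(joint)])           # ~1.6, close to np.std(data) = 1.63
```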

To solve for the maximum likelihood estimate for μ we treat σ like it’s a constant and then find where the slope of its likelihood function is 0.

And to solve for the maximum likelihood estimate for σ we treat μ like it’s a constant and then find where the slope of its likelihood function is 0.



The example with one measurement kept the math simple, but now I think we're ready to dive in a little deeper.

So let's use a two-sample data set to calculate the likelihood of a normal distribution.

To keep track of things, let's call the first bulb, which weighs 32 grams, X_1,

and the second bulb, which weighs 34 grams, X_2.


We've already seen how to calculate the likelihood of a curve given X_1, the light bulb that weighs 32 grams, and we can calculate the likelihood of the same curve given X_2 by plugging 34 into the likelihood function,

but what's the likelihood of the normal curve given both X_1 and X_2?

These measurements are independent (i.e., weighing X_1 did not have an effect on weighing X_2), so the overall likelihood is just the product of the two individual likelihoods:

L(μ, σ | X_1 and X_2) = L(μ, σ | X_1) × L(μ, σ | X_2)

So we just plug in the numbers and do the math


And that gives us a really small number.

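Assuming, for illustration, the same μ = 28, σ = 2 curve from the single-measurement example, the arithmetic looks like this in Python:

```python
from scipy.stats import norm

x1, x2 = 32, 34              # the two light-bulb weights, in grams
mu, sigma = 28, 2            # assumed curve, borrowed from the earlier example

joint = norm.pdf(x1, mu, sigma) * norm.pdf(x2, mu, sigma)
print(joint)                 # ~6e-05, a really small number
```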

If we had a third data point, then we would just add it to the "given" side of the overall likelihood and multiply in one more individual likelihood:

L(μ, σ | X_1, X_2 and X_3) = L(μ, σ | X_1) × L(μ, σ | X_2) × L(μ, σ | X_3)

With n data points,

L(μ, σ | x_1, x_2, …, x_n),

we multiply together all n individual likelihood functions:

L(μ, σ | x_1, x_2, …, x_n) = L(μ, σ | x_1) × L(μ, σ | x_2) × … × L(μ, σ | x_n)
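As code, that product of individual likelihoods is a one-liner; a minimal sketch, with the function name chosen just for illustration:

```python
import numpy as np
from scipy.stats import norm

def likelihood(data, mu, sigma):
    """Likelihood of (mu, sigma) given independent measurements: the product of the individual densities."""
    return np.prod(norm.pdf(np.asarray(data), loc=mu, scale=sigma))

print(likelihood([32, 34], mu=28, sigma=2))   # the same ~6e-05 as before
```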

Now we know how to calculate the likelihood of a normal distribution when we have more than one measurement: we just multiply together the individual likelihoods.

Let’s solve for the maximum likelihood estimates for μ and σ

Here's the likelihood function, without any values specified for μ and σ:

L(μ, σ | x_1, x_2, …, x_n)

It equals the product of the likelihood functions for the n individual measurements:

L(μ, σ | x_1, x_2, …, x_n) = L(μ, σ | x_1) × L(μ, σ | x_2) × … × L(μ, σ | x_n)

and here's what the equation looks like when each individual likelihood is written out:

L(μ, σ | x_1, x_2, …, x_n) = [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) × [1/(σ√(2π))] e^( -(x_2 - μ)²/(2σ²) ) × … × [1/(σ√(2π))] e^( -(x_n - μ)²/(2σ²) )

What we need to do is take two different derivatives of this equation:

One derivative will be with respect to μ, where we treat σ like it's a constant, and we can find the maximum likelihood estimate for μ by finding where this derivative equals zero.


The other derivative will be with respect to σ, where we treat μ like it's a constant.


And we can find the maximum likelihood estimate for σ by finding where this derivative equals zero.

Before we try to take any derivatives, let's take the log of the likelihood function:

ln[ L(μ, σ | x_1, x_2, …, x_n) ]

We do this because it makes taking the derivatives way, way easier.

The likelihood function and the log of the likelihood function both peak at the same values for μ and σ.

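A quick numerical illustration of that last point, using the two bulb weights, σ fixed at 2, and a grid of candidate μ values chosen just for illustration:

```python
import numpy as np
from scipy.stats import norm

data = np.array([32.0, 34.0])
sigma = 2.0
mus = np.linspace(28, 38, 101)

likelihood = np.array([np.prod(norm.pdf(data, loc=m, scale=sigma)) for m in mus])
log_likelihood = np.log(likelihood)

# Both curves peak at the same mu: 33, the mean of 32 and 34.
print(mus[np.argmax(likelihood)], mus[np.argmax(log_likelihood)])
```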


Now we're going to go, step by step, through all of the transformations that the log has on this function:

ln[ L(μ, σ | x_1, x_2, …, x_n) ] = ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) × … × [1/(σ√(2π))] e^( -(x_n - μ)²/(2σ²) ) ]

First, the log transforms the multiplication into addition:

ln[ L(μ, σ | x_1, x_2, …, x_n) ] = ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) ] + … + ln[ [1/(σ√(2π))] e^( -(x_n - μ)²/(2σ²) ) ]

Let's focus on the first term:

ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) ]

Convert the multiplication into addition:

= ln[ 1/(σ√(2π)) ] + ln[ e^( -(x_1 - μ)²/(2σ²) ) ]

Convert 1 over the square root into the exponent -1/2:

= ln[ (2πσ²)^(-1/2) ] + ln[ e^( -(x_1 - μ)²/(2σ²) ) ]

On the right side, convert the exponent into multiplication (and remember that ln(e) = 1):

= ln[ (2πσ²)^(-1/2) ] - [ (x_1 - μ)²/(2σ²) ] · ln(e) = ln[ (2πσ²)^(-1/2) ] - (x_1 - μ)²/(2σ²)

Back in the term on the left, convert the -1/2 exponent into multiplication as well:

ln[ (2πσ²)^(-1/2) ] = -(1/2) ln(2πσ²)

Putting everything together:

ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) ] = -(1/2) ln(2π) - (1/2) ln(σ²) - (x_1 - μ)²/(2σ²)

Summarizing:

ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) ] = -(1/2) ln(2π) - ln(σ) - (x_1 - μ)²/(2σ²)


For the term in the middle, I used the same rule: the exponent 2 in ln(σ²) came down as a multiplication, so -(1/2) ln(σ²) = -(1/2) · 2 · ln(σ) = -ln(σ).

And by following the same steps, we can transform the remaining parts of the sum:

ln[ [1/(σ√(2π))] e^( -(x_i - μ)²/(2σ²) ) ]

Into:

-(1/2) ln(2π) - ln(σ) - (x_i - μ)²/(2σ²)


Just to be clear about how we simplify, keep in mind that since we have n data points, we have a term like this for the first data point, x_1, plus matching terms for the remaining n - 1 data points:

ln[ L(μ, σ | x_1, …, x_n) ] = -(1/2) ln(2π) - ln(σ) - (x_1 - μ)²/(2σ²) - (1/2) ln(2π) - ln(σ) - (x_2 - μ)²/(2σ²) - … - (1/2) ln(2π) - ln(σ) - (x_n - μ)²/(2σ²)

Then all n of the -ln(σ) terms can be combined into -n ln(σ), the n copies of -(1/2) ln(2π) can be combined into -(n/2) ln(2π),

and the last parts of each term stay the same.

This is the log of the likelihood function after simplification, and it is what we will take the derivatives of:

ln[ L(μ, σ | x_1, …, x_n) ] = -(n/2) ln(2π) - n ln(σ) - (x_1 - μ)²/(2σ²) - … - (x_n - μ)²/(2σ²)
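
To double-check the algebra, here's a small sketch comparing this simplified expression with the sum of scipy's log-densities on the two bulb weights (the test values μ = 30, σ = 2 are arbitrary):

```python
import numpy as np
from scipy.stats import norm

def simplified_log_likelihood(data, mu, sigma):
    """-(n/2) ln(2 pi) - n ln(sigma) - sum((x - mu)^2) / (2 sigma^2)"""
    data = np.asarray(data, dtype=float)
    n = len(data)
    return (-n / 2 * np.log(2 * np.pi)
            - n * np.log(sigma)
            - np.sum((data - mu) ** 2) / (2 * sigma ** 2))

data = [32, 34]
print(simplified_log_likelihood(data, mu=30, sigma=2))   # matches...
print(np.sum(norm.logpdf(data, loc=30, scale=2)))        # ...this, up to floating point
```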

So, keep that equation handy for reference; everything that follows comes from taking its derivatives.

We'll start by taking the derivative with respect to μ:

∂/∂μ ln[ L(μ, σ | x_1, …, x_n) ] = ∂/∂μ [ -(n/2) ln(2π) ] + ∂/∂μ [ -n ln(σ) ] + ∂/∂μ [ -(x_1 - μ)²/(2σ²) ] + … + ∂/∂μ [ -(x_n - μ)²/(2σ²) ]

This derivative is the slope function for the log of the likelihood curve and we’ll use it to find the peak.

The first term doesn't contain μ, so its derivative is 0; the second term doesn't contain μ either, so its derivative is also 0.

The third term contains μ, so now we have to do some work: specifically, the numerator contains μ, so we apply the chain rule (remember the derivative is with respect to μ; σ is a constant, so the denominator doesn't change):

∂/∂μ [ -(x_1 - μ)²/(2σ²) ] = -2(x_1 - μ)·(-1)/(2σ²) = (x_1 - μ)/σ²

We can apply the same logic to the remaining terms and get:

∂/∂μ ln[ L(μ, σ | x_1, …, x_n) ] = (x_1 - μ)/σ² + (x_2 - μ)/σ² + … + (x_n - μ)/σ²

We can pull the σ² out, add the numerators together, and combine the measurements and the μ's:

∂/∂μ ln[ L(μ, σ | x_1, …, x_n) ] = (1/σ²) [ (x_1 + x_2 + … + x_n) - nμ ]

Now, let's take the derivative of the log-likelihood function with respect to σ:

∂/∂σ ln[ L(μ, σ | x_1, …, x_n) ] = ∂/∂σ [ -(n/2) ln(2π) ] + ∂/∂σ [ -n ln(σ) ] + ∂/∂σ [ -(x_1 - μ)²/(2σ²) ] + … + ∂/∂σ [ -(x_n - μ)²/(2σ²) ]

This derivative is the slope function for the log of the likelihood curve, and we’ll use it to find the peak.


So, from here on out, because they peak at the same spot, I'll talk about the likelihood functions instead of the log-likelihood functions.


Recall:

ln[ L(μ, σ | x_1, …, x_n) ] = -(n/2) ln(2π) - n ln(σ) - (x_1 - μ)²/(2σ²) - … - (x_n - μ)²/(2σ²)

The first term doesn't contain σ, so its derivative is zero; the derivative of the second term, -n ln(σ), is just -n/σ.

The derivative of the third term isn't tricky, but it's easier to figure out when we rewrite 1/σ² as σ^(-2):

∂/∂σ [ -(x_1 - μ)² · σ^(-2)/2 ] = (x_1 - μ)² · σ^(-3) = (x_1 - μ)²/σ³

We can apply the same logic to the remaining terms and get the derivative of the log-likelihood function with respect to σ:

∂/∂σ ln[ L(μ, σ | x_1, …, x_n) ] = -n/σ + (x_1 - μ)²/σ³ + … + (x_n - μ)²/σ³

Simplifying:

∂/∂σ ln[ L(μ, σ | x_1, …, x_n) ] = -n/σ + (1/σ³) [ (x_1 - μ)² + … + (x_n - μ)² ]
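
Here's a small sympy sketch that checks both closed-form derivatives against symbolic differentiation, using n = 2 for concreteness:

```python
import sympy as sp

mu, sigma, x1, x2 = sp.symbols('mu sigma x1 x2', positive=True)
n = 2

log_L = (-sp.Rational(n, 2) * sp.log(2 * sp.pi)
         - n * sp.log(sigma)
         - ((x1 - mu)**2 + (x2 - mu)**2) / (2 * sigma**2))

d_mu = sp.diff(log_L, mu)
d_sigma = sp.diff(log_L, sigma)

# Both differences simplify to zero, so the hand-derived formulas match sympy's.
print(sp.simplify(d_mu - (x1 + x2 - n * mu) / sigma**2))
print(sp.simplify(d_sigma - (-n / sigma + ((x1 - mu)**2 + (x2 - mu)**2) / sigma**3)))
```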

To find the maximum likelihood estimate for μ, we need to solve for where the derivative with respect to μ equals 0, because the slope is zero at the peak of the curve:

∂/∂μ ln[ L(μ, σ | x_1, …, x_n) ] = 0

Likewise, to find the maximum likelihood estimate for σ, we need to solve for where the derivative with respect to σ equals 0:

∂/∂σ ln[ L(μ, σ | x_1, …, x_n) ] = 0

Setting the derivative with respect to μ to 0 and solving for μ:

0 = (1/σ²) [ (x_1 + x_2 + … + x_n) - nμ ]


We start by multiplying both sides by σ², which makes the σ² go away:

0 = (x_1 + x_2 + … + x_n) - nμ

Then we add n times μ to both sides,

nμ = x_1 + x_2 + … + x_n

divide both sides by n, and solve:

μ = (x_1 + x_2 + … + x_n) / n

The maximum likelihood estimate for μ is the mean of the measurements.

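A sympy check of that last step, again with n = 2 for concreteness: solving the derivative for μ returns the mean of the measurements:

```python
import sympy as sp

mu, sigma, x1, x2 = sp.symbols('mu sigma x1 x2', positive=True)
d_mu = (x1 + x2 - 2 * mu) / sigma**2      # derivative of the log-likelihood with respect to mu, for n = 2

print(sp.solve(sp.Eq(d_mu, 0), mu))       # [x1/2 + x2/2], i.e. the mean of the measurements
```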

Now we need to set the derivative with respect to σ to 0:

0 = -n/σ + (1/σ³) [ (x_1 - μ)² + … + (x_n - μ)² ]

Now multiply both sides by σ:

0 = -n + (1/σ²) [ (x_1 - μ)² + … + (x_n - μ)² ]

Add n to both sides, then multiply both sides by σ²:

nσ² = (x_1 - μ)² + … + (x_n - μ)²

Divide both sides by n:

σ² = [ (x_1 - μ)² + … + (x_n - μ)² ] / n


and take the square root of both sides, and, at long last:

σ = √( [ (x_1 - μ)² + … + (x_n - μ)² ] / n )

We see that the maximum likelihood estimate for σ is the standard deviation of the measurements (note that it divides by n, not n - 1).

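Putting it all together, here's a minimal sketch that maximizes the log-likelihood numerically (searching over μ and log σ so that σ stays positive) and compares the result with the sample mean and the divide-by-n standard deviation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

data = np.array([32.0, 34.0])

def neg_log_likelihood(params):
    mu, log_sigma = params
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

# Minimizing the negative log-likelihood is the same as maximizing the likelihood.
result = minimize(neg_log_likelihood, x0=[30.0, 1.0], method='Nelder-Mead')
print(result.x[0], np.exp(result.x[1]))   # ~33.0 and ~1.0
print(data.mean(), data.std(ddof=0))      # 33.0 1.0 -- the closed-form answers
```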


In summary, the mean of the data is the maximum likelihood estimate for where the center of the normal distribution should go, and the standard deviation of the data is the maximum likelihood estimate of how wide the normal curve should be.
