Maximum Likelihood for the Normal Distribution

Let's start with the equation for the normal distribution, or normal curve:

f(x | μ, σ) = [1/(σ√(2π))] e^( -(x - μ)²/(2σ²) )


It has two parameters. The first parameter, the Greek character μ (mu), determines the location of the normal distribution's mean.

a) A smaller value for μ moves the mean of the distribution to the left.

b) A larger value for μ moves the mean of the distribution to the right.


The second parameter, the Greek character σ (sigma), is the standard deviation and determines the normal distribution's width.

a) A larger value for σ makes the normal curve shorter and wider.

b) A smaller value for σ makes the normal curve taller and narrower.
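
To make the role of each parameter concrete, here's a minimal Python sketch of the density written above; the function name normal_pdf and the example values are just for illustration:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Changing mu slides the curve left or right; changing sigma stretches or squeezes it.
print(normal_pdf(0, mu=0, sigma=1))   # peak height of a standard normal, ~0.3989
print(normal_pdf(0, mu=0, sigma=2))   # doubling sigma widens the curve, so the peak drops to ~0.1995
```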

We're going to use the likelihood of the normal distribution to find the optimal values for its two parameters, μ (the mean) and σ (the standard deviation), given some data x.

Let’s start with the simplest data set of all: a single measurement.

The goal of this super simple example is to convey the basic concepts of how to find the maximum likelihood estimates for μ and σ.

Here we've measured a light bulb, and it weighs 32 grams.

Now just to see what happens…

We can overlay a normal distribution with μ = 28 and σ = 2 onto the data


and then plug the numbers into this equation:

L(μ = 28, σ = 2 | x = 32) = [1/(2√(2π))] e^( -(32 - 28)²/(2·2²) ) ≈ 0.03

The likelihood of the curve with μ = 28 and σ = 2, given the data, is 0.03.

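As a quick check on that number, here's a tiny Python sketch using scipy.stats.norm; for a single measurement, the likelihood is just the density evaluated at that point:

```python
from scipy.stats import norm

x = 32                                # the single light-bulb weight, in grams
print(norm.pdf(x, loc=28, scale=2))   # ~0.027, which rounds to the 0.03 above
```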

Now we can shift the distribution a little bit to the right by setting μ = 30 and then calculate the likelihood

Again we just plug the numbers into the likelihood function:

L(μ = 30, σ = 2 | x = 32) = [1/(2√(2π))] e^( -(32 - 30)²/(2·2²) ) ≈ 0.12

If we decide to fix σ = 2, so that it is a given just like the data, then we can plug in a whole bunch of values for μ and see which one gives the maximum likelihood.


For example, if we start with the mean of the distribution over on the left, at 20 grams,


and we get a very, very small likelihood, equal to about 0.000000003 (roughly 3 × 10⁻⁹).


If we keep calculating likelihoods for different values of μ and plot them, the maximum likelihood is at the peak of the curve, where the slope equals zero, and in this case the slope equals zero when μ = 32.

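Here's a minimal sketch of that sweep in Python, with σ fixed at 2 and the single 32-gram measurement; the grid of candidate μ values is just for illustration:

```python
import numpy as np
from scipy.stats import norm

x = 32                                   # the single measurement, in grams
sigma = 2                                # fixed, treated as a given
mus = np.linspace(20, 40, 201)           # candidate values for mu

likelihoods = norm.pdf(x, loc=mus, scale=sigma)
print(mus[np.argmax(likelihoods)])       # 32.0 -- the likelihood peaks when mu equals the measurement
```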


Now we can fix μ = 32 and treat it like a given just like the data.

And we can plug in different values for σ to find the one that gives the maximum likelihood.

Note: you actually need more than one measurement to find the optimal value for σ; with μ fixed at the single measurement, the likelihood just keeps growing as σ shrinks toward zero, so there is no peak to find.


If we had more data, then we could plot the likelihoods for different values of σ, and the maximum likelihood estimate for σ would be at the peak, where the slope of the curve equals zero.

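To see both points in code, here's a small sketch; the three-measurement data set is made up purely for illustration:

```python
import numpy as np
from scipy.stats import norm

mu = 32                                   # fixed, treated as a given
sigmas = np.linspace(0.1, 5, 50)          # candidate values for sigma

# One measurement: the likelihood just keeps growing as sigma shrinks, so there is no peak.
single = norm.pdf(32, loc=mu, scale=sigmas)
print(sigmas[np.argmax(single)])          # 0.1 -- the smallest sigma on the grid

# Three hypothetical measurements: now there is a genuine peak.
data = np.array([30.0, 32.0, 34.0])
joint = [np.prod(norm.pdf(data, loc=mu, scale=s)) for s in sigmas]
print(sigmas[np.argmax(joint)])           # ~1.6, close to np.std(data) = 1.63
```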

To solve for the maximum likelihood estimate for μ we treat σ like it’s a constant and then find where the slope of its likelihood function is 0.

And to solve for the maximum likelihood estimate for σ we treat μ like it’s a constant and then find where the slope of its likelihood function is 0.



The example with one measurement kept the math simple, but now I think we're ready to dive in a little deeper.

So let's use a two-sample data set to calculate the likelihood of a normal distribution.

To keep track of things, let's call the first bulb, which weighs 32 grams, X_1,

and the second bulb, which weighs 34 grams, X_2.


We've already seen how to calculate the likelihood of a curve given X_1, the light bulb that weighs 32 grams, and we can calculate the likelihood of the same curve given X_2 by plugging 34 into the likelihood function,

but what's the likelihood of the normal curve given both X_1 and X_2?

These measurements are independent (i.e., weighing X_1 did not have an effect on weighing X_2), so the overall likelihood is just the product of the two individual likelihoods:

L(μ, σ | X_1 and X_2) = L(μ, σ | X_1) × L(μ, σ | X_2)

So we just plug in the numbers and do the math


And that gives us a really small number.

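Assuming, for illustration, the same μ = 28, σ = 2 curve from the single-measurement example, the arithmetic looks like this in Python:

```python
from scipy.stats import norm

x1, x2 = 32, 34              # the two light-bulb weights, in grams
mu, sigma = 28, 2            # assumed curve, borrowed from the earlier example

joint = norm.pdf(x1, mu, sigma) * norm.pdf(x2, mu, sigma)
print(joint)                 # ~6e-05, a really small number
```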

If we had a third data point, then we would just add it to the "given" side of the overall likelihood and multiply in one more individual likelihood:

L(μ, σ | X_1, X_2 and X_3) = L(μ, σ | X_1) × L(μ, σ | X_2) × L(μ, σ | X_3)

With n data points,

L(μ, σ | x_1, x_2, …, x_n),

we multiply together all n individual likelihood functions:

L(μ, σ | x_1, x_2, …, x_n) = L(μ, σ | x_1) × L(μ, σ | x_2) × … × L(μ, σ | x_n)
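As code, that product of individual likelihoods is a one-liner; a minimal sketch, with the function name chosen just for illustration:

```python
import numpy as np
from scipy.stats import norm

def likelihood(data, mu, sigma):
    """Likelihood of (mu, sigma) given independent measurements: the product of the individual densities."""
    return np.prod(norm.pdf(np.asarray(data), loc=mu, scale=sigma))

print(likelihood([32, 34], mu=28, sigma=2))   # the same ~6e-05 as before
```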

Now we know how to calculate the likelihood of a normal distribution when we have more than one measurement: we just multiply together the individual likelihoods.

Let’s solve for the maximum likelihood estimates for μ and σ

Here's the likelihood function, without any values specified for μ and σ:

L(μ, σ | x_1, x_2, …, x_n)

It equals the product of the likelihood functions for the n individual measurements:

L(μ, σ | x_1, x_2, …, x_n) = L(μ, σ | x_1) × L(μ, σ | x_2) × … × L(μ, σ | x_n)

and here's what the equation looks like when each individual likelihood is written out:

L(μ, σ | x_1, x_2, …, x_n) = [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) × [1/(σ√(2π))] e^( -(x_2 - μ)²/(2σ²) ) × … × [1/(σ√(2π))] e^( -(x_n - μ)²/(2σ²) )

What we need to do is take two different derivatives of this equation:

One derivative will be with respect to μ, where we treat σ like it's a constant, and we can find the maximum likelihood estimate for μ by finding where this derivative equals zero.


The other derivative will be with respect to σ, where we treat μ like it's a constant.


And we can find the maximum likelihood estimate for σ by finding where this derivative equals zero.

Before we try to take any derivatives, let's take the log of the likelihood function:

ln[ L(μ, σ | x_1, x_2, …, x_n) ]

We do this because it makes taking the derivatives way, way easier.

The likelihood function and the log of the likelihood function both peak at the same values for μ and σ.

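A quick numerical illustration of that last point, using the two bulb weights, σ fixed at 2, and a grid of candidate μ values chosen just for illustration:

```python
import numpy as np
from scipy.stats import norm

data = np.array([32.0, 34.0])
sigma = 2.0
mus = np.linspace(28, 38, 101)

likelihood = np.array([np.prod(norm.pdf(data, loc=m, scale=sigma)) for m in mus])
log_likelihood = np.log(likelihood)

# Both curves peak at the same mu: 33, the mean of 32 and 34.
print(mus[np.argmax(likelihood)], mus[np.argmax(log_likelihood)])
```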


Now we're going to go, step by step, through all of the transformations that the log has on this function:

ln[ L(μ, σ | x_1, x_2, …, x_n) ] = ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) × … × [1/(σ√(2π))] e^( -(x_n - μ)²/(2σ²) ) ]

First, the log transforms the multiplication into addition:

ln[ L(μ, σ | x_1, x_2, …, x_n) ] = ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) ] + … + ln[ [1/(σ√(2π))] e^( -(x_n - μ)²/(2σ²) ) ]

Let's focus on the first term:

ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) ]

Convert the multiplication into addition:

= ln[ 1/(σ√(2π)) ] + ln[ e^( -(x_1 - μ)²/(2σ²) ) ]

Convert 1 over the square root into the exponent -1/2:

= ln[ (2πσ²)^(-1/2) ] + ln[ e^( -(x_1 - μ)²/(2σ²) ) ]

On the right side, convert the exponent into multiplication (and remember that ln(e) = 1):

= ln[ (2πσ²)^(-1/2) ] - [ (x_1 - μ)²/(2σ²) ] · ln(e) = ln[ (2πσ²)^(-1/2) ] - (x_1 - μ)²/(2σ²)

Back in the term on the left, convert the -1/2 exponent into multiplication as well:

ln[ (2πσ²)^(-1/2) ] = -(1/2) ln(2πσ²)

Putting everything together:

ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) ] = -(1/2) ln(2π) - (1/2) ln(σ²) - (x_1 - μ)²/(2σ²)

Summarizing:

ln[ [1/(σ√(2π))] e^( -(x_1 - μ)²/(2σ²) ) ] = -(1/2) ln(2π) - ln(σ) - (x_1 - μ)²/(2σ²)


For the term in the middle, I used the same rule: the exponent 2 in ln(σ²) came down as a multiplication, so -(1/2) ln(σ²) = -(1/2) · 2 · ln(σ) = -ln(σ).

And by following the same steps, we can transform the remaining parts of the sum:

ln[ [1/(σ√(2π))] e^( -(x_i - μ)²/(2σ²) ) ]

Into:

-(1/2) ln(2π) - ln(σ) - (x_i - μ)²/(2σ²)


Just to be clear about how we simplify, keep in mind that since we have n data points, we have a term like this for the first data point, x_1, plus matching terms for the remaining n - 1 data points:

ln[ L(μ, σ | x_1, …, x_n) ] = -(1/2) ln(2π) - ln(σ) - (x_1 - μ)²/(2σ²) - (1/2) ln(2π) - ln(σ) - (x_2 - μ)²/(2σ²) - … - (1/2) ln(2π) - ln(σ) - (x_n - μ)²/(2σ²)

Then all n of the -ln(σ) terms can be combined into -n ln(σ), the n copies of -(1/2) ln(2π) can be combined into -(n/2) ln(2π),

and the last parts of each term stay the same.

This is the log of the likelihood function after simplification, and it is what we will take the derivatives of:

ln[ L(μ, σ | x_1, …, x_n) ] = -(n/2) ln(2π) - n ln(σ) - (x_1 - μ)²/(2σ²) - … - (x_n - μ)²/(2σ²)
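
To double-check the algebra, here's a small sketch comparing this simplified expression with the sum of scipy's log-densities on the two bulb weights (the test values μ = 30, σ = 2 are arbitrary):

```python
import numpy as np
from scipy.stats import norm

def simplified_log_likelihood(data, mu, sigma):
    """-(n/2) ln(2 pi) - n ln(sigma) - sum((x - mu)^2) / (2 sigma^2)"""
    data = np.asarray(data, dtype=float)
    n = len(data)
    return (-n / 2 * np.log(2 * np.pi)
            - n * np.log(sigma)
            - np.sum((data - mu) ** 2) / (2 * sigma ** 2))

data = [32, 34]
print(simplified_log_likelihood(data, mu=30, sigma=2))   # matches...
print(np.sum(norm.logpdf(data, loc=30, scale=2)))        # ...this, up to floating point
```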

So, keep that equation handy for reference; everything that follows comes from taking its derivatives.

We'll start by taking the derivative with respect to μ:

∂/∂μ ln[ L(μ, σ | x_1, …, x_n) ] = ∂/∂μ [ -(n/2) ln(2π) ] + ∂/∂μ [ -n ln(σ) ] + ∂/∂μ [ -(x_1 - μ)²/(2σ²) ] + … + ∂/∂μ [ -(x_n - μ)²/(2σ²) ]

This derivative is the slope function for the log of the likelihood curve and we’ll use it to find the peak.

The first term doesn't contain μ, so its derivative is 0; the second term doesn't contain μ either, so its derivative is also 0.

The third term contains μ, so now we have to do some work: specifically, the numerator contains μ, so we apply the chain rule (remember the derivative is with respect to μ; σ is a constant, so the denominator doesn't change):

∂/∂μ [ -(x_1 - μ)²/(2σ²) ] = -2(x_1 - μ)·(-1)/(2σ²) = (x_1 - μ)/σ²

We can apply the same logic to the remaining terms and get:

∂/∂μ ln[ L(μ, σ | x_1, …, x_n) ] = (x_1 - μ)/σ² + (x_2 - μ)/σ² + … + (x_n - μ)/σ²

We can pull the σ² out, add the numerators together, and combine the measurements and the μ's:

∂/∂μ ln[ L(μ, σ | x_1, …, x_n) ] = (1/σ²) [ (x_1 + x_2 + … + x_n) - nμ ]

Now, let's take the derivative of the log-likelihood function with respect to σ:

∂/∂σ ln[ L(μ, σ | x_1, …, x_n) ] = ∂/∂σ [ -(n/2) ln(2π) ] + ∂/∂σ [ -n ln(σ) ] + ∂/∂σ [ -(x_1 - μ)²/(2σ²) ] + … + ∂/∂σ [ -(x_n - μ)²/(2σ²) ]

This derivative is the slope function for the log of the likelihood curve, and we’ll use it to find the peak.


So, from here on out, because they peak at the same spot, I'll talk about the likelihood functions instead of the log-likelihood functions.


Recall:

ln[ L(μ, σ | x_1, …, x_n) ] = -(n/2) ln(2π) - n ln(σ) - (x_1 - μ)²/(2σ²) - … - (x_n - μ)²/(2σ²)

The first term doesn't contain σ, so its derivative is zero; the derivative of the second term, -n ln(σ), is just -n/σ.

The derivative of the third term isn't tricky, but it's easier to figure out when we rewrite 1/σ² as σ^(-2):

∂/∂σ [ -(x_1 - μ)² · σ^(-2)/2 ] = (x_1 - μ)² · σ^(-3) = (x_1 - μ)²/σ³

We can apply the same logic to the remaining terms and get the derivative of the log-likelihood function with respect to σ:

∂/∂σ ln[ L(μ, σ | x_1, …, x_n) ] = -n/σ + (x_1 - μ)²/σ³ + … + (x_n - μ)²/σ³

Simplifying:

∂/∂σ ln[ L(μ, σ | x_1, …, x_n) ] = -n/σ + (1/σ³) [ (x_1 - μ)² + … + (x_n - μ)² ]
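
Here's a small sympy sketch that checks both closed-form derivatives against symbolic differentiation, using n = 2 for concreteness:

```python
import sympy as sp

mu, sigma, x1, x2 = sp.symbols('mu sigma x1 x2', positive=True)
n = 2

log_L = (-sp.Rational(n, 2) * sp.log(2 * sp.pi)
         - n * sp.log(sigma)
         - ((x1 - mu)**2 + (x2 - mu)**2) / (2 * sigma**2))

d_mu = sp.diff(log_L, mu)
d_sigma = sp.diff(log_L, sigma)

# Both differences simplify to zero, so the hand-derived formulas match sympy's.
print(sp.simplify(d_mu - (x1 + x2 - n * mu) / sigma**2))
print(sp.simplify(d_sigma - (-n / sigma + ((x1 - mu)**2 + (x2 - mu)**2) / sigma**3)))
```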

To find the maximum likelihood estimate for μ, we need to solve for where the derivative with respect to μ equals 0, because the slope is zero at the peak of the curve:

∂/∂μ ln[ L(μ, σ | x_1, …, x_n) ] = 0

Likewise, to find the maximum likelihood estimate for σ, we need to solve for where the derivative with respect to σ equals 0:

∂/∂σ ln[ L(μ, σ | x_1, …, x_n) ] = 0

Setting the derivative with respect to μ to 0 and solving for μ:

0 = (1/σ²) [ (x_1 + x_2 + … + x_n) - nμ ]


We start by multiplying both sides by σ², which makes the σ² go away:

0 = (x_1 + x_2 + … + x_n) - nμ

Then we add n times μ to both sides,

nμ = x_1 + x_2 + … + x_n

divide both sides by n, and solve:

μ = (x_1 + x_2 + … + x_n) / n

The maximum likelihood estimate for μ is the mean of the measurements.

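A sympy check of that last step, again with n = 2 for concreteness: solving the derivative for μ returns the mean of the measurements:

```python
import sympy as sp

mu, sigma, x1, x2 = sp.symbols('mu sigma x1 x2', positive=True)
d_mu = (x1 + x2 - 2 * mu) / sigma**2      # derivative of the log-likelihood with respect to mu, for n = 2

print(sp.solve(sp.Eq(d_mu, 0), mu))       # [x1/2 + x2/2], i.e. the mean of the measurements
```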

Now we need to set the derivative with respect to σ to 0:

0 = -n/σ + (1/σ³) [ (x_1 - μ)² + … + (x_n - μ)² ]

Now multiply both sides by σ:

0 = -n + (1/σ²) [ (x_1 - μ)² + … + (x_n - μ)² ]

Add n to both sides, then multiply both sides by σ²:

nσ² = (x_1 - μ)² + … + (x_n - μ)²

Divide both sides by n:

σ² = [ (x_1 - μ)² + … + (x_n - μ)² ] / n


and take the square root of both sides, and, at long last:

σ = √( [ (x_1 - μ)² + … + (x_n - μ)² ] / n )

We see that the maximum likelihood estimate for σ is the standard deviation of the measurements (note that it divides by n, not n - 1).

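Putting it all together, here's a minimal sketch that maximizes the log-likelihood numerically (searching over μ and log σ so that σ stays positive) and compares the result with the sample mean and the divide-by-n standard deviation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

data = np.array([32.0, 34.0])

def neg_log_likelihood(params):
    mu, log_sigma = params
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

# Minimizing the negative log-likelihood is the same as maximizing the likelihood.
result = minimize(neg_log_likelihood, x0=[30.0, 1.0], method='Nelder-Mead')
print(result.x[0], np.exp(result.x[1]))   # ~33.0 and ~1.0
print(data.mean(), data.std(ddof=0))      # 33.0 1.0 -- the closed-form answers
```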


In summary, the mean of the data is the maximum likelihood estimate for where the center of the normal distribution should go, and the standard deviation of the data is the maximum likelihood estimate of how wide the normal curve should be.
