Probability and Likelihood in R
Probability
What is probability vs. a probability distribution?
Probability is a measure of how likely an event is to occur.
A probability distribution is a mathematical function or model that gives the probability of each possible value in an experiment.
Flip a coin
Consider a coin toss (an experiment). If we toss a fair coin, the probability of getting a head or a tail is 50%, i.e. p = 0.5.
For the probability mass function (PMF) of a single coin flip, the random variable x can be either 1 (head) or 0 (tail). The number of tosses is n = 1 and the theoretical probability of heads is θ = 0.5.
PMF
#dbinom computes the binomial probability mass function
#x = target value of the random variable
#N = number of attempts (coin tosses), here 1
#θ = theoretical probability of success (getting a head)
#0 = tail, 1 = head
dbinom(1,size = 1, prob =0.5) #P(x=1|N=1)
[1] 0.5
dbinom(0,size=1, prob = 0.5) #P(x=0|N=1)
[1] 0.5
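As a quick sanity check on the PMF above, the probabilities of the two possible outcomes of a single toss must sum to 1:

```r
# PMF of one fair coin toss: P(x=0) + P(x=1) must equal 1
dbinom(0, size = 1, prob = 0.5) + dbinom(1, size = 1, prob = 0.5)  # 1
```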
Using the binomial probability distribution (another type of probability function) we can find the probability of different possible outcomes, e.g. tossing 2 heads out of 8 tosses.
Here the random variable x can take multiple values, from 0 to 8.
set.seed(123)
# Number of coin tosses
n <- 8
# Probability of getting a head on a single toss
p <- 0.5
# Generate a sequence of possible outcomes (number of heads) or no of random variables
x <- 0:8
# Calculate the probability of getting exactly x heads using the binomial distribution
probability <- dbinom(x, size = n, prob = p)
# Plot the probability distribution
barplot(probability, names.arg = x, xlab = "Number of Heads (k)", ylab = "Probability", main = "Probability of Getting Heads in 8 Coin Tosses")
The probability is 0.27343750 when x = 4. Using the likelihood function, we can also evaluate how plausible a parameter value (θ = 0.5) is given an observed outcome (4 heads).
probability
[1] 0.00390625 0.03125000 0.10937500 0.21875000 0.27343750 0.21875000 0.10937500
[8] 0.03125000 0.00390625
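Two quick checks on this vector, assuming the `x` and `probability` objects from the code above: the nine probabilities must sum to 1, and the most probable count of heads is x = 4.

```r
# Binomial PMF for 8 fair coin tosses
x <- 0:8
probability <- dbinom(x, size = 8, prob = 0.5)

sum(probability)            # all outcomes together have probability 1
x[which.max(probability)]   # most probable number of heads: 4
max(probability)            # its probability: 0.2734375
```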
Likelihood
For the same events, we can calculate the binomial likelihood function.
The probability of an event is determined by θ and N.
probability mass function: P(x|θ)
likelihood function: L(θ|x)
The likelihood tells us how likely a value of θ is, given x and N.
First Method:
Using dbinom directly, we fix the random variable and let θ vary.
In this example we fix x = 4 and let θ range from 0 to 1 in steps of 0.01.
# Number of coin tosses
n <- 8
# Number of heads observed (you can vary the number of heads to see for yourself)
x <- 4
theta <- seq(0,1,0.01)
f <- dbinom(x,n,theta)
plot(theta,f,type="l", xlab="θ",ylab="LF",main="LF,x=4,N=8")
At θ = 0.5 the likelihood is highest (about 0.27 on the y axis).
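The peak can also be read off numerically instead of from the plot. A small sketch, using the same grid as above:

```r
# Likelihood of theta on a grid, for x = 4 heads in N = 8 tosses
theta <- seq(0, 1, 0.01)
f <- dbinom(4, 8, theta)

theta[which.max(f)]   # grid value with the highest likelihood: 0.5
max(f)                # likelihood there: 0.2734375
```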
Second Method:
You can create a function.
Here we fix x = 6.
# Create a LF function
# inputs: theta, x and N
# output: Likelihood
likelihood <- function(theta, x, N){
  dbinom(x, N, theta)
}
# Set up the inputs for LF you created
# The most important thing is "varying probability of success on each trial"
x <- 6
N <- 8
theta <- seq(0, 1, 0.0001) # varied probability of success
f <- likelihood(theta, x, N)
# plot the likelihood as function of probability of success
plot(theta, f, type="l", xlab="θ", ylab="LF", main="LF, x=6, N=8")
We can see in the plot above that θ is not fixed and the distribution has shifted to the right: the maximum is now at θ = x/N = 6/8 = 0.75, where the likelihood is about 0.31 (y-axis).
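Locating the peak numerically confirms this. The block below is self-contained and re-defines the same likelihood() helper used in the document:

```r
# Likelihood helper, mirroring the function defined earlier
likelihood <- function(theta, x, N){
  dbinom(x, N, theta)
}

theta <- seq(0, 1, 0.0001)
f <- likelihood(theta, x = 6, N = 8)

theta[which.max(f)]   # maximizer: 0.75, i.e. x/N
max(f)                # likelihood at the maximum: ~0.3115
```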
Maximum Likelihood
The maximum of the likelihood is obtained at θ = 0.5 when the number of coin tosses is 8 and 4 heads are observed.
The value of θ that maximizes the likelihood is known as the maximum likelihood estimate.
Remember: likelihood L(θ|x) = P(x|θ).
For discrete events, the likelihood of θ = 0.5 given the observed event (4 heads) is 0.27,
which is also the probability of getting 4 heads in 8 tosses.
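This identity can be checked directly in R. A minimal sketch, re-defining the same likelihood() helper so the block stands alone:

```r
# Likelihood helper, mirroring the function defined earlier
likelihood <- function(theta, x, N){
  dbinom(x, N, theta)
}

# Same number either way:
dbinom(4, 8, 0.5)       # probability of 4 heads in 8 tosses, theta = 0.5
likelihood(0.5, 4, 8)   # likelihood of theta = 0.5 given 4 observed heads
```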
x <- 4
N <- 8
theta <- seq(0, 1, 0.0001) # varied probability of success
f <- likelihood(theta, x, N)
# plot the likelihood as function of probability of success
plot(theta, f, type="l", xlab="θ", ylab="LF", main="LF, x=4, N=8")
Maximizing Likelihood
There are several ways to maximize a likelihood function in R.
Our coin toss example is simple, but in real-world scenarios we need numerical methods to find the θ that maximizes the likelihood.
For example, we can create a negative-likelihood function.
nlm performs non-linear minimization, but we can use it to maximize the likelihood function,
because maximizing a function is the same as minimizing that function multiplied by minus one.
#x = 4
#n=8
negative_likelihood <- function(p){
  -dbinom(4, 8, p)
}
# Test that our function is behaving as expected
negative_likelihood(0.5)
[1] -0.2734375
nlm(negative_likelihood,0.5,stepmax=0.5)
$minimum
[1] -0.2734375
$estimate
[1] 0.5
$gradient
[1] 0
$code
[1] 1
$iterations
[1] 1
nlm finds the estimate θ = 0.5 ($estimate); the maximum likelihood there is 0.2734375 (the reported $minimum with its sign flipped back).
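nlm is not the only option. Base R's optimize() searches a bounded interval and can maximize directly, without negating the function. A sketch, assuming the same data (4 heads in 8 tosses):

```r
# Maximize the binomial likelihood over theta in [0, 1]
result <- optimize(function(p) dbinom(4, 8, p),
                   interval = c(0, 1), maximum = TRUE)

result$maximum    # MLE of theta, ~0.5
result$objective  # likelihood at the maximum, ~0.2734375
```

Since θ is a probability, a bounded search over [0, 1] is a natural fit here.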