Probability and Likelihood in R

Probability

What is probability vs. a probability distribution?

Probability is a measure of how likely an event is to occur.

A probability distribution is a mathematical function (or model) that assigns a probability to each of the possible outcomes of an experiment.
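To make the distinction concrete, here is a minimal sketch (using a fair six-sided die as an assumed example, not from the article): a probability is a single number for one event, while a distribution assigns a probability to every possible outcome and sums to 1.

```r
# A single probability: P(rolling a 3 with a fair die)
p_three <- 1 / 6

# A probability distribution: one probability per possible outcome
faces <- 1:6
dist  <- rep(1 / 6, 6)

# A valid distribution covers every outcome and sums to 1
sum(dist)  # 1
```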

Flip a coin

Consider a coin toss (our experiment). If we toss a fair coin, the probability of getting a head (or a tail) is 50%, i.e. p = 0.5.

For the probability mass function (PMF) of a single coin flip, the random variable x can be either 1 (head) or 0 (tail). The number of tosses is N = 1 and the theoretical probability of a head is theta = 0.5.

PMF

# dbinom computes the binomial probability mass function
# x    = value of the random variable (0 = tail, 1 = head)
# size = number of attempts N (here, 1 coin toss)
# prob = theoretical probability of success theta (getting a head)
dbinom(1,size = 1, prob =0.5) #P(x=1|N=1)        
[1] 0.5        
dbinom(0,size=1, prob = 0.5) #P(x=0|N=1)        
[1] 0.5        
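As a sanity check (this formula is the standard binomial PMF, written out here rather than taken from the article), we can compute the same probabilities by hand and compare them to dbinom:

```r
# Binomial PMF by hand: P(x | N, theta) = choose(N, x) * theta^x * (1 - theta)^(N - x)
pmf <- function(x, N, theta) choose(N, x) * theta^x * (1 - theta)^(N - x)

pmf(1, 1, 0.5)  # 0.5, same as dbinom(1, size = 1, prob = 0.5)
pmf(0, 1, 0.5)  # 0.5, same as dbinom(0, size = 1, prob = 0.5)
```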

Using the binomial probability distribution (another type of probability function) we can find the probability of different possible outcomes, e.g. tossing 2 heads out of 8 tosses.

The random variable x can take multiple values, e.g. 0 to 8.

set.seed(123)

# Number of coin tosses
n <- 8

# Probability of getting a head on a single toss
p <- 0.5

# All possible outcomes: number of heads out of n tosses
x <- 0:n

# Calculate the probability of getting exactly x heads using the binomial distribution
probability <- dbinom(x, size = n, prob = p)

# Plot the probability distribution
barplot(probability, names.arg = x, xlab = "Number of Heads (k)", ylab = "Probability", main = "Probability of Getting Heads in 8 Coin Tosses")        


The probability is 0.2734375 when x = 4. Using the likelihood function, we can turn the question around and ask how plausible a parameter value (theta = 0.5) is, given the observed outcome (4 heads).

probability        
[1] 0.00390625 0.03125000 0.10937500 0.21875000 0.27343750 0.21875000 0.10937500
[8] 0.03125000 0.00390625        
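Two quick checks on this output (a small sketch reusing the objects from the code above): the nine probabilities should sum to 1, and the peak should sit at x = 4.

```r
n <- 8
x <- 0:n
probability <- dbinom(x, size = n, prob = 0.5)

sum(probability)           # 1: the PMF covers all possible outcomes
x[which.max(probability)]  # 4: the most probable number of heads
max(probability)           # 0.2734375
```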

Likelihood

For the same experiment, we can calculate the binomial likelihood function.

The probability of an outcome is determined by theta and N.

probability mass function: P(x|θ)

likelihood function: L(θ|x)

The likelihood measures how plausible a given theta is, given the observed x and N.
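Both functions come from the same dbinom call; only what is held fixed changes. A minimal sketch of the contrast:

```r
# Probability: theta fixed, x varies (a distribution over outcomes)
prob_x <- dbinom(0:8, size = 8, prob = 0.5)

# Likelihood: x fixed, theta varies (a function of the parameter)
theta <- seq(0, 1, 0.01)
lik   <- dbinom(4, size = 8, prob = theta)

sum(prob_x)  # 1: probabilities over all outcomes sum to 1
sum(lik)     # not 1: a likelihood function is not a distribution over theta
```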

First Method:

Using dbinom directly, we fix the random variable while theta varies.

In this example we fix the random variable at x = 4 and vary theta from 0 to 1 in steps of 0.01.

# Number of coin tosses
n <- 8

# Number of heads observed (you can vary the number of heads to see for yourself)
x <- 4

theta <- seq(0,1,0.01)

f <- dbinom(x,n,theta)
plot(theta,f,type="l", xlab="θ",ylab="LF",main="LF,x=4,N=8")        


At theta = 0.5, the likelihood is highest (about 0.27 on the y-axis).
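Instead of eyeballing the plot, we can read the maximum straight off the grid (a small addition, recomputing the same objects as above):

```r
theta <- seq(0, 1, 0.01)
f     <- dbinom(4, 8, theta)

theta[which.max(f)]  # 0.5: the theta with the highest likelihood
max(f)               # 0.2734375
```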

Second Method:

You can also wrap the calculation in a function.

Here we fix the random variable at x = 6.

# Create a LF function
# inputs: theta, x and N
# output: Likelihood

likelihood <- function(theta, x, N) {
  dbinom(x, N, theta)
}

# Set up the inputs for LF you created
# The most important thing is "varying probability of success on each trial"
x <- 6
N <- 8
theta <- seq(0, 1, 0.0001) # varied probability of success

f <- likelihood(theta, x, N)


# plot the likelihood as function of probability of success
plot(theta, f, type="l", xlab="θ", ylab="LF", main="LF, x=6, N=8")        


We can see in the code above that theta is not fixed and the distribution has shifted to the right, with the likelihood highest at theta = 0.75 (the maximum). The likelihood there is about 0.31 (y-axis).
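For a binomial, the maximum likelihood estimate is x/N (a standard result, not derived in this article), so with x = 6 and N = 8 the grid maximum should land at 0.75:

```r
x     <- 6
N     <- 8
theta <- seq(0, 1, 0.0001)
f     <- dbinom(x, N, theta)

theta[which.max(f)]  # 0.75, which equals x / N
max(f)               # about 0.311
```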

Maximum Likelihood

The maximum of the likelihood function is attained at theta = 0.5 when the number of coin tosses is 8 and the number of observed heads is 4.

This value of theta is known as the maximum likelihood estimate.

Remember likelihood L(θ|x) = P(x|θ)

For discrete events, the likelihood of theta = 0.5 given 4 observed heads is 0.2734, and the probability of getting 4 heads in 8 tosses with theta = 0.5 is also 0.2734: the same number, read from two different points of view.

x <- 4
N <- 8
theta <- seq(0, 1, 0.0001) # varied probability of success
f <- likelihood(theta, x, N)


# plot the likelihood as function of probability of success
plot(theta, f, type="l", xlab="θ", ylab="LF", main="LF, x=4, N=8")        


Maximizing Likelihood

There are several ways to maximize the likelihood function in R.

Our coin toss examples are simple, but in real-world scenarios we need numerical methods to find the theta that maximizes the likelihood.

For example, we can create a negative-likelihood function and minimize it.

nlm stands for non-linear minimization, but we can still use the nlm function to maximize the likelihood: maximizing a function is the same as minimizing that function multiplied by minus one.

# Negative likelihood for x = 4 heads out of n = 8 tosses
negative_likelihood <- function(p){
  -dbinom(4, 8, p)
}
# Test that our function is behaving as expected
negative_likelihood(0.5)        
[1] -0.2734375        
nlm(negative_likelihood,0.5,stepmax=0.5)        
$minimum
[1] -0.2734375

$estimate
[1] 0.5

$gradient
[1] 0

$code
[1] 1

$iterations
[1] 1        

For a theoretical probability (theta) of 0.5, nlm reports $minimum = -0.2734375, i.e. the maximum likelihood is 0.2734375, attained at $estimate = 0.5.
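Base R's optimize() is another way to do this (a sketch of an alternative, not from the original article): it performs one-dimensional optimization and can maximize directly, so no negated function is needed.

```r
# Maximize the likelihood of theta for x = 4 heads in 8 tosses
fit <- optimize(function(p) dbinom(4, 8, p), interval = c(0, 1), maximum = TRUE)

fit$maximum    # about 0.5: the maximum likelihood estimate
fit$objective  # about 0.2734375: the maximized likelihood
```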

References:

https://towardsdatascience.com/maximum-likelihood-estimation-in-r-b21f68f1eba4

https://bookdown.org/CMmaterials/wo23_cognitivemodeling/probility-and-likelihood.html
