Parametric Methods for Machine Learning – Part 1

In parametric methods, we assume that the sample is drawn from some known probability distribution, e.g. a Gaussian. A probability distribution assigns probabilities to the values of a random variable (a random variable is the outcome of a random experiment and can be discrete or continuous).

Parametric methods are extremely popular in data science and machine learning because the assumed probability distribution can be described entirely by its parameters (its sufficient statistics): once the parameters are known, the whole distribution p(x) is known. Parametric methods therefore provide a compact summary of the data.

Because the entire distribution is summarized by a few parameters, which are learnt from the sample data during the training phase, parametric methods are computationally faster than non-parametric methods.

Popular Distributions

Some of the popular distributions that are used to model the sample data are the following:

Binomial Distributions

A Bernoulli process is one in which there are two possible outcomes, i.e. the random variable x has two possible values: Success (x = 1) and Failure (x = 0). If the probability of Success is p, the probability of Failure is 1 − p. The Bernoulli distribution is then defined as p(x) = p^x (1 − p)^(1−x).

The number of successes in N independent trials of a Bernoulli process follows a Binomial distribution, where the probability of success is fixed across all trials: p(x) = C(N, x) p^x (1 − p)^(N−x).
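As a quick illustration, here is a minimal sketch of the Binomial probability mass function p(x) = C(N, x) p^x (1 − p)^(N−x), using only Python's standard library:

```python
from math import comb

def binomial_pmf(x, n, p):
    # Probability of exactly x successes in n independent Bernoulli trials,
    # each with success probability p.
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Probability of exactly 3 heads in 10 fair-coin flips:
print(binomial_pmf(3, 10, 0.5))  # 120/1024 = 0.1171875
```

Setting n = 1 recovers the Bernoulli case p^x (1 − p)^(1−x).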

Multinomial Distributions

A multinomial distribution is a generalisation of the Bernoulli where, instead of two states, the outcome of a random event is one of K mutually exclusive and exhaustive states. For a single trial, the distribution is represented as p(x) = Π_i p_i^{x_i}, where x_i = 1 for the observed state and 0 for the others, and the p_i sum to 1.

In addition, the multinomial distribution is a multivariate distribution, that is, a distribution over several random variables: each random variable counts the total number of times the corresponding outcome occurs over all trials.
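A minimal sketch of the multinomial probability mass function over n trials, using only the standard library (the single-trial formula is multiplied by the multinomial coefficient n!/Π x_i!, which counts the orderings of the outcomes):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    # P(X1 = x1, ..., XK = xK) for n = sum(counts) trials with
    # outcome probabilities probs (which must sum to 1).
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)  # multinomial coefficient n! / (x1! ... xK!)
    prob = 1.0
    for x, p in zip(counts, probs):
        prob *= p ** x
    return coef * prob

# Probability of rolling each face exactly once in 6 rolls of a fair die:
print(multinomial_pmf([1] * 6, [1 / 6] * 6))  # 720 / 6**6 ≈ 0.0154
```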

Applications of Multinomial Distributions - Multinomial distributions can be used to model discrete data like words in a document, DNA sequences, etc., and are widely used in text analysis and DNA analysis.

Poisson Distribution

The Poisson distribution applies in situations where random phenomena occur at a certain rate over a period of time. For example, it describes the number of people in line at a checkout counter, or the number of telephone calls received at a switching point. The probability that there are exactly x occurrences in a given time interval is given by the Poisson distribution.

It has only one parameter (λ), and its mean and variance are both equal to λ, i.e. λ is a positive real number equal to the expected number (mean) of occurrences in a given time interval. The rate of occurrence can be estimated by dividing the average number of events observed, μ, by the length of the observation time t; the expected count λ then scales with the length of the interval of interest. E.g. if events occur on average every 4 minutes (a rate of 1/4 per minute) and you are interested in the number of events occurring in a 10-minute interval, then λ = 10/4 = 2.5. The Poisson distribution is represented as p(x) = e^(−λ) λ^x / x!
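The formula above can be sketched directly in Python (standard library only), reusing the λ = 2.5 example:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # Probability of exactly x occurrences in an interval
    # whose expected count is lam.
    return exp(-lam) * lam ** x / factorial(x)

lam = 10 / 4  # events every 4 minutes on average, over a 10-minute interval
print(round(poisson_pmf(2, lam), 4))  # probability of exactly 2 events ≈ 0.2565
```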

Applications of Poisson Distribution are:

  • The number of arrivals at a car wash in one hour.
  • The number of network failures per day.
  • The number of customers who call to complain about a service problem per month.
  • The number of visitors to a Web site per minute.

Gamma Distribution

It has 2 parameters (shape k and scale θ), and its domain is the positive real numbers.

Its mode is at (k − 1)θ (for k ≥ 1), its mean is kθ and its variance is kθ². As the shape parameter increases, the mean increases and the skewness decreases, to the point where the probability density function is almost symmetrical.
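These moment formulas can be checked by simulation; a small sketch using the standard library's gamma sampler, with illustrative shape and scale values:

```python
import random

random.seed(0)
k, theta = 3.0, 2.0  # shape and scale (illustrative values)
samples = [random.gammavariate(k, theta) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2))  # close to k * theta = 6.0
print(round(var, 2))   # close to k * theta**2 = 12.0
```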

Applications of Gamma Distributions:

  • Modelling of rainfall. Gamma distributions are commonly used to model rainfall. First, since negative rainfall is meaningless and the Gamma distribution is bounded on the left at zero, it naturally excludes negative values. Second, the Gamma distribution is positively skewed, i.e. it has an extended tail on the right side; this mimics rainfall in many areas, where there is a non-zero probability of heavy rainfall even though typical rainfall is not heavy. Also, since changing the shape parameter changes the shape of the Gamma distribution, it can closely model many other rainfall patterns with reasonable accuracy.
  • For similar reasons, Gamma distributions are used to model insurance claims and the size of loan defaults.
  • Gamma distributions are also used for models that are based on intervals between events e.g. server loads, flow of items through a manufacturing or distribution process.

Beta Distribution

Beta distributions are defined on the interval [0, 1] and so are very versatile for modelling outcomes such as probabilities. Beta distributions can also describe one's personal beliefs before seeing the data.

In their general form they have 4 parameters (α, β, minimum and maximum), which work together to determine whether the mode lies in the interior of the interval and whether the distribution is symmetrical.

For symmetric distributions, alpha and beta are equal, and the magnitude of their values determines how sharply peaked the density is. Asymmetric distributions are modelled with different alphas and betas, with greater asymmetry the greater the difference between alpha and beta. If alpha is less than beta, the distribution is positively skewed (tail towards the right); otherwise it is negatively skewed (tail towards the left).
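A minimal sketch of the two-parameter Beta density on [0, 1], computed via the gamma function, illustrates these shapes:

```python
from math import gamma

def beta_pdf(x, a, b):
    # Beta density: x^(a-1) (1-x)^(b-1) / B(a, b),
    # where B(a, b) = Γ(a)Γ(b) / Γ(a+b)
    B = gamma(a) * gamma(b) / gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

# Symmetric case: alpha = beta = 5, peak at 0.5
print(round(beta_pdf(0.5, 5, 5), 3))
# Positively skewed case: alpha = 2 < beta = 5, mode at (a-1)/(a+b-2) = 0.2
print(round(beta_pdf(0.2, 2, 5), 3))
```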

Common Applications of Beta Distribution -

  • Beta distribution is used to model our personal beliefs before taking the data into account. For example, it can be used to model the probability distribution of batting averages in cricket before the season has started (treating the average as a proportion, so that it lies in [0, 1]). Let's say we want to predict the batting average of a player during a cricket season (e.g. the Ashes 2015) before the season starts. From historical Ashes data, we might have reason to believe that the batting average of the batsman will be between 0.2 and 0.5; we could model this as Beta(11, 19), for example. If all averages were equally likely, we would instead model it as Beta(1, 1), the uniform distribution.
  • Beta Distributions are also commonly used in project planning/control systems such as PERT and CPM (Critical Path Method), where we model the time to complete a task. In PERT analysis, we acquire estimates of the minimum, modal (most likely) and maximum time to completion. From these estimates the mean and variance are first calculated, and once the mean and variance are known, the alpha and beta shape parameters can be derived. Once the Beta distribution is known, we can calculate quantiles (e.g. 25% and 75% completion times).
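The PERT step above can be sketched as follows. This uses the standard PERT approximations mean = (a + 4m + b)/6 and variance = ((b − a)/6)², followed by a method-of-moments fit for α and β; the task-duration numbers are purely illustrative:

```python
def pert_beta_params(a, m, b):
    # PERT estimates from minimum a, mode m, maximum b
    mean = (a + 4 * m + b) / 6
    var = ((b - a) / 6) ** 2
    # Standardise to [0, 1] and fit alpha, beta by matching moments
    mu = (mean - a) / (b - a)
    s2 = var / (b - a) ** 2
    common = mu * (1 - mu) / s2 - 1
    alpha = mu * common
    beta = (1 - mu) * common
    return alpha, beta

# Task estimated at minimum 2, most likely 5, maximum 14 days:
alpha, beta = pert_beta_params(2, 5, 14)
print(round(alpha, 3), round(beta, 3))  # ≈ 2.333 and 4.667
```

With α and β in hand, quantiles of the fitted Beta (rescaled to [a, b]) give the 25% and 75% completion times.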

Gaussian Distributions

The Gaussian, also known as the normal distribution, is widely used to model the distribution of continuous variables. In the case of a single variable x, the Gaussian distribution can be written in the form:

p(x) = (1 / (σ √(2π))) exp(−(x − μ)² / (2σ²))

where μ is the mean and σ is the standard deviation. The case where μ = 0 and σ = 1 is called the standard normal. For a d-dimensional vector x, the multivariate Gaussian distribution takes the form:

p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ))

where μ is the d-dimensional mean vector, Σ is the d×d covariance matrix and |Σ| denotes the determinant of Σ. The normal distribution is widely used, partly because it is well behaved and mathematically tractable, but the central limit theorem provides the deeper theoretical basis for its wide applicability. The central limit theorem states that as the sample size N becomes large, the following occur:

  • The sampling distribution of the mean becomes approximately normal regardless of the distribution of the original variable.
  • The sampling distribution of the mean is centered at the population mean, μ, of the original variable. In addition, the standard deviation of the sampling distribution of the mean approaches σ/√N, where σ is the population standard deviation of the original variable.
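Both properties can be observed in a short simulation (standard library only): repeatedly draw samples from a decidedly non-normal distribution, the uniform on [0, 1], and look at the distribution of the sample means.

```python
import random

random.seed(42)
N = 200        # sample size
trials = 5000  # number of sample means to collect

means = [sum(random.random() for _ in range(N)) / N for _ in range(trials)]

grand_mean = sum(means) / trials
sd = (sum((m - grand_mean) ** 2 for m in means) / trials) ** 0.5

# Uniform[0, 1] has mu = 0.5 and sigma = sqrt(1/12) ≈ 0.2887,
# so the sample means should center at 0.5 with spread sigma/sqrt(N) ≈ 0.0204
print(round(grand_mean, 3), round(sd, 4))
```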

Approximately normal distributions occur in many situations, as explained by the central limit theorem: when an outcome is produced by many small effects acting additively and independently, its distribution will be close to normal. The normal approximation is not valid if the effects act multiplicatively (instead of additively), or if a single external influence has a considerably larger magnitude than the rest of the effects.

Normal distributions are by far the most widely used distributions. Some applications (besides the familiar grading systems) are:

  • In finance, changes in the log of exchange rates, stock indices and price indices are assumed to be normal.
  • In biology, measures of the size of tissue (length, height, skin area, weight) behave log-normally.
  • In physiology, the blood pressure of adult humans also behaves log-normally.

 
