Supervised Learning Algorithms
Introduction
Supervised learning algorithms are, roughly speaking, learning algorithms that learn to associate some input with some output, given a training set of examples of inputs x and outputs y. In many cases, the outputs y may be difficult to collect automatically and must be provided by a human "supervisor," but the term still applies even when the training set targets were collected automatically.
Probabilistic Supervised Learning
Most supervised learning algorithms are based on estimating a probability distribution p(y | x). We can do this simply by using maximum likelihood estimation to find the best parameter vector θ for a parametric family of distributions p(y | x; θ). We have already seen that linear regression corresponds to the family
                                    p(y | x; θ) = N(y; θᵀx, I).
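As a minimal sketch (assuming NumPy and a small synthetic dataset invented for illustration), maximizing the likelihood under this Gaussian family reduces to ordinary least squares, so θ can be recovered directly from the normal equations:

    import numpy as np

    # Hypothetical synthetic data: y = 2*x1 - 3*x2 + Gaussian noise
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    true_theta = np.array([2.0, -3.0])
    y = X @ true_theta + rng.normal(size=100)

    # Maximizing the likelihood of N(y; theta^T x, I) is equivalent to
    # minimizing squared error, so theta solves the normal equations:
    #   (X^T X) theta = X^T y
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(theta_hat)  # close to [2, -3]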
We can generalize linear regression to the classification scenario by defining a different family of probability distributions. If we have two classes, class 0 and class 1, then we need only specify the probability of one of these classes. The probability of class 1 determines the probability of class 0, because these two values must add up to 1.
The normal distribution over real-valued numbers that we used for linear regression is parametrized in terms of a mean. Any value we supply for this mean is valid. A distribution over a binary variable is slightly more complicated, because its mean must always lie between 0 and 1. One way to solve this problem is to use the logistic sigmoid function to squash the output of the linear function into the interval (0, 1) and interpret that value as a probability:
                                    p(y = 1 | x; θ) = σ(θᵀx).
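As a quick illustration (assuming NumPy; the linear scores below are made-up values of θᵀx), the sigmoid maps any real-valued score into (0, 1), so it can be read as a probability:

    import numpy as np

    def sigmoid(z):
        # Logistic sigmoid: squashes any real number into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])  # example values of theta^T x
    print(sigmoid(scores))  # approx. [0.018 0.269 0.5 0.731 0.982]
    # The probability of class 0 is simply 1 - sigmoid(theta^T x)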
This approach is known as logistic regression (a somewhat strange name, since we use the model for classification rather than regression). In the case of linear regression, we were able to find the optimal weights by solving the normal equations. Logistic regression is somewhat more difficult: there is no closed-form solution for its optimal weights. Instead, we must search for them by maximizing the log-likelihood. We can do this by minimizing the negative log-likelihood (NLL) using gradient descent. This same strategy can be applied to essentially any supervised learning problem, by writing down a parametric family of conditional probability distributions over the right kind of input and output variables.
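The following is a minimal sketch of that procedure for logistic regression, assuming NumPy, a synthetic dataset, and an arbitrarily chosen learning rate and iteration count; it simply runs gradient descent on the average NLL:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical synthetic binary classification data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    true_theta = np.array([1.5, -2.0])
    y = (sigmoid(X @ true_theta) > rng.uniform(size=200)).astype(float)

    # Gradient descent on the negative log-likelihood
    theta = np.zeros(2)
    lr = 0.1
    for _ in range(1000):
        p = sigmoid(X @ theta)            # p(y = 1 | x; theta)
        grad = X.T @ (p - y) / len(y)     # gradient of the mean NLL
        theta -= lr * grad

    print(theta)  # roughly recovers the direction of true_theta

The gradient used here follows from differentiating the NLL of the sigmoid model; in practice one would also monitor convergence or use an off-the-shelf optimizer rather than a fixed number of steps.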
For more details visit: https://www.technologiesinindustry4.com/2021/05/supervised-learning-algorithms.html