Machine Learning Algorithms - A Compendium
Lokesh Madan
Competency Development Manager @ Ericsson | Solutions Architect | Business Analyst | Cloud, 5G & IOT Aspirant
A learning algorithm is a method used to process data to extract patterns appropriate for application in a new situation. The goal is to adapt a system to a specific input-output transformation task.
LEARNING = REPRESENTATION (hypothesis space) + EVALUATION (objective function or scoring function) + OPTIMIZATION
Bayes' Theorem, set out in 1763, remains a central concept in Machine Learning.
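In its simplest form, Bayes' Rule states that P(H | D) = P(D | H) × P(H) / P(D): the probability of a hypothesis H after seeing data D is the likelihood of the data under H, weighted by the prior belief in H and normalized by the overall probability of the data.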
A computer program is said to learn from Experience ‘E’ w.r.t some class of Tasks ‘T’ and Performance measure ‘P’, if its performance at tasks in ‘T’, as measured by ‘P’, improves with experience ‘E’.
The twelve most commonly used Machine Learning algorithms are described below.
1. Logistic regression
This is called Logistic regression due to its similarity to linear regression, but it is a form of classification, not regression. This algorithm type performs well on linearly separable classes.
Benefits: Computationally inexpensive, easy to implement, knowledge representation easy to interpret
Drawbacks: Prone to under-fitting, may have low accuracy
Data set: Numeric/ nominal
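As a rough illustration only (not part of the original write-up), here is a minimal logistic regression sketch in Python, assuming scikit-learn is available; the synthetic data and default parameters are placeholders.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder synthetic data: any numeric feature matrix X with binary labels y would do.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()            # linear decision boundary with a sigmoid output
clf.fit(X_train, y_train)             # weights learned by maximizing the likelihood
print(clf.predict_proba(X_test[:3]))  # class probabilities for new samples
print(clf.score(X_test, y_test))      # accuracy on held-out data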
2. Support vector machines (SVM)
SVMs are linear models, like logistic regression, but they can use kernels (similarity functions between pairs of samples). The objective is to maximize the margin between classes for better generalization. They are called machines because they generate a binary decision.
Benefits: Low generalization error, outcomes are easy to interpret, excels at high-dimensional linear problems, tolerates some noise/errors
Drawbacks: Sensitive to the tuning parameters and kernel used, natively handles only binary classification, slow at prediction time when using kernels
Data set: Numeric/ nominal
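Again as a hedged sketch (the library choice, kernel and parameters are assumptions, not from the article), a kernel SVM can be fitted to the same kind of data:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)  # placeholder data
svm = SVC(kernel='rbf', C=1.0)  # the kernel and C are exactly the sensitive tuning knobs noted above
svm.fit(X, y)                   # fits the maximum-margin boundary
print(svm.predict(X[:3]))       # binary decisions: one class label per sample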
3. K-nearest neighbors (kNN)
kNN is called a lazy learner because it does not learn a discriminating function from the data set but memorizes the training data set instead. It allows predictions to be made without any model training.
kNN is a non-parametric classifier which simply “looks at” the K points in the training set that are nearest to the test input x, counts how many members of each class are in this set, and returns that empirical fraction as the estimate.
Benefits: High accuracy, insensitive to outliers, makes no assumptions about the data
Drawbacks: Requires a lot of memory, struggles with high-dimensional data, computationally expensive at prediction time
Data set: Numeric/ nominal
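A minimal sketch of kNN's "look at the K nearest points and vote" behaviour, assuming scikit-learn; the data and K are placeholders:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # K = 5 nearest training points
knn.fit(X_train, y_train)                  # "lazy": essentially just memorizes the training set
print(knn.predict(X_test[:3]))             # majority vote among the 5 nearest neighbours
print(knn.predict_proba(X_test[:3]))       # empirical class fractions in the neighbourhood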
4. Decision trees (CART models)
Classification and regression trees (CART models), or decision trees, are defined by recursively partitioning the input space and fitting a local model in each resulting region.
CART uses a "divide-and-conquer" method and a "top-down induction" approach to learn simple decision rules in the form of a tree. A splitting function is used to find the best attribute on which to branch the tree. The standard approach is to grow a "full" tree and then perform pruning.
Benefits: Easy to comprehend, handles missing values and irrelevant features well, needs little computation
Drawbacks: Susceptible to over-fitting, makes only locally optimal decisions
Data set: Numeric/ nominal
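A small sketch of a CART-style tree, again assuming scikit-learn; the depth cap stands in (loosely) for pruning a fully grown tree:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)  # placeholder data
tree = DecisionTreeClassifier(max_depth=3)  # limiting depth is a simple stand-in for pruning
tree.fit(X, y)                              # recursively partitions the input space, top-down
print(export_text(tree))                    # the learned if/else decision rules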
5. Naive Bayes
Naive Bayes "naively" assumes independence and is based on Bayes' Rule. The algorithm assumes that attributes are independent, which in real life is certainly a simplification. Despite this incorrect assumption, Naive Bayes is effective at classification, fast and accurate, and hence popular.
Benefits: Works well with small data sets, handles multiple classes, easy to implement
Drawbacks: Sensitive to how the input data is prepared
Data set: Only Nominal
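A hedged sketch using scikit-learn's Gaussian variant (MultinomialNB or CategoricalNB would be the usual picks for count or nominal data); the data are placeholders:

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=4, random_state=0)  # placeholder data
nb = GaussianNB()               # treats features as conditionally independent given the class
nb.fit(X, y)                    # learns per-class, per-feature statistics and applies Bayes' Rule
print(nb.predict_proba(X[:3]))  # posterior class probabilities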
6. AdaBoost (Ensemble Methods)
AdaBoost uses a weak learner as the base classifier, with the input data weighted by a weight vector. In the first iteration the data is equally weighted, but in subsequent iterations a data point is weighted more strongly if it was incorrectly classified previously. This adaptation to the errors is the strength of AdaBoost. It is used to combine the predictions of multiple classifiers.
Benefits: Low generalization error, easy to code, works with most classifiers, no parameters to adjust
Drawbacks: Sensitive to outliers
Data set: Numeric/ nominal
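A minimal AdaBoost sketch (library and parameters assumed, not prescribed by the article); scikit-learn's default weak learner is a depth-1 decision stump:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)  # placeholder data
ada = AdaBoostClassifier(n_estimators=50)  # 50 boosting rounds over a weak base classifier
ada.fit(X, y)                              # each round re-weights previously misclassified samples more heavily
print(ada.predict(X[:3]))                  # weighted vote over the weak classifiers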
7. K-Means Clustering
This algorithm makes an initial guess of k random cluster centers, known as "centroids", and then iteratively refines that guess. It computes the distance from every point to each cluster center, assigns each point to its closest cluster center, and then recalculates the cluster centers based on the points assigned to each cluster.
Hard clustering assigns a sample to one cluster. Soft clustering assigns a sample to one or more clusters.
Benefits: Ease of implementation, fast, often used for pre-clustering
Drawbacks: May converge to a local minimum, slow on very large data sets, sensitive to noise and outliers
Data set: Only numeric
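A short sketch of the centroid-refinement loop via scikit-learn (an assumed choice); the random data and k = 3 are placeholders:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)                            # placeholder numeric data
km = KMeans(n_clusters=3, n_init=10, random_state=0)  # several restarts help dodge bad local minima
labels = km.fit_predict(X)                            # hard assignment: each point goes to one centroid
print(km.cluster_centers_)                            # final centroids after iterative refinement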
8. Linear regression
Linear regression is the “work horse” of statistics and (supervised) machine learning.
When augmented with kernels or other forms of basis function expansion, it can model non-linear relationships. And when the Gaussian output is replaced with a Bernoulli or Multinoulli distribution, it can be used for classification.
Linear regression is an excellent, simple method for numeric prediction: it fits the "best fitting" straight line by minimizing the mean squared difference between predictions and targets.
Benefits: Outcomes are easy to interpret, little computation involved
Drawbacks: Poorly models nonlinear data
Data set: Numeric/ nominal
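A tiny least-squares sketch, assuming scikit-learn; the four (x, y) pairs are invented for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # placeholder inputs
y = np.array([2.1, 3.9, 6.2, 8.1])          # placeholder targets, roughly y = 2x
lr = LinearRegression()
lr.fit(X, y)                                # least-squares "best fitting" straight line
print(lr.coef_, lr.intercept_)              # slope and intercept
print(lr.predict([[5.0]]))                  # numeric prediction for a new input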
9. Random forests
A forest is a set of trees. The technique known as random forests tries to decorrelate the base learners by learning each tree on a randomly chosen subset of input variables, as well as a randomly chosen subset of data cases. This makes it an attractive model for many practical problem domains.
The output of the random forest algorithm is an ensemble of tree models whose predictions are combined by voting or averaging.
Benefits: Very good predictive accuracy, ensembles enable good generalization, handles noise and outliers well, bias reduction, doesn’t over-fit so easily
Drawbacks: Random sampling may not be random enough, lack of interpretability/inference, speed depends on the forest size
Data set: Only Numeric
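A hedged random forest sketch (scikit-learn assumed); each tree is grown on a bootstrap sample with a random feature subset considered at every split:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)  # placeholder data
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)                    # decorrelated trees via bootstrapping plus random feature subsets
print(rf.predict(X[:3]))        # majority vote over the 100 trees
print(rf.feature_importances_)  # a rough (not fully interpretable) view of feature relevance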
10. Principal Component Analysis (PCA)
PCA converts a set of possibly correlated features into a set of linearly uncorrelated features called principal components (or principal modes of variation). This helps address multicollinearity, visualize higher-dimensional data, and detect outliers.
Benefits: Reduces the complexity of data, identifies the most important features, reveals hidden structure
Drawbacks: May not be needed, could throw away useful information
Data set: Only numeric
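A minimal PCA sketch, assuming scikit-learn; the random 5-feature matrix is a placeholder:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)            # placeholder numeric data with possibly correlated features
pca = PCA(n_components=2)             # keep the top-2 principal components
X_2d = pca.fit_transform(X)           # project the data onto those components
print(pca.explained_variance_ratio_)  # share of variance captured by each component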
11. Apriori algorithm
A priori in Latin means “from before”. A priori knowledge can come from domain information, previous dimensions, etc. Apriori principle says that “if an item set is frequent, then all its subsets are frequent”. This principle helps to reduce the number of possible item sets.
The Apriori algorithm takes a data set and a minimum support level as inputs. It first creates a list of all candidate item sets with one item. Scanning the transaction data set determines which sets meet the minimum support level; sets not meeting it are discarded. The remaining sets are then combined into item sets with two elements, the transaction data set is scanned again, and item sets not meeting the minimum support level are again discarded. This process is repeated until no candidate item sets remain.
Benefits: Coding is easy
Drawbacks: May be sluggish on huge data sets
Data set: Numeric/ nominal
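A plain-Python sketch of the candidate-generation and pruning loop described above; the tiny transaction list and the 0.5 support threshold are made up for illustration:

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]   # hypothetical market-basket data
min_support = 0.5                                        # illustrative threshold

def frequent_itemsets(transactions, min_support):
    n = len(transactions)
    # Start with all candidate item sets containing a single item.
    current = {frozenset([item]) for t in transactions for item in t}
    k, result = 1, {}
    while current:
        # Support = fraction of transactions containing the candidate set.
        counts = {c: sum(1 for t in transactions if c <= t) / n for c in current}
        frequent = {c: s for c, s in counts.items() if s >= min_support}  # discard infrequent sets
        result.update(frequent)
        # Join surviving k-item sets into (k+1)-item candidates and repeat.
        current = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        k += 1
    return result

print(frequent_itemsets(transactions, min_support))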
12. EM algorithm
Expectation maximization (EM) is a simple iterative algorithm with closed-form updates at each step. This algorithm automatically enforces the required constraints.
EM exploits the fact that if the data were fully observed, then the ML/MAP estimate would be easy to compute. It is an iterative algorithm which alternates between inferring the missing values given the parameters (E step) and optimizing the parameters given the "filled in" data (M step).
EM is one of the most widely used ML algorithms. EM has many variations like Annealed EM, Variational EM, Monte Carlo EM, Generalized EM (GEM), Expectation conditional maximization either [ECM(E)], and Over-relaxed EM.
Benefits: Automatically enforces the constraints, quite intuitive, robustness to outliers
Drawbacks: Can be much slower than direct gradient methods
Data set: Numeric/ nominal
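As a rough illustration of the E/M alternation, here is a Gaussian mixture fitted by EM via scikit-learn (an assumed choice); the two-blob data are synthetic placeholders:

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.vstack([np.random.normal(0, 1, (100, 2)),
               np.random.normal(5, 1, (100, 2))])     # placeholder data drawn from two clusters
gmm = GaussianMixture(n_components=2, max_iter=100)   # 2-component Gaussian mixture fitted by EM
gmm.fit(X)                       # alternates E step (responsibilities) and M step (parameter updates)
print(gmm.means_)                # estimated component means
print(gmm.predict_proba(X[:3]))  # soft, "filled in" cluster memberships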
****************
Happy Learning :)
****************
Disclaimer: The views expressed in this blog are my personal point of view and do not in any way represent that of the organization I work for.