Machine Learning Algorithms - A Compendium

A learning algorithm is a method for processing data to extract patterns that can be applied in new situations. The goal is to adapt a system to a specific input-output transformation task.

LEARNING = REPRESENTATION (hypothesis space) + EVALUATION (objective function or scoring function) + OPTIMIZATION

Bayes' Theorem, set out in 1763, remains a central concept in Machine Learning.

A computer program is said to learn from experience ‘E’ with respect to some class of tasks ‘T’ and performance measure ‘P’, if its performance at tasks in ‘T’, as measured by ‘P’, improves with experience ‘E’.

The twelve most commonly used Machine Learning algorithms are described below; a minimal illustrative code sketch follows each description.

1.      Logistic regression

This is called logistic regression due to its similarity to linear regression, but it is a form of classification, not regression. This algorithm performs well on linearly separable classes.

Benefits: Computationally inexpensive, easy to implement, knowledge representation easy to interpret

Drawbacks: Prone to under-fitting, may have low accuracy

Data set: Numeric/ nominal
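
As a minimal sketch (assuming scikit-learn is available; the data set below is synthetic and purely illustrative), a logistic regression classifier can be trained in a few lines:

```python
# Logistic regression on a synthetic classification data set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()            # learns a linear decision boundary
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```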

2.      Support vector machines (SVM)

SVMs are linear models like logistic regression, but they can use kernels (similarity functions between pairs of samples). The objective is to maximize the margin between classes for better generalization. They are called "machines" because they generate a binary decision.

Benefits: Low generalization error, outcomes are easy to interpret, excels at high-dimensional linear problems, tolerates noise/errors

Drawbacks: Sensitive to tuning parameters and the choice of kernel, natively handles only binary classification, slow at prediction time when using kernels

Data set: Numeric/ nominal
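
A minimal sketch of a kernel SVM, again assuming scikit-learn and synthetic data:

```python
# SVM with an RBF kernel as the pairwise similarity function.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = SVC(kernel="rbf", C=1.0)        # C trades margin width against errors
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```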

3.      K-nearest neighbors (kNN)

kNN is called a lazy learner because it does not learn a discriminating function from the data set but memorizes the training data set instead. It allows predictions to be made without any model training.

kNN is a non-parametric classifier which simply “looks at” the K points in the training set that are nearest to the test input x, counts how many members of each class are in this set, and returns that empirical fraction as the estimate.

Benefits: High accuracy, insensitive to outliers, makes no assumptions about the data

Drawbacks: Requires a lot of memory, struggles with high-dimensional data, computationally expensive

Data set: Numeric/ nominal
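
The following sketch (scikit-learn assumed, synthetic data) highlights the "lazy" nature of kNN: fitting merely stores the training set.

```python
# kNN: "training" only stores the data; the work happens at prediction time.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

clf = KNeighborsClassifier(n_neighbors=5)   # the K = 5 nearest neighbours vote
clf.fit(X_train, y_train)                   # lazy learner: just memorizes
print("test accuracy:", clf.score(X_test, y_test))
```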

4.      Decision trees (CART models)

Classification and regression trees (CART models) or Decision trees are defined by recursively partitioning the input space and defining a local model in each resulting region of input space.

CART uses a "divide-and-conquer" method and a "top-down induction" approach to learn simple decision rules in the form of a tree. A splitting criterion (such as information gain or Gini impurity) is used to find the best attribute for branching the tree. The standard approach is to grow a "full" tree and then perform pruning.

Benefits: Easy to comprehend, works well with missing values and irrelevant features, needs less computation

Drawbacks: Susceptible to over-fitting, makes only locally optimal decisions

Data set: Numeric/ nominal
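
A shallow-tree sketch, assuming scikit-learn; limiting the depth here is a simple stand-in for the pruning step described above:

```python
# A shallow CART tree built by recursive axis-aligned splits.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

clf = DecisionTreeClassifier(max_depth=3)   # depth limit curbs over-fitting
clf.fit(X_train, y_train)
print(export_text(clf))                     # the learned decision rules
```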

5.      Naive Bayes

Naive Bayes "naively" assumes independence and is based on Bayes' Rule. The assumption that attributes are independent is certainly a simplistic one in real life. Despite this incorrect assumption, Naive Bayes is effective at classification, fast, and accurate, and hence popular.

Benefits: Works well with small data sets, handles multiple classes, easy to implement

Drawbacks: Sensitive to how the input data is prepared

Data set: Only Nominal
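
A sketch on nominal data, assuming scikit-learn's CategoricalNB and integer-encoded categories (the toy data is invented purely for illustration):

```python
# Naive Bayes on nominal data, with categories encoded as integers.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Two categorical features, e.g. outlook in {0, 1} and windy in {0, 1}.
X = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 0]])
y = np.array([1, 1, 0, 0, 1, 0])

clf = CategoricalNB()         # counts per-class category frequencies
clf.fit(X, y)
print(clf.predict([[0, 1]]))  # class with the highest posterior probability
```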

6.      Ada Boost (Ensemble Methods)

Ada Boost uses a weak learner as the base classifier, with the input data weighted by a weight vector. In the first iteration the data is equally weighted, but in subsequent iterations samples that were previously misclassified are weighted more strongly. This adaptation to errors is the strength of Ada Boost. It is used for combining the predictions of multiple classifiers.

Benefits: Low generalization error, coding is simple, works well with most classifiers, no parameters to adjust

Drawbacks: Sensitive to outliers

Data set: Numeric/ nominal
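
A minimal sketch, assuming scikit-learn, whose AdaBoostClassifier uses decision stumps as the default weak learner:

```python
# AdaBoost over decision stumps (scikit-learn's default weak learner).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

clf = AdaBoostClassifier(n_estimators=50)   # reweights misclassified samples each round
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```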

7.      K-Means Clustering

This algorithm makes an initial guess of k random cluster centers, known as "centroids", and then iteratively refines the guess. It computes the distance from every point to each cluster center and assigns each point to its closest center. The cluster centers are then recalculated based on the points assigned to each cluster.

Hard clustering assigns a sample to one cluster. Soft clustering assigns a sample to one or more clusters.

Benefits: Ease of implementation, fast, often used for pre-clustering

Drawbacks: May converge at a local minimum, slow on very large data sets, sensitive to noise and outliers

Data set: Only numeric
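
A sketch with scikit-learn on synthetic blob data, illustrating hard clustering:

```python
# K-Means: iterative centroid refinement on synthetic blob data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)  # several random restarts
labels = km.fit_predict(X)          # hard assignment: one cluster per point
print(km.cluster_centers_)          # the refined centroids
```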

8.      Linear regression

Linear regression is the “work horse” of statistics and (supervised) machine learning.

When augmented with kernels or other forms of basis function expansion, it can model non-linear relationships. And when the Gaussian output is replaced with a Bernoulli or Multinoulli distribution, it can be used for classification.

Linear regression is an excellent, simple method for numeric prediction: it fits the "best fitting" straight line by minimizing the mean squared difference (least squares).

Benefits: Outcomes are easy to interpret, few computations involved

Drawbacks: Poorly models nonlinear data

Data set: Numeric/ nominal
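
A least-squares sketch on synthetic data, assuming NumPy and scikit-learn; the true slope and intercept below are chosen purely for illustration:

```python
# Least-squares fit of a straight line to noisy synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=1.0, size=100)  # true line plus noise

reg = LinearRegression().fit(X, y)       # minimizes mean squared error
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)
```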

9.      Random forests

A forest is a set of trees. The technique known as random forests tries to decorrelate the base learners by learning trees based on a randomly chosen subset of input variables, as well as a randomly chosen subset of data cases. It is an attractive model for many practical problem domains.

The output of the random forest algorithm is an ensemble of tree models whose predictions are combined by voting or averaging.

Benefits: Very good predictive accuracy, ensembles enable good generalization, handles noise and outliers well, bias reduction, doesn’t over-fit so easily

Drawbacks: Random sampling isn't random enough, lack of interpretability/inference, speed depends on the forest size

Data set: Only Numeric
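
A sketch with scikit-learn; max_features="sqrt" gives each split a random subset of features, as described above:

```python
# Random forest: decorrelated trees via bootstrapped rows and random feature subsets.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5)

clf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
clf.fit(X_train, y_train)               # tree predictions are combined by voting
print("test accuracy:", clf.score(X_test, y_test))
```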

10.  Principal Component Analysis (PCA)

PCA converts a set of possibly correlated features into a set of linearly uncorrelated features called principal components (or principal modes of variation). This helps address multicollinearity, visualize higher-dimensional data, and detect outliers.

Benefits: Reduces complexity of data, identifies most important features, reveals hidden structures

Drawbacks: May not be needed, could throw away useful information

Data set: Only numeric
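
A sketch, assuming NumPy and scikit-learn, where the first two features are deliberately correlated so that two components capture most of the variance:

```python
# PCA: project correlated features onto uncorrelated principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + rng.normal(scale=0.1, size=200),
                     rng.normal(size=200)])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)        # 3 correlated features -> 2 components
print(pca.explained_variance_ratio_)    # variance captured by each component
```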

11.  Apriori algorithm

A priori in Latin means "from before". A priori knowledge can come from domain information, previous measurements, etc. The Apriori principle says that "if an item set is frequent, then all its subsets are frequent". This principle helps to reduce the number of possible item sets.

The Apriori algorithm requires a minimum support level and a data set as inputs. It creates a list of all candidate item sets with one item. Scanning the transaction data set determines which sets meet the minimum support level; sets that do not are filtered out. The remaining item sets are then combined to make item sets with two elements. The transaction data set is scanned again, and item sets not meeting the minimum support level are filtered out. This process is repeated until all sets are filtered out.

Benefits: Coding is easy

Drawbacks: May be sluggish on huge data sets

Data set: Numeric/ nominal
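
A from-scratch sketch of this level-wise generate-and-prune idea (the transactions and support threshold are invented for illustration; this is not an optimized implementation):

```python
# Apriori-style frequent item set mining on toy transaction data.
transactions = [{"milk", "bread"}, {"milk", "eggs"},
                {"milk", "bread", "eggs"}, {"bread", "eggs"}]
min_support = 0.5
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain the whole item set.
    return sum(itemset <= t for t in transactions) / n

# Level 1: all single items meeting the minimum support.
items = {i for t in transactions for i in t}
frequent = [{frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}]

# Grow candidates one level at a time, pruning below-support sets.
k = 1
while frequent[-1]:
    prev = frequent[-1]
    candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

for level in frequent:
    for itemset in level:
        print(sorted(itemset), support(itemset))
```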

12.  EM algorithm

Expectation maximization (EM) is a simple iterative algorithm with closed-form updates at each step. The algorithm automatically enforces the required constraints (for example, mixing weights that sum to one).

EM exploits the fact that if the data were fully observed, then the ML/MAP estimate would be easy to compute. It is an iterative algorithm that alternates between inferring the missing values given the parameters (E step) and optimizing the parameters given the "filled in" data (M step).

EM is one of the most widely used ML algorithms. It has many variations, such as Annealed EM, Variational EM, Monte Carlo EM, Generalized EM (GEM), Expectation Conditional Maximization Either (ECME), and Over-relaxed EM.

Benefits: Automatically enforces the constraints, quite intuitive, robust to outliers

Drawbacks: Can be much slower than direct gradient methods

Data set: Numeric/ nominal
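
As a sketch, scikit-learn's GaussianMixture fits a mixture of Gaussians via EM internally (the two-cluster data below is synthetic):

```python
# Fitting a two-component Gaussian mixture by EM.
import numpy as np
from sklearn.mixture import GaussianMixture

# Two synthetic Gaussian clusters; EM must recover the mixture parameters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(5, 1, size=(100, 2))])

gm = GaussianMixture(n_components=2, random_state=0)
gm.fit(X)          # alternates E steps (responsibilities) and M steps (parameters)
print(gm.means_)   # recovered component means, near (0, 0) and (5, 5)
```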

****************

Happy Learning :)

****************

 Disclaimer: The views expressed in this blog are my personal point of view and do not in any way represent that of the organization I work for.
