Introduction to Machine Learning Algorithms

This article gives a brief overview of the main families of machine learning algorithms: how they are organized and where to start when delving into an individual topic. As always, Wikipedia is a great place to get started on each of them.

First, machine learning algorithms are commonly grouped into four major learning paradigms. They are as follows:

Supervised Learning: The form Y = f(X) defines a dependent variable Y in terms of a function f applied to the inputs X. Supervised learning estimates this functional form from examples in which both the inputs and the correct outcome Y are known, so that Y can be predicted for new inputs. The learning is "supervised" by those known outcomes (the labels): the machine (or program) searches for a general rule that maps inputs to the outcome variable Y.

Unsupervised Learning: It is very difficult for humans to understand data without any context (in DB parlance, without metadata). Here that means the observations carry no labels: many data points were gathered, but no outcome variable was recorded for them. It is then up to the algorithm to find structure in its inputs, discovering hidden patterns in the data or deriving features that were not observed before. Algorithms of this type fall under unsupervised learning, meaning there is no labeled output Y guiding the fit.

Semi-supervised Learning: This applies when some observations have labels and others do not, that is, when the metadata is incomplete. ML engineers and data scientists have often found that using the labeled and unlabeled data together improves the accuracy of learning compared with using the labeled data alone.
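
As a rough illustration of combining labeled and unlabeled data, here is a minimal sketch of self-training (pseudo-labeling), which is only one of several semi-supervised approaches. The one-dimensional data, the nearest-centroid classifier, and all function names are assumptions made up for this example.

```python
def centroids(points_by_class):
    """Mean of the 1-D points in each class."""
    return {label: sum(pts) / len(pts) for label, pts in points_by_class.items()}


def nearest_class(x, cents):
    """Label of the centroid closest to x."""
    return min(cents, key=lambda label: abs(x - cents[label]))


def self_train(labeled, unlabeled):
    """One round of self-training with a nearest-centroid classifier."""
    by_class = {}
    for x, label in labeled:
        by_class.setdefault(label, []).append(x)
    initial = centroids(by_class)          # model fitted on labeled data only
    for x in unlabeled:                    # pseudo-label each unlabeled point
        by_class[nearest_class(x, initial)].append(x)
    return initial, centroids(by_class)    # refit on labeled + pseudo-labeled


# Hypothetical toy data: two labeled points and four unlabeled ones.
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [2.0, 3.0, 7.0, 8.0]
initial, refined = self_train(labeled, unlabeled)
print(initial, refined)
```

The refined centroids are pulled toward the unlabeled points, which is the sense in which the unlabeled data contributes to the model. Whether that helps in practice depends on how reliable the pseudo-labels are.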

Reinforcement Learning: This is the more complicated but exciting area of machine learning, in which an agent (usually a program) learns from its environment through feedback. The agent has to take actions that maximize a reward defined by the originator of the program. The multi-agent case, in which many agents compete for similar rewards, is also studied extensively. The topic overlaps with disciplines such as game theory, control theory, operations research, information theory, and simulation-based optimization.
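
The action, feedback, and reward loop can be sketched with an epsilon-greedy agent on a multi-armed bandit, one of the simplest reinforcement learning settings. The fixed reward values, the parameter names, and the choice of epsilon are assumptions for illustration only; real environments are usually stochastic and have state.

```python
import random


def run_bandit(rewards, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit whose arms pay fixed rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)   # agent's running estimate per action
    counts = [0] * len(rewards)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))                           # explore
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]       # feedback from the environment
        counts[action] += 1
        # Incremental mean of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return estimates, total


# Hypothetical environment: three actions with fixed rewards.
estimates, total = run_bandit([0.2, 0.5, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, total)
```

The occasional random action is what lets the agent discover that a better arm exists. A purely greedy agent would stay on the first arm that paid anything.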

Below is a list of ML algorithms (for more information, refer to Wikipedia.org), grouped by family, that fall under the four major categories defined above.

Regression-based Algorithms

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.

  • Ordinary Least Squares Regression (OLSR)
  • Linear Regression
  • Logistic Regression
  • Stepwise Regression
  • Multivariate adaptive regression splines (MARS)
  • Locally Estimated Scatterplot Smoothing (LOESS)

https://en.wikipedia.org/wiki/Linear_regression
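
For the simple linear regression case described above (one explanatory variable), the ordinary least squares fit has a closed form. The following is a minimal sketch; the function name and the toy data are assumptions chosen so that the points lie exactly on a line.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data generated from y = 2x + 1.
slope, intercept = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

Multiple linear regression follows the same idea but solves for one coefficient per explanatory variable, which in practice is left to a linear algebra library.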

Instance-based algorithms

Instance-based learning (memory-based learning) is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory.

  • K-nearest neighbor (kNN)
  • Learning Vector Quantization (LVQ)
  • Self-Organizing Map (SOM)
  • Locally weighted learning (LWL)
  • Support Vector Machines (SVM)

https://en.wikipedia.org/wiki/Instance-based_learning
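
To show what comparing new instances with stored ones looks like, here is a small k-nearest neighbor sketch. The training points, labels, and function name are made up for this example, and Euclidean distance is an assumed choice of metric.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest stored instances."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training instances kept in memory as (point, label) pairs.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
print(knn_predict(train, (2, 2)), knn_predict(train, (7, 7)))
```

Note that there is no training step: all the work happens at prediction time, which is the defining trait of instance-based methods.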

Regularization Algorithms

Regularization is a technique used in regression to reduce the complexity of the model by adding a penalty that shrinks the coefficients of the independent features.

  • Ridge Regression
  • Least Absolute Shrinkage and Selection Operator (LASSO)
  • Elastic Net
  • Least-Angle Regression (LARS)

https://medium.com/analytics-vidhya/understanding-regularization-algorithms-450777fa0ed3
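
The shrinking effect is easiest to see in a deliberately simplified case: ridge regression with a single feature and no intercept, where minimizing sum((y - w*x)^2) + lam * w^2 gives w = sum(x*y) / (sum(x^2) + lam). The function name and the data are assumptions for illustration.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge coefficient for one feature and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


# Hypothetical data lying exactly on y = 2x.
xs, ys = [1, 2, 3], [2, 4, 6]
for lam in (0, 14, 100):
    # A larger penalty pulls the coefficient toward zero.
    print(lam, ridge_weight(xs, ys, lam))
```

With lam = 0 this reduces to the ordinary least squares coefficient. LASSO uses an absolute-value penalty instead and has no closed form of this kind in general.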

Decision Tree Algorithms

Decision Tree algorithms belong to the family of supervised learning algorithms. The goal of using a decision tree is to create a training model that can be used to predict the class or value of the target variables by learning simple decision rules inferred from prior data.

  • Classification and Regression Tree (CART)
  • Iterative Dichotomiser 3 (ID3)
  • C4.5 and C5.0 (different versions of a powerful approach)
  • Chi-squared Automatic Interaction Detection (CHAID)
  • Decision Stump
  • M5
  • Conditional Decision Trees

https://en.wikipedia.org/wiki/Decision_tree
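
A decision stump, the one-split tree from the list above, is enough to show how a simple decision rule is inferred from data. This sketch picks the threshold that minimizes the number of misclassified training points. Algorithms such as CART typically use an impurity measure like Gini instead, so treat the criterion, the function name, and the data as assumptions.

```python
def fit_stump(xs, ys):
    """Find the threshold and leaf labels with the fewest training errors."""
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for left, right in ((0, 1), (1, 0)):   # label for x <= t, label for x > t
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best[1], best[2], best[3]


# Hypothetical one-feature data with binary labels.
xs = [1, 2, 3, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1]
threshold, left_label, right_label = fit_stump(xs, ys)
print(threshold, left_label, right_label)
```

A full decision tree applies the same search recursively to the data on each side of the chosen split.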

Bayesian Algorithms

Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. They share a common principle: every pair of features is assumed to be conditionally independent of each other, given the class. Bayes' formula provides a relationship between P(A|B) and P(B|A): P(A|B) = P(B|A) P(A) / P(B).

  • Naive Bayes
  • Gaussian Naive Bayes
  • Multinomial Naive Bayes
  • Averaged One-Dependence Estimators (AODE)
  • Bayesian Belief Network (BBN)
  • Bayesian Network (BN)

https://towardsdatascience.com/ml-algorithms-one-sd-%CF%83-bayesian-algorithms-b59785da792a
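
Bayes' formula itself is short enough to work through numerically. The sketch below computes P(A|B) from P(B|A), P(A), and P(B|not A) for a binary event. The function name and the probabilities in the example are hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) via Bayes' theorem for a binary event A.

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|not A)
    """
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a rare condition (1%) and a fairly accurate test.
print(posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05))
```

A naive Bayes classifier applies the same rule per class, multiplying the per-feature likelihoods together because the features are assumed independent given the class.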

Clustering Algorithms

“Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters)”

  • K-Means
  • K-medians
  • Expectation-Maximization (EM)
  • Hierarchical Clustering

https://en.wikipedia.org/wiki/Cluster_analysis
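
As an example of grouping similar objects, here is a minimal one-dimensional K-Means sketch. The starting centroids are passed in explicitly to keep the run deterministic, whereas real implementations usually pick them randomly. The data and function name are assumptions.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical data with two obvious groups.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
final_centroids, clusters = kmeans_1d(points, centroids=[1.0, 10.0])
print(final_centroids, clusters)
```

No labels are involved at any point; the grouping comes purely from distances between the points.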

Association Rule Learning Algorithms

“Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.”

  • Apriori Algorithms
  • Eclat Algorithms

https://en.wikipedia.org/wiki/Association_rule_learning
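
Two common measures of interestingness are support (how often an itemset appears) and confidence (how often the rule holds when its left-hand side appears). The sketch below computes both. The transactions and function names are made up for illustration, and this is not a full Apriori or Eclat implementation.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
]
print(support({"bread", "butter"}, transactions))
print(confidence({"bread"}, {"butter"}, transactions))
```

Algorithms such as Apriori mainly add an efficient way to enumerate only those itemsets whose support exceeds a chosen minimum.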

Artificial Neural Network Algorithms

“Artificial neural networks (ANNs), usually simply called neural networks (NNs), are computing systems vaguely inspired by the biological neural networks that constitute animal brains.”

  • Perceptron
  • Multilayer Perceptrons (MLP)
  • Back-Propagation
  • Stochastic Gradient Descent
  • Hopfield Network
  • Radial Basis Function Network (RBFN)

https://en.wikipedia.org/wiki/Artificial_neural_network
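
The perceptron, the first entry in the list above, can be sketched in a few lines. Here it learns the logical AND function with the classic perceptron update rule. The integer learning rate, the epoch count, and the function names are assumptions chosen to keep the arithmetic exact.

```python
def predict(weights, bias, x):
    """Step activation: fire (1) when the weighted sum is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, lr=1):
    """Perceptron rule: nudge the weights by the prediction error times the input."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Truth table of logical AND, which is linearly separable.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
outputs = [predict(weights, bias, x) for x, _ in and_gate]
print(outputs)
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is why multilayer perceptrons trained with back-propagation are needed for problems such as XOR.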

Deep Learning Algorithms

Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised, or unsupervised.

  • Convolutional Neural Network (CNN)
  • Recurrent Neural Networks (RNNs)
  • Long Short-Term Memory Networks (LSTMs)
  • Stacked Auto-Encoders
  • Deep Boltzmann Machine (DBM)
  • Deep Belief Networks (DBN)

https://en.wikipedia.org/wiki/Deep_learning

Dimensionality Reduction Algorithms

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.

  • Principal Component Analysis (PCA)
  • Principal Component Regression (PCR)
  • Partial Least Squares Regression (PLSR)
  • Sammon Mapping
  • Multidimensional Scaling (MDS)
  • Projection Pursuit
  • Linear Discriminant Analysis (LDA)
  • Mixture Discriminant Analysis (MDA)
  • Quadratic Discriminant Analysis (QDA)
  • Flexible Discriminant Analysis (FDA)

https://en.wikipedia.org/wiki/Dimensionality_reduction
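
To make the idea concrete, here is a sketch of PCA restricted to two-dimensional data, reducing it to one dimension. In 2-D the direction of greatest variance can be found from the covariance terms with a closed-form angle, so no linear algebra library is needed. General PCA uses an eigendecomposition or SVD instead. The function name and data are assumptions.

```python
import math


def first_principal_component(points):
    """Direction of greatest variance for 2-D points, plus the 1-D projection."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    var_x = sum(cx * cx for cx, _ in centered) / n
    var_y = sum(cy * cy for _, cy in centered) / n
    cov_xy = sum(cx * cy for cx, cy in centered) / n
    # Angle of the major axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))
    # Each 2-D point becomes a single coordinate along that direction.
    projected = [cx * direction[0] + cy * direction[1] for cx, cy in centered]
    return direction, projected


# Hypothetical data lying on the line y = x.
direction, projected = first_principal_component([(1, 1), (2, 2), (3, 3)])
print(direction, projected)
```

Because these points lie on a line, the single projected coordinate loses no information, which is the ideal case of matching the data's intrinsic dimension.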

Ensemble Algorithms

“In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.”

  • Boosting
  • Bootstrapped Aggregation (Bagging)
  • AdaBoost
  • Weighted Average (Blending)
  • Stacked Generalization (Stacking)
  • Gradient Boosting Machines (GBM)
  • Gradient Boosted Regression Trees (GBRT)
  • Random Forest

https://en.wikipedia.org/wiki/Ensemble_learning
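
The claim in the quote can be illustrated with plain majority voting over three weak classifiers. The classifiers below are hand-written, hypothetical rules that each misclassify a different point, which is the favorable case for an ensemble. No learning takes place in this sketch.

```python
def majority_vote(classifiers, x):
    """Predict 1 when more than half of the classifiers vote 1."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0


def accuracy(predict, xs, labels):
    """Fraction of points that predict labels correctly."""
    return sum(predict(x) == y for x, y in zip(xs, labels)) / len(xs)


# Hypothetical data: the true label is 1 when x >= 4.
xs = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 0, 1, 1, 1]

# Three imperfect rules, each wrong on a different point (3, 4, and 6).
classifiers = [
    lambda x: int(x > 2),
    lambda x: int(x > 4),
    lambda x: int(x > 3 and x != 6),
]

individual = [accuracy(clf, xs, labels) for clf in classifiers]
ensemble = accuracy(lambda x: majority_vote(classifiers, x), xs, labels)
print(individual, ensemble)
```

The gain depends on the members making different mistakes. Methods such as bagging and boosting are, in part, ways of producing that diversity from a single training set.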
