Scenarios: Which Machine Learning (ML) to choose?
Inspired by “Which chart to choose?” [1], which helps you choose the right chart for your data, we developed the idea of charting “Which Machine Learning (ML) to choose?”
Before we present the “Which Machine Learning (ML) to choose?” flowchart, part of the "Architectural Blueprints—The “4+1” View Model of Machine Learning," let us take a look at the big picture and zoom in on the steps in which this flowchart can guide your selection of a machine learning approach to solve a business problem.
To solve a problem and find its solution, you can follow these steps:
Good data quality is a necessary prerequisite to building an accurate ML model. [Data Science Approaches to Data Quality: From Raw Data to Datasets]
Your processing pipeline should include at least the following stages (a minimal sketch in code follows the list):
a.) Data preprocessing and preparation
b.) Dataset sampling for training and validation
c.) Model training, validation, and evaluation
d.) Prediction model deployment
e.) Production model monitoring, feedback, and retraining
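As a minimal sketch of stages (a) through (c), assuming scikit-learn and an illustrative built-in dataset (the names and hyperparameters below are placeholders, not a prescription):

    # Stages (a)-(c): preprocessing, dataset sampling, and model
    # training/validation. Deployment and monitoring, stages (d)-(e),
    # happen outside this script, e.g., in an MLOps pipeline.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    X, y = load_breast_cancer(return_X_y=True)  # raw data

    # (b) Sample datasets for training and validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # (a) + (c) Preprocess and train in one pipeline.
    pipeline = Pipeline([
        ("scale", StandardScaler()),                   # data preparation
        ("model", LogisticRegression(max_iter=1000)),  # model training
    ])
    pipeline.fit(X_train, y_train)

    # (c) Evaluate on the held-out validation set.
    print("Validation accuracy:", accuracy_score(y_val, pipeline.predict(X_val)))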
Which Machine Learning (ML) to choose? Chart: Visual Science Informatics, LLC
Selecting a logical learning paradigm or computational method involves four major categories, four major algorithm types, and two major techniques. The four major categories are supervised, semi-supervised, unsupervised, and reinforcement learning. The four major algorithm types are classification, regression, association, and clustering. The two techniques are ensemble methods and reward feedback. The chart above, “Which Machine Learning (ML) to choose?”, guides you through the major categories, data types, and objectives to determine which algorithm types or techniques to choose.
Choosing the right machine learning (ML) approach depends on various factors related to the problem you are trying to solve, the nature of your data, and the goals of your project. Here are some common scenarios and the types of ML techniques that might be suitable for each:
Predicting a Continuous Value
Regression is a machine learning task where the goal is to predict a continuous numerical value. This is in contrast to classification, where the goal is to predict a categorical label. The main types of regression, sketched in code after this list, include:
1. Linear Regression:
2. Polynomial Regression:
3. Logistic Regression:
4. Support Vector Regression (SVR):
5. Decision Trees and Random Forests:
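To make the distinction concrete, here is a minimal sketch (assuming scikit-learn; the synthetic data and degree-2 target are illustrative) that fits three of the regression types above to the same continuous target:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.pipeline import make_pipeline
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.3, size=200)  # continuous target

    linear = LinearRegression().fit(X, y)                                 # 1. linear
    poly = make_pipeline(PolynomialFeatures(degree=2),
                         LinearRegression()).fit(X, y)                    # 2. polynomial
    forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)  # 5. random forest

    x_new = [[1.5]]
    for name, model in [("linear", linear), ("polynomial", poly), ("forest", forest)]:
        print(name, model.predict(x_new))  # each predicts a continuous value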
Classifying Data into Categories
Classification is a machine learning task where the goal is to predict a categorical label or class for a given input. There are two main types of classification: binary and multi-class.
Binary Classification
Multi-Class Classification
One-vs.-All Classification
1. Train a binary classifier for each class, treating that class as positive and the rest as negative.
2. For a new input, predict the class with the highest probability from all the binary classifiers.
Softmax (Multinomial) Classification
1. Apply a softmax activation function to the output layer of the neural network.
2. The softmax function converts the raw outputs into probabilities that sum to 1.
3. The class with the highest probability is predicted.
Softmax Equation. Formula: Google
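In symbols, the softmax converts a vector of raw scores (logits) z_1, ..., z_K into a probability distribution over K classes:

    \text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K

so the outputs are positive, sum to 1, and the largest logit receives the largest probability.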
In summary, binary classification deals with two classes, while multi-class classification handles more than two. One-vs.-All and softmax are common approaches to tackle multi-class classification problems.
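A minimal sketch of the one-vs.-all strategy (assuming scikit-learn, whose OneVsRestClassifier carries out steps 1-2 above; the three-class iris dataset is illustrative):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    X, y = load_iris(return_X_y=True)  # three classes

    # Train one binary classifier per class (that class vs. the rest) ...
    ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

    # ... then predict the class whose binary classifier is most confident.
    print(ova.predict(X[:5]))
    print(ova.predict_proba(X[:5]))  # per-class scores behind the vote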
Clustering Data into Groups
Clustering is an unsupervised machine learning technique used to group similar data points together. It is a powerful tool for discovering patterns and relationships within data that might not be immediately apparent. Types of Clustering Algorithms (contrasted in a short sketch after this list):
1. Centroid-Based Clustering:
2. Hierarchical Clustering:
3. Density-Based Clustering:
4. Distribution-Based Clustering:
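A minimal sketch contrasting a centroid-based and a density-based algorithm (assuming scikit-learn; the blob data and parameter values are illustrative):

    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans, DBSCAN

    X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # unlabeled points

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)  # centroid-based
    dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)                    # density-based

    print("k-means cluster labels:", set(kmeans.labels_))
    print("DBSCAN cluster labels:", set(dbscan.labels_))  # -1 marks noise points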
Choosing the right clustering algorithm
The best clustering algorithm depends on the specific characteristics of the data and the desired outcome. Consider factors such as the shape and size of the expected clusters, whether the number of clusters is known in advance, the presence of noise and outliers, and scalability to large datasets.
By carefully considering these factors, you can select the appropriate clustering algorithm for your specific application.
In each scenario, you will also need to consider factors such as data availability, interpretability, computational resources, and model complexity. It often helps to experiment with multiple approaches and evaluate them based on performance metrics relevant to your specific problem.
Machine learning exploration workflow. Diagram: Google
Understanding a model's problem-solving capabilities, process, inputs, and outputs is essential before selecting your ML model. An applicable machine learning model depends on your problem and objectives. Machine learning approaches are deployed where it is highly complex or infeasible to develop conventional algorithms to perform needed tasks or solve problems. Machine learning models are utilized in many domains, such as advertising, agriculture, communication, computer vision, customer services, finance, gaming, investing, marketing, medicine, robotics, security, visualization, and weather.
Range of Business/Machine Learning Algorithms. Mind map: GEEKSFORGEEKS
Choosing an applicable metric for evaluating machine learning models depends on the problem and objectives. From a business perspective, two of the most significant measurements are accuracy and interpretability. Accuracy measures how reliable the conclusion is, while interpretability (reasoning) measures how well the model enables understanding of the justification and reasoning behind the decision conclusion.
Evaluating the accuracy of a machine learning model is critical in selecting and deploying a machine learning model. Choosing the right accuracy metric for evaluating your machine learning model depends on your problem solution objectives and datasets. Before choosing one, it is important to understand the business problem context, the pros and cons, and the usefulness of each error metric.
Chart by Alvira Swalin via “Choosing the Right Metric for Evaluating Machine Learning Models — Part 1" [2] and “Choosing the Right Metric for Evaluating Machine Learning Models — Part 2" [3]
The chart above captures and categorizes useful metrics for evaluating machine learning models for a variety of machine learning algorithms, computational methods, and techniques.
For instance, measurement of a binary output prediction (classification) is captured in a specific table layout, a confusion matrix, which visualizes whether a model is confusing two classes. Each row of the matrix represents the instances in an actual class, while each column represents the instances in a predicted class. Four measures are captured: True Positive, False Negative, False Positive, and True Negative.
Accuracy is derived from the four values in a confusion matrix. Additional classification evaluation metrics, with formulas shown on the right and below, include but are not limited to the following: Sensitivity, Specificity, Accuracy, Negative Predictive Value, and Precision.
Confusion Matrix and Classification Evaluation Metrics. Table: Maninder Virk
An accurate classification model correctly distinguishes positives from negatives.
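These metrics follow directly from the four confusion-matrix counts; a minimal sketch (the counts below are illustrative, not from a real model) computes them by hand:

    # Illustrative confusion-matrix counts for a binary classifier.
    TP, FN, FP, TN = 85, 15, 10, 90

    sensitivity = TP / (TP + FN)              # recall, true positive rate
    specificity = TN / (TN + FP)              # true negative rate
    precision   = TP / (TP + FP)              # positive predictive value
    npv         = TN / (TN + FN)              # negative predictive value
    accuracy    = (TP + TN) / (TP + TN + FP + FN)

    print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
          f"precision={precision:.2f} npv={npv:.2f} accuracy={accuracy:.2f}")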
On the other hand, measuring interpretability (reasoning) is a more complex task because there is neither a universally agreed-upon definition nor an objective quantitative measure. In general, opaque computational methods obtain higher accuracies than transparent ones. There are computational methods that produce an interpretable predictive model, such as post hoc interpretable models or intrinsically interpretable algorithms. One measure of interpretability, based on the “triptych predictivity, stability, and simplicity,” is proposed by Vincent Margot in “How to measure interpretability?" [4]. [Interpretability/Explainability: “Seeing Machines Learn”]
Chart by Sharayu Rane via “The balance: Accuracy vs. Interpretability" [5]
The chart “The balance: Accuracy vs. Interpretability” sorts out the trade-off between accuracy and interpretability (reasoning) for a variety of machine learning algorithms, computational methods, and techniques. [Accuracy: The Bias-Variance Trade-off]
Overall, selecting a machine learning technique depends on your problem, objectives, and data. As we mentioned above, there are four major categories, four major algorithm types, and two major techniques. The chart at the top, “Which Machine Learning (ML) to choose?”, guides you through the major categories, data types, and objectives to determine which algorithm types or techniques to choose. The chart below extends to additional horizontal ML techniques such as attribute and row importance, feature extraction, and anomaly detection.
Machine Learning Techniques. Chart: Data Science School
Ensemble methods are powerful techniques in machine learning that combine multiple models to improve predictive performance. By harnessing the strengths of diverse models, ensembles can often outperform individual models.
Ensemble Methods. Diagrams: Neri Van Otten
Bagging (Bootstrap Aggregating)
1. Create multiple subsets of the training data through bootstrapping.
2. Train a base model on each subset.
3. Combine predictions from all models, often by averaging or voting.
Boosting
1. Train a base model on the entire dataset.
2. Assign weights to data points based on their classification accuracy.
3. Train subsequent models, giving more weight to misclassified data points.
4. Combine predictions using weighted voting.
Stacking (Stacked Generalization)
1. Train multiple base models on the training data.
2. Use the base models to make predictions on a holdout set.
3. Use the predictions from the base models as features for a meta-model.
4. Train the meta-model to make final predictions.
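All three strategies above map onto off-the-shelf scikit-learn estimators; the following minimal sketch (illustrative dataset, base models, and hyperparameters, not a tuned benchmark) trains one of each:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                                  StackingClassifier)

    X, y = load_breast_cancer(return_X_y=True)

    ensembles = {
        # Bagging: bootstrap subsets, one tree per subset, combined by voting.
        "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
        # Boosting: sequential models that reweight misclassified points.
        "boosting": AdaBoostClassifier(n_estimators=50),
        # Stacking: base-model predictions become features for a meta-model.
        "stacking": StackingClassifier(
            estimators=[("tree", DecisionTreeClassifier()),
                        ("lr", LogisticRegression(max_iter=1000))],
            final_estimator=LogisticRegression(max_iter=1000)),
    }
    for name, model in ensembles.items():
        print(name, cross_val_score(model, X, y, cv=5).mean())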
Cascading
Key Differences of Ensemble Methods. Table: Gemini
Choosing the right ensemble method depends on the specific problem, dataset, and desired performance.
Ensemble Methods Comparison. Table: Neri Van Otten
If you have multiple ML models with similar accuracy, precision, recall, or other metrics, you can create a majority vote classifier, weighted voting, or stacking ensemble with a meta-model that utilizes all models, especially when the models are different in nature. Combining multiple high-performing models through ensemble techniques is a powerful strategy to enhance predictive accuracy and robustness.
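For instance, a soft majority-vote ensemble of models that are different in nature can be assembled in a few lines; a hedged sketch, assuming scikit-learn (the three models and the dataset are illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import VotingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    # Three models of a different nature, combined by soft (probability) voting.
    vote = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=100)),
                    ("svc", SVC(probability=True))],
        voting="soft")
    print("ensemble CV accuracy:", cross_val_score(vote, X, y, cv=5).mean())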
Reinforcement Learning (RL) offers various approaches to solve problems where an agent learns to make decisions by interacting with an environment. The primary computational methods can be categorized into:
Reinforcement Learning (RL) Agent Taxonomy. Diagram adapted: Pratap Dangeti
Model-Based Reinforcement Learning
1. Learn a transition model: P(s'|s, a)
2. Learn a reward model: R(s, a)
3. Use planning algorithms (e.g., dynamic programming, search) to find optimal actions based on the learned model.
Policy-Based Reinforcement Learning
Value-Based Reinforcement Learning
Actor-Critic Reinforcement Learning
- Actor: Learns a policy using policy gradients.
- Critic: Learns a value function to estimate the expected return.
- The actor improves its policy based on the critic's evaluation.
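As a concrete instance of the value-based family described above, a minimal tabular Q-learning sketch (a toy chain environment invented for illustration; all names and constants are assumptions) shows the core update rule:

    import random

    # Toy environment: states 0..4 in a chain; action 0 = left, 1 = right.
    # Reaching state 4 yields reward 1 and ends the episode.
    N_STATES, GOAL = 5, 4
    alpha, gamma, epsilon = 0.1, 0.9, 0.2

    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

    for episode in range(500):
        s = 0
        while s != GOAL:
            # Epsilon-greedy action selection (random on ties).
            if random.random() < epsilon or Q[s][0] == Q[s][1]:
                a = random.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else min(s + 1, GOAL)
            r = 1.0 if s_next == GOAL else 0.0
            # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next

    print("Greedy policy:", ["left" if Q[s][0] > Q[s][1] else "right"
                             for s in range(N_STATES)])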
Key Differences of RL Primary Computational Methods. Table: Gemini
Choosing the right computational method depends on the specific problem and environment.
"Time series is a ML technique that forecasts target value based solely on a known history of target values. It is a specialized form of regression known in the literature as auto-regressive modeling. The input to time series analysis is a sequence of target values." [Oracle]
Time Series Components. Chart: Nirmal Gaud
Time Series Forecasting
Time series analysis comprises methods for analyzing time series data to extract meaningful statistics and characteristics of the data. Time series regression, with autoregressive dynamics, is a statistical method for predicting a future response based on the response history.
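A minimal autoregressive sketch (plain NumPy least squares on a synthetic series invented for illustration) shows the idea of predicting the next value from its own history:

    import numpy as np

    # Illustrative series: trend plus seasonality plus noise.
    rng = np.random.default_rng(1)
    t = np.arange(120)
    y = 0.05 * t + np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.2, size=t.size)

    # Fit an AR(p) model y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p} by least squares.
    p = 12
    X = np.column_stack([y[p - k - 1 : -k - 1] for k in range(p)])  # lagged values
    X = np.column_stack([np.ones(len(X)), X])                       # intercept column
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)

    # One-step-ahead forecast from the last p observations (newest first).
    last = np.concatenate([[1.0], y[-1 : -p - 1 : -1]])
    print("next-value forecast:", last @ coef)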
Categorized ML Algorithms. Mind map: Gina Acosta Gutiérrez
After choosing your ML scenario, your next step is to choose your ML algorithm. To do so, you can utilize the categorized ML algorithms diagram, a partial list of ML and data mining algorithms organized in a hierarchical tree of ML algorithm categories.
Your data type is a critical success factor when selecting your ML algorithm. For example, tree-based models outperform deep learning on typical tabular data. An experimental in-depth analysis of ML algorithms on tabular datasets with both categorical and numerical features, by Léo Grinsztajn et al., provided empirical results and insights into the reasons:
"1. Neural networks are biased to overly smooth solutions
2. Neural networks are more impacted by uninformative features
3. Data is non-invariant by rotation, so should be learning procedures"
Benchmark on medium-sized datasets. Graphs: Léo Grinsztajn et al.
Also, on the one hand, deep learning models are notorious for requiring extensive hyperparameter optimization. On the other hand, tree-based models (e.g., XGBoost) are simpler algorithms, easier to tune, and the best performers on tabular data.
At a higher level, there are six archetypical analysis methods: Descriptive, Exploratory, Inference, Predictive, Prescriptive, and Causal. These analysis methods are defined as:
Six Archetypical Analyses. Chart: Visual Science Informatics, LLC
Each archetypical analysis method aims to answer different questions. The higher the complexity of the analyses (in terms of knowledge, cost, and time), the more valuable the answer output of the analytic method. [Complexity: Time, Space, & Sample]
The Value of Analytics Methods. Chart: Visual Science Informatics, LLC
Establishing learning goals and objectives is significant, and organizing objectives helps to clarify them.
"Bloom's taxonomy is a set of three hierarchical models used for the classification of educational learning objectives into levels of complexity and specificity. The three lists cover the learning objectives in the cognitive, affective, and psychomotor domains.
Bloom's Revised Taxonomy. Diagram: Jessica Shabatura, UARK
There are six levels of cognitive learning according to the revised version of Bloom's Taxonomy. Each level is conceptually different. The six levels are remembering, understanding, applying, analyzing, evaluating, and creating. The new terms are defined as:
This Bloom's taxonomy was adapted for machine learning.
Bloom’s Taxonomy Adapted for Machine Learning (ML). Chart: Visual Science Informatics, LLC
There are six levels of model learning in the adapted version of Bloom's Taxonomy for ML. Each level is a conceptually different learning model. The levels are ordered from lower-order learning to higher-order learning. The six levels are Store, Sort, Search, Descriptive, Discriminative, and Generative. Bloom's Taxonomy terms adapted for ML are defined as:
Conditional Generative Adversarial Network Model Architecture Example. Diagram: Jason Brownlee
Another decision point in choosing a machine learning model is the difference between a discriminative and a generative model. A discriminative approach focuses on a solution and performs better for classification tasks, dividing the data space into classes by learning the boundaries. A generative approach models how data is embedded throughout the space and can generate new data points.
Discriminative vs. Generative. Table: Supervised Learning Cheatsheet
A Neural Network (NN) is a series of algorithms inspired by the structure and function of the human brain. Neural networks are used for a variety of tasks, including image recognition, speech recognition, and natural language processing.
Neural networks have high predictive power but low interpretability, because a neural network is by nature a black box whose inner workings are not fully explainable.
“An artificial neuron simply hosts the mathematical computations. Like our neurons, it triggers when it encounters sufficient stimuli. The neuron combines input from the data with a set of coefficients, or weights, which either amplify or dampen that input, which thereby assigns significance to inputs for the task the algorithm is trying to learn". [Anddy Cabrera]
Neural Networks learn by adjusting the weights of the connections between neurons. The weights determine how much influence one neuron has on another. By adjusting the weights, a neural network can learn to perform a specific task.
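A single artificial neuron reduces to a weighted sum of its inputs passed through an activation; a minimal sketch (the input and weight values are illustrative):

    import numpy as np

    def neuron(x, w, b):
        """Weighted sum of inputs plus bias, squashed by a sigmoid activation."""
        z = np.dot(w, x) + b
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2, 3.0])   # inputs (stimuli)
    w = np.array([0.8, 0.1, -0.4])   # weights amplify or dampen each input
    print(neuron(x, w, b=0.2))       # output approaches 1 with stronger net stimuli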
Neural Networks' Architectures: ANN, RNN, LSTM & CNN. Diagrams: A. Catherine Cabrera, and B. InterviewBit
"Neural Network Standard Components:
Backpropagation is a fundamental algorithm used to train artificial neural networks. It is essentially a computational method for calculating the gradient of the error function with respect to the network's weights. In simpler terms, it helps the network learn from its mistakes by adjusting its parameters to minimize the error between its predicted output and the actual output.
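A minimal backpropagation sketch on a one-hidden-layer network (plain NumPy; the XOR data, layer sizes, and learning rate are illustrative choices) makes the gradient flow explicit:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)        # hidden layer
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)        # output layer
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for step in range(5000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: chain rule from output error back to each weight.
        d_out = (out - y) * out * (1 - out)              # error at output pre-activation
        d_h = (d_out @ W2.T) * h * (1 - h)               # propagated to hidden layer
        # Gradient-descent weight updates.
        W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
        W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

    print(out.round(2).ravel())  # typically approaches [0, 1, 1, 0]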
Different neural networks have distinct architectures tailored to their functions and strengths. Here are descriptions of major neural network architectures:
Here is a table summarizing the key differences:
Key Differences of Neural Networks' Architectures. Table: Gemini
Deep Neural Networks (DNNs) are trained using large sets of labeled or unlabeled data and increasingly learn abstract features directly from the data without manual feature extraction. Traditional neural networks may contain around 2-3 hidden layers, while deep networks can have as many as 100-200 hidden layers.
The Neural Network Zoo. Node Maps: Van Veen, F. & Leijnen, S. (2019). The Asimov Institute
A Generative Adversarial Network (GAN) is a type of deep learning system that uses two neural networks to compete against each other. Here is a breakdown of how it works:
- Generator: This network creates new data, such as images or music, based on the data it has been trained on.
- Discriminator: This network tries to tell the difference between the new data created by the generator and real data from the training set.
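A minimal training-loop sketch of this adversarial game, assuming PyTorch (the toy task of generating samples from a 1-D Gaussian, the network sizes, and the step counts are all illustrative):

    import torch
    import torch.nn as nn

    # Toy target: learn to generate samples from N(4, 1.25).
    real_dist = lambda n: torch.randn(n, 1) * 1.25 + 4.0

    G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))               # generator
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid()) # discriminator
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCELoss()

    for step in range(2000):
        # Train the discriminator: real samples -> 1, generated samples -> 0.
        real, fake = real_dist(64), G(torch.randn(64, 8)).detach()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Train the generator: try to fool the discriminator into predicting 1.
        fake = G(torch.randn(64, 8))
        loss_g = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    print(G(torch.randn(1000, 8)).mean().item())  # should drift toward 4.0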
Deep Belief Networks (DBNs) are a type of deep learning architecture used for unsupervised learning tasks. They can be thought of as building blocks for more complex neural networks. Here is a breakdown of how they work:
- Building Blocks: Restricted Boltzmann Machines (RBMs)
- Stacked for Learning: The Deep Belief Network
- Unsupervised Training
- Applications of Deep Belief Networks
- Some limitations of DBNs include:
Overall, Deep Belief Networks are a powerful tool for unsupervised feature learning and can be a valuable component in building more complex deep learning architectures.
There are many more specialized networks, layers, and operations, such as transformers, latent diffusion models, inception modules, feature pyramid networks, etc.
Note that node maps have limitations in portraying the nuances of deep learning models. There are numerous differences in usage scenarios, restrictions, mitigations (decaying, vanishing, and exploding information), and scalability. Additional functionality could be to preprocess, encode, or decode information; parallel competitive learning, predicting, or generating; or un-black-boxing. Also, the differences are in inputs (data, feedback, and noise), connectivity (past, present, future, random, reversed, stacked, and extra), and states (activations, triggers, stateless, memory, probabilistic, and pooling multiple weights as a vector).
Activation Function in NN
An Artificial Neuron in Action. Animation: Anddy Cabrera
Activation functions play a crucial role in neural networks by introducing non-linearity, enabling them to learn complex patterns. The evolution of activation functions has been closely tied to the development of neural network architectures and training algorithms. Early activation functions were Sigmoid and Tanh, but many improved activation functions have since been developed; they are listed in the table below.
Activation Function Comparison. Table: Gemini
Note: This table provides a brief overview of common activation functions. The choice of activation function often depends on the specific task and architecture of the neural network.
Factors driving evolution:
For example, ReLU is a common choice for many deep learning architectures; it helps to alleviate the vanishing gradient problem in deep networks, leading to improved performance, while more specialized functions such as GELU and Swish may be better suited for tasks such as translation, text classification, and natural language processing.
However, it is important to note that other activation functions and their variants can also perform well in many cases. The best activation function for a given task may need to be determined through experimentation and evaluation.
If you are considering using a specific activation function in your own projects, it is recommended to try it alongside other activation functions and evaluate its performance on your specific dataset.
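As a starting point for such experiments, the common activation functions are each a one-liner in NumPy (the GELU below uses its standard tanh approximation):

    import numpy as np

    def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
    def tanh(x):    return np.tanh(x)
    def relu(x):    return np.maximum(0.0, x)
    def gelu(x):    # tanh approximation of GELU
        return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
    def swish(x):   return x * sigmoid(x)

    x = np.linspace(-3, 3, 7)
    for f in (sigmoid, tanh, relu, gelu, swish):
        print(f.__name__, np.round(f(x), 2))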
Best practices for neural network training:
Additionally, ML Operations (MLOps) and Continuous ML (CML) are a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. [Operations: MLOps, Continuous ML, & AutoML]
ML Algorithms Cheat Sheet. Diagram: SSAS
In conclusion, choosing a Machine Learning (ML) approach depends on multiple complex factors and challenging trade-offs. You will need to consider at least four competing architectural factors: Accuracy, Complexity, Interpretability/Explainability, and Operations. Selecting a machine learning approach that balances all decision factors is important, because the capital investment in the processing pipeline stages is costly and requires considerable time and effort. Therefore, it is highly valuable to employ a rigorous process when choosing machine learning. [27]
Next, read my "Accuracy: The Bias-Variance Trade-off" article at?https://www.dhirubhai.net/pulse/accuracy-bias-variance-tradeoff-yair-rajwan-ms-dsc.