Scenarios: Which Machine Learning (ML) to choose?
Inspired by “Which chart to choose?" [1], which helps you choose the right chart for your data, we developed the idea of charting “Which Machine Learning (ML) to choose?”
Before we present the “Which Machine Learning (ML) to choose?” flowchart, part of the "Architectural Blueprints—The “4+1” View Model of Machine Learning," let us take a look at the big picture and zoom in on the steps where this flowchart can guide your selection of a machine learning approach to solve a business problem.
ML Architectural Blueprints = {Scenarios, Accuracy, Complexity, Interpretability, Operations}
To solve a problem and find its solution, you can follow these steps:
Good data quality is a necessary prerequisite to building an accurate ML model. [Data Science Approaches to Data Quality: From Raw Data to Datasets]
Your processing pipeline should include at least the following stages:
a. Data preprocessing and preparation
b. Dataset sampling for training and validation
c. Model training, validation, and evaluation
d. Prediction model deployment
e. Production model monitoring, feedback, and retraining
Which Machine Learning (ML) to choose? Chart: Visual Science Informatics, LLC
Selecting a learning paradigm or computational method involves four major categories, four major algorithm types, and two major techniques. The four major categories are supervised, semi-supervised, unsupervised, and reinforcement learning. The four major algorithm types are classification, regression, association, and clustering. The two techniques are ensemble methods and reward feedback. The chart above, “Which Machine Learning (ML) to choose?”, guides you through the major categories, data types, and objectives to determine which algorithm types or techniques to choose.
Choosing the right machine learning (ML) approach depends on various factors related to the problem you are trying to solve, the nature of your data, and the goals of your project. Here are some common scenarios and the types of ML techniques that might be suitable for each:
Predicting a Continuous Value
Regression is a machine learning task where the goal is to predict a continuous numerical value. This is in contrast to classification, where the goal is to predict a categorical label. Types of Regression:
1. Linear Regression:
2. Polynomial Regression:
3. Logistic Regression (despite its name, typically used for classification rather than predicting a continuous value):
4. Support Vector Regression (SVR):
5. Decision Trees and Random Forests:
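To make the first of these concrete, here is a minimal, hedged linear regression sketch with scikit-learn on a small synthetic dataset (the data and coefficients are illustrative assumptions, not taken from the chart above):

```python
# Minimal linear regression sketch (assumes scikit-learn is installed).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: y is roughly a linear function of x plus noise (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)                      # learn slope and intercept
pred = model.predict(X_test)                     # predict a continuous value

print("coef:", model.coef_, "intercept:", model.intercept_)
print("test MSE:", mean_squared_error(y_test, pred))
```

Polynomial regression, SVR, and tree-based regressors follow the same fit/predict pattern; only the estimator (and, for polynomial regression, a feature-expansion step such as PolynomialFeatures) changes.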
Classifying Data into Categories
Classification is a machine learning task where the goal is to predict a categorical label or class for a given input. There are two main types of classification: binary and multi-class.
Binary Classification
- Support Vector Machines (SVMs) are a powerful supervised learning algorithm used for both classification and regression tasks. In classification, SVMs aim to find an optimal hyperplane that effectively separates data points into different classes in a high-dimensional space. This hyperplane is determined by maximizing the margin, which is the distance between the hyperplane and the closest data points of each class, known as support vectors. SVMs can handle both linear and nonlinearly separable data through the use of kernel functions, which map the data into a higher-dimensional space where linear separation becomes possible. This flexibility makes SVMs highly effective in various applications, including image recognition, text classification, and bioinformatics.
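A minimal, hedged sketch of a nonlinear SVM classifier with an RBF kernel on a synthetic two-class dataset (the dataset and hyperparameter values are illustrative assumptions):

```python
# Binary classification with an SVM and RBF kernel (assumes scikit-learn).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable in the original space.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters for SVMs; the RBF kernel maps the data into a
# higher-dimensional space where a linear separation becomes possible.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```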
Multi-Class Classification
- Naive Bayes is a simple yet powerful probabilistic classification algorithm based on Bayes' theorem, assuming independence between features given the class label. Naive Bayes is capable of both binary and multi-class classification. It can handle datasets with two or more class labels.
One-vs-One (OvO) Classification
1. Train a binary classifier for each pair of classes.
2. For a new input, each classifier predicts the more likely class of the pair.
3. The class with the most wins is predicted as the final class.
One-vs-All (OvA) / One-vs-Rest (OvR) Classification
1. Train a binary classifier for each class, treating that class as positive and the rest as negative.
2. For a new input, predict the class with the highest probability from all the binary classifiers.
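A hedged sketch contrasting a natively multi-class model (Gaussian Naive Bayes) with explicit one-vs-rest and one-vs-one wrappers around a binary classifier, on the classic Iris dataset (the choice of base estimator is an illustrative assumption):

```python
# Multi-class strategies sketch (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_iris(return_X_y=True)  # 3 classes

models = {
    "Naive Bayes (native multi-class)": GaussianNB(),
    "One-vs-Rest (logistic base)": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "One-vs-One (logistic base)": OneVsOneClassifier(LogisticRegression(max_iter=1000)),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```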
In summary, binary classification deals with two classes, while multi-class classification handles more than two.
Clustering Data into Groups
Clustering is an unsupervised machine learning technique used to group similar data points together. It is a powerful tool for discovering patterns and relationships within data that might not be immediately apparent. Types of Clustering Algorithms:
1. Partitioning Clustering:
2. Hierarchical Clustering:
3. Density-Based Clustering:
4. Distribution-Based Clustering:
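Of the families above, partitioning methods such as k-means are often the first choice. Below is a minimal, hedged k-means sketch on synthetic blob data (the dataset and number of clusters are illustrative assumptions that you would normally validate, for example with silhouette scores):

```python
# Partitioning clustering with k-means (assumes scikit-learn).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data with 3 "true" groups (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=7)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=7)
labels = kmeans.fit_predict(X)             # unsupervised: no target labels used

print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
print("silhouette score:", silhouette_score(X, labels))  # higher is better
```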
Choosing the right clustering algorithm
The best clustering algorithm depends on the specific characteristics of the data and the desired outcome. Consider factors such as the size of the dataset, the expected cluster shapes, the presence of noise and outliers, and whether the number of clusters is known in advance.
By carefully considering these factors, you can select the appropriate clustering algorithm for your specific application.
In each scenario, you will also need to consider factors such as data availability, interpretability, computational resources, and model complexity. It often helps to experiment with multiple approaches and evaluate them based on performance metrics relevant to your specific problem.
ML Exploration Workflow
Machine Learning Exploration Workflow. Diagram: Google
Understanding a model's problem-solving capabilities, process, inputs, and outputs is essential before selecting your ML model. An applicable machine learning model depends on your problem and objectives. Machine learning approaches are deployed where it is highly complex or infeasible to develop conventional algorithms to perform needed tasks or solve problems. Machine learning models are utilized in many domains, such as advertising, agriculture, communication, computer vision, customer services, finance, gaming, investing, marketing, medicine, robotics, security, visualization, and weather.
Range of Business/Machine Learning Algorithms. Mind map: GEEKSFORGEEKS
Choosing an applicable metric for evaluating machine learning models depends on the problem and objectives. From a business perspective, two of the most significant measurements are accuracy and interpretability. Accuracy measures how reliable the conclusion is, while interpretability (reasoning) measures how well the model enables understanding of the justification and reasoning behind the conclusion.
Evaluating the accuracy of a machine learning model is critical in selecting and deploying a machine learning model. Choosing the right accuracy metric for evaluating your machine learning model depends on your problem solution objectives and datasets. Before choosing one, it is important to understand the business problem context, the pros and cons, and the usefulness of each error metric.
Chart by Alvira Swalin via “Choosing the Right Metric for Evaluating Machine Learning Models — Part 1" [2] and “Choosing the Right Metric for Evaluating Machine Learning Models — Part 2" [3]
The chart above captures and categorizes useful metrics for evaluating machine learning models for a variety of machine learning algorithms, computational methods, and techniques.
Measuring a binary output prediction (classification), for instance, is captured in a specific table layout, a confusion matrix, which visualizes whether a model is confusing two classes. Each row of the matrix represents the instances in an actual class, while each column represents the instances in a predicted class. Four measures are captured: True Positive, False Negative, False Positive, and True Negative.
Accuracy is derived from the four values in a confusion matrix. Additional metrics, with formulas on the right and below, are classification evaluation metrics. These metrics include but are not limited to the following: Sensitivity, Specificity, Accuracy, Negative Predictive Value, and Precision.
Confusion Matrix and Classification Evaluation Metrics. Table: Maninder Virk
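A minimal sketch of computing a confusion matrix and the derived classification metrics with scikit-learn (the actual and predicted labels below are hard-coded illustrative values):

```python
# Confusion matrix and derived metrics (assumes scikit-learn).
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual classes (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions (illustrative)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FN, FP, TN:", tp, fn, fp, tn)

print("Accuracy   :", accuracy_score(y_true, y_pred))   # (TP+TN)/(TP+TN+FP+FN)
print("Precision  :", precision_score(y_true, y_pred))  # TP/(TP+FP)
print("Sensitivity:", recall_score(y_true, y_pred))     # recall = TP/(TP+FN)
print("Specificity:", tn / (tn + fp))                   # TN/(TN+FP)
```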
An accurate classification model correctly distinguishes positive cases from negative ones.
On the other hand, measuring interpretability (reasoning) is a more complex task because there is neither a universally agreed-upon definition nor an objective quantitative measure. In general, opaque computational methods obtain higher accuracies than transparent ones. Some computational methods produce an interpretable predictive model, such as a post hoc interpretable model or an intrinsically interpretable algorithm. One measure of interpretability, based on the “triptych predictivity, stability, and simplicity,” is proposed by Vincent Margot in “How to measure interpretability?" [4]. [Interpretability: “Seeing Machines Learn”]
Chart by Sharayu Rane via “The balance: Accuracy vs. Interpretability" [5]
The chart “The balance: Accuracy vs. Interpretability” sorts out the trade-off between accuracy and interpretability (reasoning) for a variety of machine learning algorithms, computational methods, and techniques. [Accuracy: The Bias-Variance Trade-off]
Overall, selecting a machine learning technique depends on your problem, objectives, and data. As we mentioned above, there are four major categories, four major algorithm types, and two major techniques. The chart at the top “Which Machine Learning (ML) to choose?” guides you through the major categories, data types, and objectives of which algorithm types or techniques to choose. The chart below extends to additional horizontal ML techniques such as attribute and row importance, feature extraction, and anomaly detection.
Machine Learning Techniques. Chart: Data Science School
Ensemble Methods
Ensemble methods are powerful techniques in machine learning that combine multiple models to improve predictive performance. By harnessing the strengths of diverse models, ensembles can often outperform individual models.
Ensemble Methods. Diagrams: Neri Van Otten
Bagging (Bootstrap Aggregating)
1. Create multiple subsets of the training data through bootstrapping.
2. Train a base model on each subset.
3. Combine predictions from all models, often by averaging or voting.
- Random Forest: A powerful ML algorithm that combines the output of multiple decision trees and mitigates the drawbacks of individual decision trees to make accurate predictions. It works by creating a multitude of decision trees during training, each trained on a random subset of the data and features. By combining multiple trees, Random Forest effectively balances between bias and variance resulting in models that are both accurate and robust. This ensemble approach mitigates underfitting, reduces overfitting, and improves generalization performance. Random Forest is versatile and can handle both classification and regression problems, making it a popular choice for various applications.
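A minimal bagging/Random Forest sketch with scikit-learn (the dataset and hyperparameters are illustrative assumptions):

```python
# Random Forest: bagging over decision trees (assumes scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each of the 200 trees is trained on a bootstrap sample and a random feature subset,
# and the forest combines their votes into a single prediction.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=1)
rf.fit(X_train, y_train)

print("test accuracy:", rf.score(X_test, y_test))
# Feature importances hint at which inputs drive the ensemble's votes.
print("largest feature importance:", rf.feature_importances_.max())
```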
Boosting
1. Train a base model on the entire dataset.
2. Assign weights to data points based on their classification accuracy.
3. Train subsequent models, giving more weight to misclassified data points.
4. Combine predictions using weighted voting.
- AdaBoost (Adaptive Boosting): An ensemble learning method that combines multiple weak learners (e.g., decision trees) to create a strong classifier. It focuses on misclassified samples, assigning higher weights to them in subsequent iterations. Prone to overfitting if not carefully tuned, especially with a large number of iterations or weak learners. Generally reduces bias by sequentially adding weak learners to correct errors, but can potentially increase variance if not tuned properly.
- CatBoost (Categorical Boosting): A gradient boosting algorithm specifically designed to handle categorical features effectively. It uses a novel technique called "ordered boosting" to improve accuracy and efficiency. Generally robust to overfitting due to its regularization techniques and categorical feature handling. Reduces both bias and variance due to its effective handling of categorical features and regularization techniques.
- Gradient Boosting: A general machine learning ensemble method that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Primarily reduces bias by sequentially adding weak learners to minimize a loss function. Can overfit if the model complexity is too high or the number of iterations is excessive.
- XGBoost (Extreme Gradient Boosting): An optimized distributed gradient boosting library that is highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework, often outperforming other boosting algorithms in terms of accuracy and speed. Reduces both bias and variance through regularization techniques, optimization algorithms, and careful model tuning. Less prone to overfitting due to its regularization techniques, early stopping, and efficient optimization algorithms. However, it can still overfit if not tuned properly.
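A hedged sketch of gradient boosting using scikit-learn's built-in implementation; XGBoost, CatBoost, and AdaBoost expose very similar fit/predict interfaces, and the hyperparameters below are illustrative assumptions:

```python
# Gradient boosting sketch (assumes scikit-learn; XGBoost/CatBoost would be analogous).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# Trees are added sequentially; each new tree fits the errors of the current ensemble.
gbm = GradientBoostingClassifier(
    n_estimators=300,      # number of boosting stages
    learning_rate=0.05,    # shrinkage: smaller values reduce overfitting but need more trees
    max_depth=3,           # weak learners are kept shallow
    random_state=2,
)
gbm.fit(X_train, y_train)
print("test accuracy:", gbm.score(X_test, y_test))
```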
Stacking (Stacked Generalization)
1. Train multiple base models on the training data.
2. Use the base models to make predictions on a holdout set.
3. Use the predictions from the base models as features for a meta-model.
4. Train the meta-model to make final predictions.
Meta-Learner: The optimal choice of a meta-learner depends on the specific problem, the complexity of the relationships between the base models' predictions, and the desired level of interpretability. Experimentation with different meta-learners, such as linear or logistic regression, decision trees, neural networks, or Support Vector Machines (SVMs), is often necessary to find the best performing model.
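A minimal stacking sketch with scikit-learn, using two different base models and a logistic regression meta-learner (the specific estimators are illustrative assumptions):

```python
# Stacked generalization sketch (assumes scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=3)),
    ("svm", SVC(probability=True, random_state=3)),
]

# The base models' out-of-fold predictions become features for the meta-learner.
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)
print("stacked test accuracy:", stack.score(X_test, y_test))
```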
Cascading
Key Differences of Ensemble Methods. Table: Gemini
Choosing the right ensemble method depends on the specific problem, dataset, and desired performance.
Ensemble Methods Comparison. Table: Neri Van Otten
If you have multiple ML models with similar accuracy, precision, recall, or other metrics, you can create a majority vote classifier, weighted voting, or stacking ensemble with a meta-model that utilizes all models, especially when the models are different in nature. Combining multiple high-performing models through ensemble techniques is a powerful strategy to enhance predictive accuracy and robustness.
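A hedged sketch of majority-vote and weighted-vote ensembles over dissimilar models (the estimators and weights are illustrative assumptions):

```python
# Voting ensemble sketch (assumes scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=4)),
]

hard_vote = VotingClassifier(estimators, voting="hard")                      # majority vote
soft_vote = VotingClassifier(estimators, voting="soft", weights=[2, 1, 1])   # weighted probabilities

for name, clf in [("majority vote", hard_vote), ("weighted soft vote", soft_vote)]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", clf.score(X_test, y_test))
```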
Reinforcement Learning (RL)
Reinforcement Learning (RL) offers various approaches to solve problems where an agent learns to make decisions by interacting with an environment. The primary computational methods can be categorized into:
Reinforcement Learning (RL) Agent Taxonomy. Diagram adapted: Pratap Dangeti
Model-Based Reinforcement Learning
1. Learn a transition model: P(s'|s, a)
2. Learn a reward model: R(s, a)
3. Use planning algorithms (e.g., dynamic programming, search) to find optimal actions based on the learned model.
Policy-Based Reinforcement Learning
Value-Based Reinforcement Learning
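To give the value-based approach some concreteness, here is a minimal, hedged sketch of tabular Q-learning on a tiny made-up chain environment (the environment, rewards, and hyperparameters are illustrative assumptions, not a standard benchmark):

```python
# Tabular Q-learning sketch: a value-based RL method (NumPy only).
import numpy as np

n_states, n_actions = 5, 2      # toy chain: action 0 moves left, action 1 moves right
goal = n_states - 1             # reward of +1 only when the goal state is reached
alpha, gamma, epsilon = 0.1, 0.9, 0.1

Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(state, action):
    """Illustrative deterministic transition: moving right approaches the goal."""
    next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

for _ in range(500):                         # episodes
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection from the learned Q-values.
        action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q toward the bootstrapped target value.
        target = reward + gamma * Q[next_state].max() * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print("Greedy policy (0=left, 1=right):", Q.argmax(axis=1))
```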
Actor-Critic Reinforcement Learning
- Actor: Learns a policy using policy gradients.
- Critic: Learns a value function to estimate the expected return.
- The actor improves its policy based on the critic's evaluation.
Key Differences of RL Primary Computational Methods. Table: Gemini
Choosing the right computational method depends on the specific problem and environment.
Time Series Components. Chart: Nirmal Gaud
"Time series is a ML technique that forecasts target value based solely on a known history of target values. It is a specialized form of regression known in the literature as auto-regressive modeling. The input to time series analysis is a sequence of target values." [Oracle]
Time Series Forecasting
Time series analysis comprises methods for analyzing time series data to extract meaningful statistics and predictive characteristics. Time series regression, with autoregressive dynamics, is a statistical method for predicting a future response based on the response history.
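A minimal sketch of autoregressive forecasting: fit a linear model on lagged target values and predict the next point (the synthetic series and lag order are illustrative assumptions):

```python
# Autoregressive (AR) forecasting sketch using lagged values (NumPy only).
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(200)
# Synthetic series: trend + seasonality + noise (illustrative only).
series = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, size=t.size)

p = 3  # lag order: predict y[t] from y[t-1], y[t-2], y[t-3]
X = np.column_stack([series[p - k - 1 : len(series) - k - 1] for k in range(p)])
X = np.column_stack([np.ones(len(X)), X])      # add an intercept term
y = series[p:]

coefs, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit

# One-step-ahead forecast from the last p observations.
last_lags = np.concatenate(([1.0], series[-1 : -p - 1 : -1]))
print("next-value forecast:", last_lags @ coefs)
```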
Categorized ML Algorithms. Mind map: Gina Acosta Gutiérrez
After choosing your ML scenario, your next step is to choose your ML algorithm. To do so, you can utilize the categorized ML algorithms diagram, a partial list of ML and data mining algorithms organized into a hierarchical tree of ML algorithm categories.
Your data type is a critical success factor when selecting your ML algorithm. For example, tree-based models outperform deep learning on typical tabular data. An experimental in-depth analysis of ML algorithms on tabular datasets with both categorical and numerical features, by Léo Grinsztajn et al., provided empirical results and insights into the reasons:
"1. Neural networks are biased to overly smooth solutions
2. Neural networks are more impacted by uninformative features
3. Data is non-invariant by rotation, so should be learning procedures"
Benchmark on medium-sized datasets. Graphs: Léo Grinsztajn et al.
Also, on the one hand, deep learning models are notorious for requiring extensive hyperparameter optimization. On the other hand, tree-based models (e.g., XGBoost) are simpler algorithms, easier to tune, and the best performers on tabular data.
At a higher level, there are six archetypical analysis methods: Descriptive, Exploratory, Inference, Predictive, Prescriptive, and Causality. These analysis methods are defined as:
Six Archetypical Analyses. Chart: Visual Science Informatics, LLC
Each archetypical analysis method aims to answer different questions. The higher the complexity of the analyses (in terms of knowledge, cost, and time), the more valuable the answer output of the analytic method. [Complexity: Time, Space, & Sample]
The Value of Analytics Methods. Chart: Visual Science Informatics, LLC
Establishing learning goals and objectives is significant. Organizing objectives helps to clarify them.
"Bloom's taxonomy is a set of three hierarchical models used for the classification of educational learning objectives into levels of complexity and specificity. The three lists cover the learning objectives in the cognitive, affective, and psychomotor domains."
Bloom's Revised Taxonomy. Diagram: Jessica Shabatura, UARK
There are six levels of cognitive learning according to the revised version of Bloom's Taxonomy. Each level is conceptually different. The six levels are remembering, understanding, applying, analyzing, evaluating, and creating. The new terms are defined as:
This Bloom's taxonomy was adapted for machine learning.
Bloom’s Taxonomy Adapted for Machine Learning (ML). Chart: Visual Science Informatics, LLC
There are six levels of model learning in the adapted version of Bloom's Taxonomy for ML. Each level is a conceptually different learning model. The levels are ordered from lower-order learning to higher-order learning. The six levels are Store, Sort, Search, Descriptive, Discriminative, and Generative. The terms of Bloom's Taxonomy adapted for ML are defined as:
Neural Networks (NNs)
A Neural Network (NN) is a series of algorithms inspired by the structure and function of the human brain. Neural networks are used for a variety of tasks, including image recognition, speech recognition, and natural language processing.
Neural Networks have high predictive power, but have low interpretability because the nature of neural networks is a black box where the inner working of deep networks is not fully explainable.
“An artificial neuron simply hosts the mathematical computations. Like our neurons, it triggers when it encounters sufficient stimuli. The neuron combines input from the data with a set of coefficients, or weights, which either amplify or dampen that input, which thereby assigns significance to inputs for the task the algorithm is trying to learn". [Anddy Cabrera]
Neural Networks learn by adjusting the weights of the connections between neurons. The weights determine how much influence one neuron has on another. By adjusting the weights, a neural network can learn to perform a specific task.
Neural Networks' Architectures: ANN, RNN, LSTM & CNN. Diagrams: A. Catherine Cabrera, and B. InterviewBit
"Neural Network Standard Components:
Backpropagation is a fundamental algorithm used to train artificial neural networks. It is essentially a computational method for calculating the gradient of the error function with respect to the network's weights. In simpler terms, it helps the network learn from its mistakes by adjusting its parameters to minimize the error between its predicted output and the actual output. The biggest development advance in neural networks between 1987 and 1993 was a wide adaptation of the backpropagation algorithm. This algorithm provided an efficient method for training multi-layer neural networks, allowing them to learn complex patterns and relationships in data. It was a significant breakthrough that revitalized interest in neural networks and paved the way for their subsequent applications in various fields.
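A minimal, hedged sketch of backpropagation for a tiny one-hidden-layer network trained on XOR, written in plain NumPy (the architecture, learning rate, and iteration count are illustrative assumptions):

```python
# Backpropagation sketch: one hidden layer, sigmoid activations, trained on XOR (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized weights and biases: 2 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros((1, 1))
lr = 0.5

for epoch in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: gradients of the squared error with respect to each layer.
    d_out = (out - y) * out * (1 - out)          # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)           # error propagated back to the hidden layer

    # Gradient descent step: adjust weights to reduce the error.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print("predictions:", out.round(3).ravel())      # should approach [0, 1, 1, 0]
```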
Different neural networks have distinct architectures tailored to their functions and strengths. Here are descriptions of the major neural network architectures:
Here is a table summarizing the key differences:
Key Differences of Neural Networks' Architectures. Table: Gemini
Deep Neural Networks (DNNs) are trained using large sets of labeled or unlabeled data and increasingly learn abstract features directly from the data without manual feature extraction. Traditional neural networks may contain around 2-3 hidden layers, while deep networks can have as many as 100-200 hidden layers.
The Neural Network Zoo. Node Maps: Van Veen, F. & Leijnen, S. (2019). The Asimov Institute
Note that "Node Maps" have limitations in portraying the nuances of deep learning models. There are numerous differences in the usage scenarios, scalabilities, restrictions, and mitigations (decaying, vanishing, and exploding information).?Additional functionality could be?to preprocess, encode, or decode?information, parallel competitive learning, predicting, and generating, or un-black-box.?Additionally, the differences are in inputs: data, feedback, and noise, connectivity: past, present, future, random, reversed, stacked, and extra, and states: activations, triggers, stateless, memory, probabilistic, and pooling multiple weights as a vector.
Also, there are numerous more special networks, layers, and operations such as transformers, latent diffusion models, inception, features pyramid networks, etc.
Deep Belief Networks (DBNs) are a type of deep learning architecture used for unsupervised learning tasks. They can be thought of as building blocks for more complex neural networks. Here is a breakdown of how they work:
- Building Blocks: Restricted Boltzmann Machines (RBMs)
- Stacked for Learning: The Deep Belief Network
- Unsupervised Training
- Applications of Deep Belief Networks
- Some limitations of DBNs include:
Overall, Deep Belief Networks are a powerful tool for unsupervised feature learning and can be a valuable component in building more complex deep learning architectures.
A Generative Adversarial Network (GAN) is a type of deep learning system that uses two neural networks to compete against each other. Here is a breakdown of how it works:
- Generator: This network creates new data, such as images or music, based on the data it has been trained on.
- Discriminator: This network tries to tell the difference between the new data created by the generator and real data from the training set.
Conditional Generative Adversarial Network Model Architecture. Chart: Jason Brownlee
Another decision point in choosing a machine learning model is the difference between discriminative, predictive, and generative models. A discriminative approach focuses on a solution and performs better for classification tasks by dividing the data space into classes by learning the boundaries. A predictive approach relies on historical data, statistical modeling, and machine learning algorithms to forecast future trends, outcomes, or behaviors for making informed guesses about what might happen next. A generative model approach understands how data is embedded throughout space and generates new data points.
Discriminative vs. Generative. Table: Supervised Learning Cheatsheet
Generative AI focuses on creating new content, such as images, text, or music, by learning patterns from existing data. It often employs complex models such as GANs and transformers to generate highly creative and realistic outputs. On the other hand, Predictive AI is designed to analyze historical data to forecast future trends and outcomes. It utilizes techniques such as regression, decision trees, and neural networks to identify patterns and make predictions. While Generative AI excels at creativity and innovation, Predictive AI is invaluable for decision-making and risk assessment.
Generative vs. Predictive. Table: Gemini
The table above compares Generative AI vs. Predictive AI employing the "Architectural Blueprints—The “4+1” View Model of Machine Learning" and the "Data Science Approaches to Data Quality: From Raw Data to Datasets" as an architectural evaluation framework.
Activation Function in NNs
An Artificial Neuron in Action. Animation: Anddy Cabrera
Non-Linear Activation Functions. Graphs: Nikita Prasad
Activation functions play a crucial role in neural networks by introducing non-linearity, enabling them to learn complex patterns. The evolution of activation functions has been closely tied to the development of neural network architectures and training algorithms. Early activation functions were Sigmoid and Tanh, but since then there are improved activation functions, which are listed in the table below.
Activation Function Comparison. Table: Gemini
Note: This table provides a brief overview of common activation functions. The choice of activation function often depends on the specific task and architecture of the neural network.
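For reference, here is a small NumPy sketch of several common activation functions (GELU uses the widely used tanh approximation; the formulas are standard, but treat the exact constants as illustrative):

```python
# Common activation functions (NumPy only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))             # squashes to (0, 1); can saturate

def tanh(x):
    return np.tanh(x)                            # squashes to (-1, 1); zero-centered

def relu(x):
    return np.maximum(0.0, x)                    # cheap, sparse, no saturation for x > 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)         # small slope for x < 0 keeps gradients alive

def gelu(x):
    # Tanh approximation of GELU, common in transformer models.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def swish(x, beta=1.0):
    return x * sigmoid(beta * x)                 # smooth and non-monotonic

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (sigmoid, tanh, relu, leaky_relu, gelu, swish):
    print(f"{fn.__name__:>10}: {np.round(fn(x), 3)}")
```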
Factors driving evolution
For example, ReLU is a common choice for many deep learning architectures; it helps to alleviate the vanishing gradient problem in deep networks, leading to improved performance. More specialized functions such as GELU and Swish may be better suited for certain tasks such as translation, text classification, and natural language processing.
However, it is important to note that other activation functions and their variants can also perform well in many cases. The best activation function for a given task may need to be determined through experimentation and evaluation.
If you are considering using a specific activation function in your own projects, it is recommended to try it alongside other activation functions and evaluate its performance on your specific dataset.
Best practices for neural network training
softmax Function Layer
A softmax function layer can be used in the output layer of neural networks for multi-class classification problems. It takes a vector of real numbers as input and transforms it into a probability distribution over the possible classes.
1. Input: The softmax layer receives a vector of raw scores, one for each class, often the output of a preceding layer in the neural network.
2. Exponentiation: Each raw score is exponentiated to ensure positive values.
3. Normalization: The exponentiated values are divided by their sum, scaling them to a probability distribution.
4. Probability Distribution Output: The resulting values represent the probability of the input belonging to each class.
5. Prediction: The class with the highest probability is selected as the predicted class.
softmax equation. Google
In summary, softmax is a function that transforms raw output scores into probabilities for multi-class classification. It ensures that the probabilities sum to 1, making the model's predictions interpretable and comparable.
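A minimal NumPy sketch of a numerically stable softmax, following the steps above (the example scores are illustrative):

```python
# Softmax: raw class scores -> probability distribution (NumPy only).
import numpy as np

def softmax(scores):
    # Subtracting the max is a standard trick for numerical stability;
    # it does not change the result because softmax is shift-invariant.
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)          # step 2: exponentiation
    return exp_scores / exp_scores.sum()  # step 3: normalization

raw_scores = np.array([2.0, 1.0, 0.1])    # illustrative logits for 3 classes
probs = softmax(raw_scores)

print("probabilities:", np.round(probs, 3))   # sums to 1
print("predicted class:", int(np.argmax(probs)))
```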
Another aspect of the ML exploration workflow is MLOps (Machine Learning Operations). MLOps is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. It involves a combination of software engineering and continuous machine learning best practices to streamline the entire ML lifecycle, from data ingestion and model training to deployment and monitoring. [Operations: MLOps, Continuous ML, & AutoML]
ML Algorithms Cheat Sheet. Diagram: SAS
In conclusion, choosing a Machine Learning (ML) approach depends on multiple complex factors and challenging trade-offs. You will need to consider at least four competing architectural factors: Accuracy, Complexity, Interpretability, and Operations. Selecting a machine learning approach that balances all decision factors is important because the capital investment in the processing pipeline stages is costly and requires considerable time and effort. Therefore, it is highly valuable to employ a rigorous process in choosing machine learning. [29]
Next, read the "Accuracy: The Bias-Variance Trade-off" article at https://www.dhirubhai.net/pulse/accuracy-bias-variance-tradeoff-yair-rajwan-ms-dsc.