Machine Learning Classification Algorithms - 1/2 An Introduction

Elsayed Rashed

Technology Leader || Helping businesses to achieve their digital transformation by leveraging the power of Data, AI/ML, and Cloud Computing through agile engineering and problem-solving creativity

Published: November 5, 2023

Supervised Machine Learning algorithms are classified into Regression and Classification. Regression predicts continuous values, while Classification is used for predicting categorical values.

Classification is a widely used technique in machine learning and statistical learning. Typical uses include identifying spam emails, analyzing financial risks, predicting customer churn, and discovering potential customers.

In two articles, I will introduce classification algorithms and provide an example of their application to solve a classification problem.

Article I: Machine Learning Classification Algorithms - An Introduction

Article II: Language Detector

Introduction to Machine Learning

Machine learning is a subset of Artificial Intelligence and a subfield of Data Science. It involves the study of how software can learn from past experiences. Machine learning enables computers to learn on their own by using statistical methods to improve performance and predict output without the need for explicit programming.

The relationships between AI, machine learning, deep learning, data science, and mathematics

Over the last 5-10 years, the field of machine learning has grown rapidly, owing to breakthroughs in algorithms such as deep learning. Combined with a steep increase in computing power, especially for parallel operations on GPUs and TPUs, this has allowed for huge improvements in the training of machine learning models.

Types of Machine Learning

Supervised Learning:

Supervised learning is a popular machine learning approach in which labeled data is provided to the system for training, and the system predicts outputs based on that training. It is a simple and widely known learning task that relies on pre-defined examples, where the category of each input is already known.

For example, a spam filtering dataset contains both spam and non-spam messages, each labeled as such. Because we know during training which messages are spam and which are not, we can train our model to classify new, unseen messages.

In the context of supervised learning, there are two main types of tasks: classification and regression:

Classification is used to predict which class a data point is part of (discrete value)
Regression is used to predict continuous values.

In other words, classification tasks predict a class label for the target attribute, while regression tasks predict a numeric value for it.

Common supervised learning applications include:

Predictive analysis based on regression or categorical classification
Spam detection
Pattern detection
Natural Language Processing
Sentiment analysis
Automatic image classification
Automatic sequence processing (for example, music or speech)

Unsupervised Learning:

Unsupervised learning refers to a method where a machine learns without any guidance or supervision. In this type of learning, data points do not have any labels or predetermined classes. The algorithm therefore has to infer groupings from the unlabeled dataset itself, so its primary goal is to describe the structure hidden in the data.

To enable unsupervised learning, clustering techniques are used to group unlabeled data based on similarity measures, revealing hidden patterns and facilitating feature learning.
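As a rough illustration of clustering by similarity, here is a minimal k-means sketch in plain Python. The sample points, the choice of k = 2, and the use of the first k points as starting centroids are assumptions made for this example, and k-means is only one of several clustering techniques.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group points into k clusters around iteratively updated centroids."""
    # Assumption: use the first k points as starting centroids so the
    # sketch is deterministic. Real implementations usually pick them randomly.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for i, cluster in enumerate(clusters):
            if cluster:
                updated.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                updated.append(centroids[i])  # keep the centroid of an empty cluster
        centroids = updated
    labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, labels


# Unlabeled 2-D points (made-up values) that form two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels are given to the function. The grouping comes only from the distances between points.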

Common unsupervised learning applications include:

Object segmentation (for example, users, products, movies, songs, and so on)
Similarity detection
Automatic labelling

Reinforcement Learning:

Reinforcement learning is a technique where the model learns from a series of actions or behaviors, allowing it to improve over time. Sample complexity, meaning the amount of experience needed to learn, is crucial to the success of reinforcement learning algorithms, as it affects the ability of the algorithm to learn the target function effectively.

Reinforcement learning is a feedback-based machine learning method where an agent gets rewarded for taking correct actions and penalized for incorrect ones.
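The reward-and-penalty loop can be sketched with a very small agent that keeps a value estimate per action. This is a simplified, bandit-style illustration under assumed settings (three actions, a reward of +1 or -1, an exploration rate of 0.1), not a full reinforcement learning algorithm.

```python
import random


def train_agent(correct_action, n_actions=3, episodes=200,
                epsilon=0.1, learning_rate=0.5, seed=0):
    """Learn action values from +1 rewards and -1 penalties."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    values = [0.0] * n_actions         # the agent's current estimate per action
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                         # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])   # exploit
        # Feedback from the (hypothetical) environment.
        reward = 1.0 if action == correct_action else -1.0
        # Nudge the estimate for the chosen action toward the observed reward.
        values[action] += learning_rate * (reward - values[action])
    return values


values = train_agent(correct_action=2)
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action)  # 2
```

Penalized actions end up with negative estimates, so the agent settles on the rewarded one.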

Common reinforcement learning applications include:

Robotics
Self-Driving Cars
Internet of Things (IoT) applications

Introduction to Classification Technique

Classification Technique is a type of Supervised Learning that helps to identify the appropriate category for new observations based on the training data. In this method, a program learns from the available dataset, and then assigns new observations into different categories or classes, such as Yes or No, 0 or 1, Spam or Not Spam, and so on. This approach is useful for making accurate predictions and improving decision-making processes.

The classification algorithm requires labeled input data, with input and corresponding output variables representing categories.

Types of Classification

The algorithm used to classify a dataset is called a classifier. There are three types of classification:

Binomial (Binary) Classifier

Classifies data into two categories, such as presence/absence, positive/negative, or diseased/healthy.

Multinomial Classifier

Classifies data into three or more classes, such as document classification for Politics, Sports, Social issues, and the Economy.

Ordinal Classifier

Classifies data into three or more ordered classes such as "low", "medium", or "high" based on risk level.

Classification Algorithms

A classification algorithm is a type of Supervised Learning technique that helps in identifying the category of new observations based on training data. To better understand classification algorithms, you can refer to the following diagram. The diagram shows two classes - Class A and Class B, with features that are similar within the same class and dissimilar across different classes.

There are mainly two categories of Classification Algorithms:

Linear Models, like:

Logistic Regression
Support Vector Machines (SVM)

Non-linear Models, like:

K-Nearest Neighbours (K-NN)
Naïve Bayes
Decision Tree
Random Forest

Logistic Regression

Logistic Regression is a popular Machine Learning algorithm that can provide probabilities and classify new data using both continuous and discrete datasets.
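To show how a logistic regression model turns features into a probability and then a class, here is a minimal sketch. The weights and bias are hypothetical values chosen for illustration, not parameters learned from data.

```python
from math import exp


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one observation."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Classify as 1 when the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical model parameters (in practice these are fitted to training data).
weights = [1.5, -2.0]
bias = 0.25

print(predict([2.0, 0.5], weights, bias))  # 1
print(predict([0.0, 1.0], weights, bias))  # 0
```

Raising or lowering the threshold trades false positives against false negatives, which ties in with the evaluation metrics discussed later.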

K-Nearest Neighbors (KNN)

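KNN is a simple non-parametric machine learning algorithm that does not make any assumptions about the distribution of the underlying data. It classifies a new observation by the majority class among its k closest training points.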
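A minimal KNN sketch in plain Python follows. The training points, labels, and the choice of k = 3 with Euclidean distance are assumptions for this example.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    # Rank training points by Euclidean distance to the query point.
    ranked = sorted(range(len(train_points)),
                    key=lambda i: dist(train_points[i], query))
    # Let the k nearest neighbours vote on the class.
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training data: two groups of points with known classes.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2)))  # A
print(knn_predict(train_points, train_labels, (6, 5)))  # B
```

There is no training step here: the "model" is simply the stored data, and all the work happens at prediction time.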

Support Vector Machine (SVM)

As a machine learning algorithm, SVM is considered powerful and well-suited for smaller datasets. However, its effectiveness extends to complex datasets as well. SVM constructs a hyperplane or a set of hyperplanes in a high or infinite-dimensional space that can be used for classification.
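Once a separating hyperplane is known, classifying a point comes down to checking which side of it the point falls on. The sketch below assumes a hypothetical, already-fitted linear hyperplane (weights `w` and offset `b`). It does not show how an SVM finds that hyperplane during training.

```python
from math import sqrt


def decision_value(x, w, b):
    """Signed score w.x + b: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(x, w, b) >= 0 else -1


def distance_to_hyperplane(x, w, b):
    """Geometric distance from x to the hyperplane."""
    return abs(decision_value(x, w, b)) / sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane x1 + x2 - 5 = 0 (illustrative values, not fitted).
w = (1.0, 1.0)
b = -5.0

print(classify((4.0, 4.0), w, b))  # 1
print(classify((1.0, 1.0), w, b))  # -1
```

Points far from the hyperplane are classified with more room to spare than points close to it, which is why SVM training looks for the hyperplane with the widest margin.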

Naive Bayes

As a probabilistic classifier, Naive Bayes uses the Maximum A Posteriori decision rule in a Bayesian setting to make classifications. One of the advantages of Naive Bayes is its ability to handle imbalanced data, making it a popular choice for text classification tasks such as spam filtering.
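The Maximum A Posteriori rule picks the class with the highest prior times likelihood. Below is a small sketch of a multinomial Naive Bayes text classifier. The four training messages, the whitespace tokenizer, and the add-one (Laplace) smoothing are assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(documents, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict_naive_bayes(text, model):
    """Return the class with the highest log posterior (MAP decision rule)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / total_docs)                  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue                                 # ignore unseen words
            # Add-one smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Tiny made-up training set.
documents = ["win money now", "free prize win", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(documents, labels)

print(predict_naive_bayes("win a free prize", model))   # spam
print(predict_naive_bayes("meeting tomorrow", model))   # ham
```

Working with log probabilities keeps the product of many small likelihoods from underflowing.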

Decision Tree

The Decision Tree algorithm is a popular algorithm due to its simple approach in dealing with complex datasets. It has a hierarchical, tree structure consisting of a root node, branches, internal nodes, and leaf nodes.
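The node types named above can be pictured with a hand-written tree stored as nested dictionaries. The features (`contains_link`, `sender_known`) and the tree itself are hypothetical. A real decision tree algorithm would learn the splits from training data.

```python
# A hand-built tree: each internal node tests one feature, each leaf holds a label.
tree = {
    "feature": "contains_link",                 # root node
    "branches": {
        True: {
            "feature": "sender_known",          # internal node
            "branches": {
                True: {"label": "not spam"},    # leaf node
                False: {"label": "spam"},       # leaf node
            },
        },
        False: {"label": "not spam"},           # leaf node
    },
}


def classify(node, sample):
    """Follow branches from the root until a leaf is reached."""
    while "label" not in node:
        value = sample[node["feature"]]
        node = node["branches"][value]
    return node["label"]


print(classify(tree, {"contains_link": True, "sender_known": False}))   # spam
print(classify(tree, {"contains_link": False, "sender_known": False}))  # not spam
```

Each prediction is a short path of yes/no questions, which is what makes decision trees easy to read and explain.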

Random Forest

Random Forest is a type of classifier that consists of multiple decision trees. For classification, it combines the predictions of these trees, typically by majority vote or by averaging their predicted class probabilities, to improve the accuracy of predictions. This method is based on the concept of ensemble learning, which involves combining multiple classifiers to solve complex problems.
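The ensemble idea can be sketched as a majority vote over several simple trees. The three one-rule "trees" and their features below are hypothetical stand-ins. An actual random forest trains each tree on a random sample of the data and features, which this sketch does not do.

```python
from collections import Counter


# Three hypothetical one-rule trees, each looking at a different feature.
def tree_a(sample):
    return "spam" if sample["contains_link"] else "not spam"


def tree_b(sample):
    return "spam" if sample["exclamation_marks"] > 3 else "not spam"


def tree_c(sample):
    return "not spam" if sample["sender_known"] else "spam"


def forest_predict(trees, sample):
    """Combine the trees' predictions by majority vote."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


sample = {"contains_link": True, "exclamation_marks": 1, "sender_known": False}
print(forest_predict([tree_a, tree_b, tree_c], sample))  # spam
```

Here one tree disagrees, but the other two outvote it. Individual errors tend to cancel out when the trees make different mistakes.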

Evaluating Classification Models

After completing the classification model, it is important to evaluate its performance using the confusion matrix and associated metrics.

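Confusion Matrix

The confusion matrix, also known as the error matrix, is a table that outlines the performance of the model.

The matrix displays the number of correct and incorrect predictions in a summarized table format. For a binary classifier, its four cells count the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).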

By utilizing the confusion matrix, we can calculate the model's accuracy and several other performance metrics.
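As a small illustration, the four cells of a binary confusion matrix can be counted directly from the actual and predicted labels. The label lists below are made up for the example, and class 1 is assumed to be the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1      # predicted positive, actually positive
        elif p == positive:
            fp += 1      # predicted positive, actually negative
        elif a == positive:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```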

Accuracy

Accuracy is a key measure of how effective a classification model is. It refers to how often the model's predictions are correct, and is computed by dividing the number of correct predictions made by the classifier by the total number of predictions. The formula for accuracy is given below:

Accuracy = (TP+TN) / (TP+FP+FN+TN)
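The formula translates directly into code. The counts below are invented for illustration. The second call uses a deliberately imbalanced case to show that a model can score high accuracy while finding no positives at all.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


balanced = accuracy(tp=3, tn=3, fp=1, fn=1)
imbalanced = accuracy(tp=0, tn=95, fp=0, fn=5)   # every positive case was missed

print(balanced)    # 0.75
print(imbalanced)  # 0.95
```

This is one reason accuracy is usually read together with the other metrics derived from the confusion matrix.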

Precision

The precision of a model is the proportion of its positive predictions that are actually correct. Precision answers the question: what proportion of predicted positives is truly positive?

Precision is a valid choice of evaluation metric when we want to be very sure of our prediction.

It can be calculated using the below formula:

Precision = TP / (TP+FP)
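A short sketch of the calculation, using made-up counts. Returning 0.0 when the model makes no positive predictions is a convention assumed here, since the ratio is undefined in that case.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


# Example: the model flagged 10 items as positive and 8 of them really were.
print(precision(tp=8, fp=2))  # 0.8
```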

Recall

Recall measures the fraction of positive cases that are correctly predicted by the model. Recall answers the question: what proportion of actual positives is correctly classified?

Recall is a valid choice of evaluation metric when we want to capture as many positives as possible.

It can be calculated using the below formula:

Recall = TP / (TP+FN)
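The same calculation in code, with invented counts. As an assumed convention, the function returns 0.0 when there are no actual positives, where the ratio is undefined.

```python
def recall(tp, fn):
    """Share of actual positives that the model found."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


# Example: there were 16 actual positives and the model caught 8 of them.
print(recall(tp=8, fn=8))  # 0.5
```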

F1 Score

Comparing two models where one has low precision and high recall and the other has high precision and low recall can be challenging. To overcome this, we can use the F-score, which evaluates both recall and precision simultaneously. For a given sum of precision and recall, the F-score is at its maximum when the two are equal.

F1 score is the harmonic mean of precision and recall, and is a value between 0 and 1. It balances the precision and recall of a classifier.

It can be calculated using the below formula:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
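A brief sketch of the formula with illustrative precision and recall values. Returning 0.0 when both inputs are zero is an assumed convention to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.8, 0.5), 3))  # 0.615
print(round(f1_score(0.9, 0.1), 3))  # 0.18
print(round(f1_score(0.5, 0.5), 3))  # 0.5
```

The second and third calls have the same arithmetic mean (0.5), but the lopsided pair scores far lower, which shows how the harmonic mean penalizes a gap between precision and recall.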
