Machine Learning Day 3 - Supervised Learning

Machine Learning (ML) algorithms -

  • Machine Learning (ML) algorithms are computational models or procedures that enable computers to learn patterns and make predictions or decisions without being explicitly programmed. These algorithms use statistical techniques to enable a system to improve its performance on a specific task over time as it's exposed to more data.
  • There are various types of ML algorithms, broadly categorized into two main types: Supervised learning and Unsupervised learning.


A. Supervised Learning -

  • Supervised learning is a type of machine learning where an algorithm is trained on a labeled dataset, meaning that each input in the training data is associated with a corresponding output (target).
  • The goal of supervised learning is for the algorithm to learn a mapping from input to output, allowing it to make predictions or classifications on new, unseen data based on the patterns it has learned during training.
  • In other words, the algorithm learns from examples, with the supervision of labeled data to guide its learning process.
  • The algorithm learns to map input features to the correct output by generalizing patterns from the labeled examples provided during training.
  • Example:
  • Consider a dataset of emails where each email is labeled as either "spam" or "not spam." In supervised learning, the algorithm would be trained on this labeled dataset to recognize patterns in the features of emails (such as words, sender information, some keywords, etc.) and learn to predict whether a new, unseen email is spam or not based on those learned patterns.
  • Supervised learning is divided into two main types: Classification and Regression.


I. Classification -

  • Classification in supervised learning is a type of machine learning where the algorithm is trained on a labeled dataset to learn the mapping between input features and predefined categories or classes.
  • The goal is to predict the categorical class or label of new, unseen data based on the patterns learned during training.
  • Simple Example: Consider a dataset of emails labeled as either "spam" or "not spam." In a classification task, the algorithm learns to analyze features of emails (like words, sender information, etc.) and classify new, unseen emails as either spam or not spam based on the learned patterns.
  • Classification algorithms in machine learning are categorized based on their underlying principles, techniques, and the nature of the decision boundaries they create. Here are some common types of classification algorithms:

1. Logistic Regression

2. K-Nearest Neighbors (KNN)

3. Naive Bayes

4. Decision Trees

5. Random Forest


1. Logistic Regression:

  • Logistic Regression works by modeling the probability of a binary outcome (e.g., pass/fail) based on one or more input features. It uses a logistic function to constrain predictions between 0 and 1, and a decision threshold (commonly 0.5) is set to classify outcomes.
  • The Logistic Regression algorithm is commonly used for binary classification problems. Let's explain it with an example where we want to predict whether a student will pass or fail an exam based on the number of hours they studied.
  • Example:

Imagine we have a dataset with the following information:

[Table: hours studied and the corresponding Pass/Fail result for each student]

  • Logistic Regression Process:

1. Data Preparation:

  • Convert the "Pass/Fail" labels into binary values (e.g., 1 for Pass, 0 for Fail).

2. Model Training:

  • Logistic Regression models the relationship between the input (study hours) and the output (pass/fail) using a logistic function. The model estimates coefficients that define the decision boundary.
  • Sigmoid Transformation: Equation of logistic regression:

Y = 1 / (1 + e^-(a0 + a1X))

  • where:
    - Y represents the dependent variable
    - X represents the independent variable
    - a1 is the coefficient of the line (how much Y changes for a unit change in X)
    - a0 is the intercept (the value of Y when X is 0)
  • Decision Threshold: Choose a decision threshold (often 0.5). If the predicted probability is above this threshold, predict class 1 (pass); otherwise, predict class 0 (fail).

[Figure: sigmoid curve mapping (a0 + a1X) to a probability between 0 and 1]


  • Interpretation:
    - A higher value of (a0 + a1X) makes the sigmoid function approach 1, indicating a higher probability of passing the exam.
    - A lower value of (a0 + a1X) makes the sigmoid function approach 0, indicating a lower probability of passing.

3. Prediction:

- Given a new value (e.g., a student who studied for 4.5 hours), the model calculates the probability of passing the exam using the logistic function.

  • Decision Threshold: Choose a decision threshold (e.g., 0.5). If the calculated probability is above the threshold, predict "Pass"; otherwise, predict "Fail."

4. Interpretation:

If a student studied 4.5 hours, the model calculates a probability, say 0.7, which is above the 0.5 threshold. Thus, the model predicts the student will pass.

If another student studied for 3 hours, the model might calculate a probability of 0.3, below the threshold, predicting a fail.


In summary, Logistic Regression models the probability of an event occurring. It's suitable for binary classification tasks, and the logistic function ensures predictions are between 0 and 1, representing probabilities. The decision threshold determines the class assignment based on these probabilities.

In the example, we used study hours to predict exam results, but in practice, logistic regression can handle multiple features to make more complex predictions.
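As a rough sketch of this example, the snippet below fits a logistic regression on hypothetical study-hours data with scikit-learn; the numbers are made up for illustration and are not the original table.

```python
# Minimal sketch: logistic regression on hypothetical study-hours data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical hours studied and outcomes (1 = Pass, 0 = Fail), not the original table
hours = np.array([[1.0], [2.0], [3.0], [3.5], [4.0], [5.0], [6.0], [7.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

# Probability of passing for a student who studied 4.5 hours
prob_pass = model.predict_proba([[4.5]])[0, 1]
prediction = "Pass" if prob_pass >= 0.5 else "Fail"  # 0.5 decision threshold
print(f"P(pass | 4.5 hours) = {prob_pass:.2f} -> {prediction}")
```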


2. K-Nearest Neighbors (KNN):

K-Nearest Neighbors (KNN) works by finding the k data points in the training set that are closest to a new input and making predictions based on the majority class (for classification) or the average value (for regression) of these neighbors.

The closeness is typically measured using distance metrics like Euclidean distance. It's a simple yet effective algorithm for pattern recognition and prediction.

Let's explain the k-Nearest Neighbors (kNN) algorithm using a movie genre classification example. In this scenario, we'll consider movies with two features: "Popularity" and "Action Level." We want to predict the genre (either "Action" or "Comedy") of a new movie based on these features.

Movie Genre Classification Example:

Imagine we have the following dataset:

[Table: movies with Popularity, Action Level, and Genre (Action or Comedy) labels]


Now, let's say we want to predict the genre of a new movie with Popularity 7 and Action Level 6.

KNN Process:

  • Calculate Distances:

- Calculate the Euclidean distance between the new movie and each existing movie in the dataset.

- For example, the distance between the new movie and Movie1: sqrt((7-8)^2 + (6-7)^2) = sqrt(1 + 1) = sqrt(2).

  • Select k-Nearest Neighbors:

- Let's choose k=3 for this example. Find the three movies with the shortest distances. Suppose the nearest neighbors are Movie1, Movie2, and Movie5.

  • Majority Voting for Genre:

- Since it's a classification task, count the occurrences of each genre among the k-nearest neighbors. The genre with the highest count is the predicted genre.

- In this case, among Movie1, Movie2, and Movie5, we have two comedies and one action. Therefore, we predict the new movie as a Comedy.


This process demonstrates how the kNN algorithm can be applied to classify a movie into a particular genre based on its similarity to other movies in a feature space. Adjusting the value of k can impact the prediction, and it's essential to choose an appropriate k based on the characteristics of the dataset.
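Here is a minimal sketch of the same idea with scikit-learn's KNeighborsClassifier. The Popularity/Action Level values are hypothetical (the original movie table isn't reproduced here), chosen so the three nearest neighbours of the new movie are two comedies and one action, as in the walkthrough above.

```python
# Minimal kNN sketch for the movie-genre example (k = 3).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# [Popularity, Action Level] for five hypothetical movies
X = np.array([[8, 7],   # Movie1 - Comedy
              [6, 8],   # Movie2 - Action
              [9, 2],   # Movie3 - Action
              [7, 1],   # Movie4 - Action
              [5, 9]])  # Movie5 - Comedy
y = np.array(["Comedy", "Action", "Action", "Action", "Comedy"])

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X, y)

new_movie = np.array([[7, 6]])    # Popularity 7, Action Level 6
print(knn.predict(new_movie)[0])  # majority vote among the 3 nearest neighbours -> "Comedy"
```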


3. Naive Bayes :

  • The Naive Bayes algorithm is a probabilistic machine learning algorithm used for classification tasks. It is based on Bayes' theorem, and the "naive" part comes from the assumption of feature independence given the class label.
  • Naive Bayes is called "naive" because of its assumption of feature independence given the class label. This assumption is considered simplistic or "naive" because, in reality, features often have some degree of correlation or dependence.
  • Despite this simplification, the Naive Bayes algorithm is surprisingly effective in many real-world applications, especially in text classification and spam filtering.
  • Naive Bayes often performs well in various applications. Here's how it works.

Naive Bayes Process:

  1. Bayes' Theorem:

The algorithm starts with Bayes' theorem, which relates the conditional and marginal probabilities of random events:

P(A | B) = [ P(B | A) × P(A) ] / P(B)

In the context of Naive Bayes and classification:

P(Class | Features) = [ P(Features | Class) × P(Class) ] / P(Features)


2. Independence Assumption:

The "naive" assumption is that the features are conditionally independent given the class. This means that the presence or absence of one feature does not affect the presence or absence of another feature, given the class label.

3. Training:

  • Prior Probabilities (Class Priors): Calculate the prior probabilities of each class in the training dataset. This is done by counting the occurrences of each class and dividing by the total number of instances.
  • Likelihoods (Feature Probabilities): For each feature and each class, calculate the likelihood of observing that feature given the class. This is done by counting the occurrences of the feature within instances of the class and dividing by the total number of instances of that class.
  • Classify Using Posterior Probabilities: Given a new instance with features, use Bayes' theorem to calculate the posterior probability of each class. The class with the highest posterior probability is assigned as the predicted class.

4. Prediction:

  • Posterior Probability Calculation: For a new instance with features X1, X2, ..., Xn, calculate the posterior probability for each class Ci: P(Ci | X1, X2, ..., Xn) ∝ P(Ci) × P(X1 | Ci) × P(X2 | Ci) × ... × P(Xn | Ci)
  • Decision Rule: Choose the class with the highest posterior probability as the predicted class.

Example:

Dataset:

Consider the following dataset with binary features indicating the presence (1) or absence (0) of symptoms:

[Table: patients with binary Fever and Cough symptoms, each labeled COVID or Flu]

Naive Bayes Algorithm:

Step 1: Prior Probabilities

Calculate the prior probabilities of each class (COVID and Flu):


Step 2: Likelihoods

Calculate the likelihoods of each symptom given each class:

Step 3: Posterior Probabilities

Use Bayes' theorem to calculate the posterior probabilities for each class given the observed symptoms:

P(COVID | Fever=1, Cough=1) ∝ P(COVID) × P(Fever=1 | COVID) × P(Cough=1 | COVID)

P(Flu | Fever=1, Cough=1) ∝ P(Flu) × P(Fever=1 | Flu) × P(Cough=1 | Flu)

Step 4: Prediction

Compare the posterior probabilities and predict the class with the highest probability.

This involves calculating the normalized probabilities and choosing the class with the maximum value.

For instance, if P(COVID | Fever=1, Cough=1) > P(Flu | Fever=1, Cough=1), the prediction is COVID.

This is a simplified illustration of how Naive Bayes works using the provided dataset. The actual calculations involve plugging in the numbers and normalizing the probabilities, but the underlying steps remain the same.
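As a rough illustration, the sketch below trains a Bernoulli Naive Bayes model on made-up binary symptom data (not the original table) and prints the posterior probabilities for a patient with Fever=1 and Cough=1.

```python
# Minimal Naive Bayes sketch for the symptom example; the rows are illustrative only.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Columns: [Fever, Cough]; labels: "COVID" or "Flu"
X = np.array([[1, 1], [1, 0], [0, 1], [1, 1], [0, 0], [1, 1]])
y = np.array(["COVID", "COVID", "Flu", "COVID", "Flu", "Flu"])

nb = BernoulliNB()
nb.fit(X, y)  # learns class priors and per-feature likelihoods from the data

new_patient = np.array([[1, 1]])      # Fever=1, Cough=1
print(nb.classes_)                    # class order, e.g. ['COVID' 'Flu']
print(nb.predict_proba(new_patient))  # posterior probability for each class
print(nb.predict(new_patient)[0])     # class with the highest posterior
```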


4. Decision Tree :

  • Objective: Decision trees aim to predict outcomes by partitioning datasets based on input feature values, creating a structured tree with internal nodes and leaf nodes.
  • Working Principle: The algorithm operates in a top-down, recursive manner. It selects features that offer optimal splits, maximizing information gain or minimizing impurity. Subsets are created based on chosen features, and the process iterates for each subset.
  • Decision Criteria: Measures like information gain (for classification) or variance reduction (for regression) guide the selection of features for splitting.
  • Stopping Criteria: The tree-building process stops upon meeting criteria like reaching a maximum depth, having a minimum number of instances in a node, or achieving pure leaves with homogeneous class labels.
  • Prediction: Predictions for new instances involve traversing the tree from the root to a leaf node based on the features of the instance.

ID3 Algorithm Overview:

  • Iterative Dichotomiser 3: ID3, developed by Ross Quinlan, is an early decision tree algorithm tailored for classification tasks.
  • Steps of ID3:

  1. Selecting the Best Attribute: Information gain is calculated for each attribute based on target variable entropy or impurity, and the attribute with the highest information gain becomes the decision feature for the current node.
  2. Creating Subtrees: For each unique attribute value, a subtree is created, and the process recurs until stopping criteria are met.
  3. Stopping Criteria: ID3 stops when specified criteria are satisfied, such as reaching a set tree depth or achieving pure nodes.

  • Entropy and Information Gain: ID3 employs entropy to measure dataset impurity, and information gain signifies the reduction in entropy achieved by selecting an attribute for splitting.
  • Categorical Features: ID3 is well-suited for datasets with categorical features.
  • Limitations: ID3 may lead to overfitting, especially with noisy data or datasets containing numerous features.

In summary, decision trees, including the ID3 algorithm, offer a transparent and interpretable approach to machine learning decision-making. Their simplicity and effectiveness make them valuable in various applications.
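To make the entropy and information-gain calculations concrete, here is a small sketch in Python. The Weather/Temperature/Humidity rows and Play labels are hypothetical, not the original dataset.

```python
# Sketch of the entropy and information-gain calculations that ID3 relies on.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Entropy reduction from splitting the dataset on one attribute."""
    base = entropy(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return base - weighted

# Hypothetical rows: [Weather, Temperature, Humidity] with Play labels
rows = [["Sunny", "Hot", "High"], ["Overcast", "Mild", "High"],
        ["Rainy", "Mild", "Normal"], ["Sunny", "Hot", "High"],
        ["Overcast", "Cool", "Normal"], ["Rainy", "Cool", "Normal"]]
play = ["No", "Yes", "Yes", "No", "Yes", "Yes"]

for i, name in enumerate(["Weather", "Temperature", "Humidity"]):
    print(name, round(information_gain(rows, play, i), 3))  # highest gain wins the root
```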


Example :

Let's use a simplified example of a decision tree algorithm based on weather conditions (weather, temperature, humidity) to predict whether to play a game. In this example, we'll consider the binary outcome: play or not play.

Suppose you have the following dataset:

[Table: Weather, Temperature, Humidity, and the Play (Yes/No) outcome for each day]

Let's walk through the decision tree algorithm based on the weather, temperature, humidity dataset using the ID3 algorithm.

Step 1: Calculate Entropy for the Target Variable (Play)

Calculate the entropy for the target variable (Play):

Entropy(Play) = -p(Yes) log2 p(Yes) - p(No) log2 p(No)

where p(Yes) and p(No) are the proportions of "Yes" and "No" outcomes in the dataset.
Step 2: Calculate Information Gain for Each Attribute

Information Gain for Weather:

Information Gain for Temperature:

Information Gain for Humidity:

Step 3: Choose the Attribute with the Highest Information Gain

Select the attribute with the highest information gain. Let's assume Weather has the highest information gain in this example.

Step 4: Create Subtrees and Repeat

Create branches for each unique value of the selected attribute (Weather: Sunny, Overcast, Rainy). Repeat the process recursively for each subset until stopping criteria are met.

Resulting Decision Tree:

[Figure: decision tree with Weather at the root, branching into Sunny, Overcast, and Rainy]

Let's continue solving the decision tree further based on the Weather, Temperature, and Humidity dataset.

Subtree 1: Weather = Sunny

For the subset where Weather is Sunny:

Since all instances in this subset have the same outcome (No), the entropy is 0.

Subtree 2: Weather = Overcast

For the subset where Weather is Overcast:

Again, entropy is 0 since all instances have the same outcome (Yes).

Subtree 3: Weather = Rainy

For the subset where Weather is Rainy:

Entropy is 0 due to a unanimous outcome (Yes).

Resulting Decision Tree (Updated):

[Figure: final decision tree - Weather = Sunny → No; Weather = Overcast → Yes; Weather = Rainy → Yes]
The decision tree has now been fully resolved on the Weather attribute, and we've reached leaf nodes where decisions are made. If the Weather is Sunny, the prediction is "No." If the Weather is Overcast or Rainy, the prediction is "Yes."

This is a basic example, and in real-world scenarios, decision trees can become more complex, especially when dealing with more features and larger datasets. The ID3 algorithm continues this process recursively, selecting the best attributes for each node until a stopping criterion is met, resulting in a tree structure that can be used for predictions on new data.
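For comparison, here is a minimal scikit-learn sketch of the same play/not-play idea. Note that scikit-learn's tree (CART with an entropy criterion) is not exactly ID3, and the rows below are hypothetical rather than the original dataset.

```python
# Minimal scikit-learn sketch of a play/not-play decision tree.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Weather":     ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast"],
    "Temperature": ["Hot", "Hot", "Mild", "Mild", "Cool", "Cool"],
    "Humidity":    ["High", "High", "High", "Normal", "Normal", "Normal"],
    "Play":        ["No", "No", "Yes", "Yes", "Yes", "Yes"],
})

# One-hot encode the categorical features so the tree can split on them
X = pd.get_dummies(data[["Weather", "Temperature", "Humidity"]], dtype=int)
y = data["Play"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)  # entropy-based splits
tree.fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # text view of the learned tree
```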


5. Random Forest :

Random Forest is an ensemble learning algorithm that combines the predictions of multiple individual models (decision trees) to improve overall performance and robustness. Developed based on the idea of bagging (Bootstrap Aggregating), Random Forest has become a popular and powerful tool in machine learning.

Ensemble Learning -

Ensemble learning is a machine learning approach that involves combining the predictions of multiple individual models (learners) to improve overall performance and robustness. Instead of relying on a single model, ensemble methods leverage the diversity of multiple models to make more accurate and reliable predictions. The idea is that by combining the strengths of different models, the weaknesses of individual models can be mitigated.

Ensemble learning is effective in enhancing predictive performance, reducing overfitting, and increasing the model's generalization ability.

Here's an overview of the Random Forest algorithm:

Key Concepts:

  1. Decision Trees: Random Forest is built on the foundation of decision trees. Decision trees are individual models that make predictions by recursively partitioning the input space based on feature values.
  2. Ensemble Learning: Random Forest is an ensemble method that combines the predictions of multiple decision trees to achieve a more accurate and robust result.
  3. Bagging (Bootstrap Aggregating): Random Forest uses bagging to create diverse sets of training data for each tree. It randomly samples the training dataset with replacement (bootstrap samples), creating multiple subsets.
  4. Random Feature Selection: For each split in a decision tree, Random Forest considers only a random subset of features instead of using all features. This randomness introduces diversity among the trees.
  5. Voting or Averaging: In classification tasks, Random Forest combines predictions through a majority voting mechanism. In regression tasks, predictions are averaged across the ensemble.

Algorithm Steps:

  1. Bootstrap Sampling: Create multiple bootstrap samples by randomly selecting instances with replacement from the original training dataset.
  2. Build Decision Trees: For each bootstrap sample, build a decision tree using a subset of features randomly chosen at each split. The trees are grown deep, typically without pruning.
  3. Aggregate Predictions: Combine the predictions of individual trees through majority voting (classification) or averaging (regression).
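Picking up step 1 above, here is a tiny sketch of bootstrap sampling with NumPy, assuming a hypothetical dataset of six rows. Each tree gets a different random sample, which is what creates diversity in the forest.

```python
# Tiny sketch of bootstrap sampling (bagging) over a hypothetical 6-row dataset.
import numpy as np

rng = np.random.default_rng(0)
n_rows = 6

# Each bootstrap sample draws n_rows indices WITH replacement,
# so some rows repeat and others are left out ("out-of-bag").
for tree_id in range(3):
    sample_idx = rng.choice(n_rows, size=n_rows, replace=True)
    print(f"tree {tree_id}: rows {sorted(sample_idx.tolist())}")
```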

Advantages of Random Forest:

  1. Reduced Overfitting: By averaging or voting over multiple trees, Random Forest tends to be more robust and less prone to overfitting compared to individual decision trees.
  2. Improved Accuracy: The ensemble nature of Random Forest often leads to higher accuracy, especially when dealing with complex datasets.
  3. Feature Importance: Random Forest provides a measure of feature importance based on how much each feature contributes to the overall performance of the ensemble.
  4. Versatility: Random Forest can be applied to both classification and regression problems.

Limitations:

  1. Interpretability: The ensemble nature of Random Forest may make it less interpretable compared to individual decision trees.
  2. Computational Cost: Training multiple decision trees can be computationally expensive, especially for large datasets.
  3. Memory Usage: Random Forest requires storing multiple trees, which may increase memory requirements.

In summary, Random Forest is a versatile and powerful algorithm that leverages the strength of multiple decision trees for improved accuracy and robustness. It is widely used in various machine learning applications, including classification and regression tasks.

Example :

Let's use a simple example to explain Random Forest:

Scenario: Imagine you are trying to predict whether a person will like a particular type of outdoor activity based on two features: the weather (Sunny or Rainy) and the temperature (Warm or Cold).

Dataset:

[Table: Weather (Sunny/Rainy), Temperature (Warm/Cold), and whether the person likes the activity (Yes/No)]

Individual Decision Trees: Suppose we decide to build two decision trees from bootstrapped samples (random subsets with replacement) of our dataset:

  1. Decision Tree 1: Considers Weather and Temperature for each split. Tree structure may look like:

[Figure: structure of Decision Tree 1]

  2. Decision Tree 2: Considers Weather and Temperature for each split. Tree structure may look like:

[Figure: structure of Decision Tree 2]

Random Forest: Now, let's create a Random Forest by combining the predictions of these two decision trees.

  • Voting: For each new instance, both trees make a prediction (Yes or No). The final prediction is determined by majority voting: whichever prediction gets more votes.

Prediction Example: Suppose we have a new instance with Weather=Sunny and Temperature=Warm. Let's see how each tree predicts:

  1. Decision Tree 1: Follows the path: Sunny → Warm. Predicts: Yes
  2. Decision Tree 2: Follows the path: Warm → Sunny. Predicts: Yes

Random Forest Prediction:

  • Both trees predict "Yes."
  • The final prediction is "Yes" since it received more votes.

In this way, Random Forest combines the diverse predictions of individual decision trees to make a more robust and accurate prediction for a given input. This example demonstrates the basic idea of how Random Forest works by using multiple trees to collectively enhance predictive performance.
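A minimal scikit-learn sketch of this scenario might look as follows; the rows are made up, and a real forest would use far more data and trees.

```python
# Minimal Random Forest sketch for the outdoor-activity example.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical dataset, not the original table
data = pd.DataFrame({
    "Weather":     ["Sunny", "Sunny", "Rainy", "Rainy", "Sunny", "Rainy"],
    "Temperature": ["Warm", "Cold", "Warm", "Cold", "Warm", "Cold"],
    "Likes":       ["Yes", "No", "Yes", "No", "Yes", "No"],
})

X = pd.get_dummies(data[["Weather", "Temperature"]], dtype=int)  # one-hot encode features
y = data["Likes"]

# Each tree is trained on a bootstrap sample with random feature subsets at each split
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, y)

new_instance = pd.DataFrame({"Weather": ["Sunny"], "Temperature": ["Warm"]})
new_X = pd.get_dummies(new_instance, dtype=int).reindex(columns=X.columns, fill_value=0)
print(forest.predict(new_X)[0])  # majority vote across the trees, e.g. "Yes"
```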


What is overfitting and underfitting ?

Overfitting: Overfitting occurs when a machine learning model learns the training data too well, capturing noise and random fluctuations in the data rather than the underlying patterns. In other words, the model becomes too complex, fitting the training data too closely, and as a result, it may not generalize well to new, unseen data. Overfitting often leads to poor performance on new examples because the model has essentially memorized the training data, including its noise and outliers.

Signs of Overfitting:

  1. The model performs exceptionally well on the training data but poorly on new, unseen data.
  2. The model captures details and fluctuations in the training data that may not represent true patterns.
  3. The model has too many parameters or is too complex relative to the size of the training dataset.
  4. The model may exhibit high variance, showing sensitivity to small changes in the training data.

Ways to Address Overfitting:

  1. Simplify the Model: Use a simpler model with fewer parameters or features to reduce complexity.
  2. Feature Selection: Choose relevant features and discard irrelevant or redundant ones.
  3. Regularization: Add regularization terms to the model's objective function to penalize overly complex models.
  4. Cross-Validation: Use techniques like cross-validation to assess model performance on different subsets of the data and identify overfitting.
  5. Increase Data Size: Provide more training data to allow the model to generalize better.

Underfitting: Underfitting occurs when a machine learning model is too simple to capture the underlying patterns in the training data. The model is unable to learn the complexities of the data, resulting in poor performance on both the training and new data. Underfitting is often associated with models that are too basic or lack the capacity to represent the true relationships within the data.

Signs of Underfitting:

  1. The model performs poorly on both the training and new data.
  2. The model struggles to capture the underlying patterns or trends in the data.
  3. The model may have too few parameters or features to adequately represent the complexity of the relationships in the data.

Ways to Address Underfitting:

  1. Increase Model Complexity: Use a more complex model with additional parameters or features to better capture patterns in the data.
  2. Feature Engineering: Introduce new features or transform existing ones to expose more information to the model.
  3. Adjust Hyperparameters: Tweak hyperparameters, such as learning rate or regularization strength, to find a better balance between model simplicity and complexity.
  4. Add Interactions: For linear models, consider adding interaction terms between features to capture more complex relationships.
  5. Ensemble Learning: Combine multiple simple models to create a more robust and accurate ensemble model.

Both overfitting and underfitting represent challenges in achieving a well-balanced machine learning model that generalizes effectively to new data. Striking the right balance often involves careful tuning of model complexity, regularization, and the amount of training data.
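As a rough sketch of how cross-validation can expose overfitting (one of the remedies listed above), the snippet below compares training and validation scores for an unconstrained and a depth-limited decision tree on synthetic data. A large gap between the two scores is a typical sign of overfitting.

```python
# Sketch: using cross-validation to spot overfitting; data and models are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# An unconstrained tree can memorize the training data (large train/validation gap)
deep_tree = DecisionTreeClassifier(random_state=0)
scores = cross_validate(deep_tree, X, y, cv=5, return_train_score=True)
print(f"deep tree     train={scores['train_score'].mean():.2f}  "
      f"validation={scores['test_score'].mean():.2f}")

# Limiting depth (a simpler model) usually narrows the gap
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_validate(shallow_tree, X, y, cv=5, return_train_score=True)
print(f"shallow tree  train={scores['train_score'].mean():.2f}  "
      f"validation={scores['test_score'].mean():.2f}")
```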



"Embark on a transformative journey with the Supervisory Management Transformational Program (SMTP). Unveiling a meticulously crafted High-Level Structure and a 14-step Transformational Ladder, this program is designed to elevate supervisory skills to new heights. From foundational principles to advanced leadership strategies, each step propels participants toward managerial excellence, fostering a culture of innovation, collaboration, and sustainable success. Join us in redefining leadership through SMTP, where every rung on the ladder signifies a strategic leap toward organizational brilliance." ? #leadershiptransformation #SupervisorSuccess #SmartSupervisors #InspiringSupervisors #leadershipdevelopment #leadershipskills #effectivemanagement #SupervisoryExcellence #HighLevelSupervision #ManagementRevolution #supervisors #supervision #supervisedlearning ? https://www.dhirubhai.net/posts/yasernazir_leadershiptransformation-supervisorsuccess-activity-7165692222141591552-_IzN?utm_source=share&utm_medium=member_desktop

  • 该图片无替代文字
回复

Your dedication to crafting informative content on machine learning is commendable, and it's clear you understand the value of thorough explanations. ?? Generative AI could significantly enhance your work by streamlining data analysis and content creation, ensuring you deliver high-quality articles even faster. By integrating generative AI into your workflow, you can focus on complex concepts while AI assists with data preparation and predictive modeling, adding depth and precision to your articles. ?? I'd love to show you how generative AI can elevate your content and efficiency. Let's chat about the possibilities - join our WhatsApp group to book a call! ?? https://chat.whatsapp.com/L1Zdtn1kTzbLWJvCnWqGXn Brian

回复
ARATI MANE

Aspiring Data Analyst || Data Scientist || Machine Learning || Business Analyst

10 个月

Keep growing ??

回复
Aakash Patil

Associate Engineer @Worldline Global Services

10 个月

Keep it up

pallavi khambale

Actively looking for data analytics opportunities| Data analyst | Microsoft Excel | Advance Excel | My sql | python | Power BI | Tableu | R programming | IT student

10 个月

Looking forward to the ongoing journey, and eagerly anticipating the next article!

要查看或添加评论,请登录

Deepa M Dixit的更多文章

  • Machine Learning Day 5 -Trick questions of supervised learning

    Machine Learning Day 5 -Trick questions of supervised learning

    1. Why was Machine Learning Introduced? Machine learning was introduced to enable computers to learn from data and make…

    2 条评论
  • Machine Learning Day 4 -Regression Algorithms

    Machine Learning Day 4 -Regression Algorithms

    Regression Regression is a type of supervised machine learning algorithm used for predicting a continuous outcome or…

    8 条评论
  • Machine Learning Day 2 - Statistics

    Machine Learning Day 2 - Statistics

    Statistics: Statistics is a branch of mathematics that involves collecting, analyzing, interpreting, presenting, and…

    3 条评论
  • Machine Learning DAY 1 -

    Machine Learning DAY 1 -

    Machine learning is a type of artificial intelligence (AI) that enables computer systems to enhance their performance…

    7 条评论

社区洞察

其他会员也浏览了