Classification of Iris Dataset using Logistic Regression

Minidu Wickramaarachchi

AI Dev & Enthusiastic Coder

Published: December 2, 2023

Iris Data Introduction

The Iris dataset is a well-known dataset introduced by the statistician and biologist Sir R. A. Fisher in 1936, and it has since become widely used in machine learning and statistics. It takes its name from the iris flower and distinguishes three species of iris based on their physical measurements. The dataset consists of 150 samples, 50 from each of the following three species:

1. Iris Setosa

2. Iris Virginica

3. Iris Versicolor

Each flower belongs to one of these three classes, and four primary features are measured for each flower:

· Sepal Length: The length of the flower's sepal.

· Sepal Width: The width of the sepal.

· Petal Length: The length of the flower's petal.

· Petal Width: The width of the petal.

In the dataset, each entry is an observation consisting of the four features above, measured in centimeters, plus a target (class) variable. The target variable takes the values 0, 1, and 2, which represent the three species of iris. This simple structure makes the dataset a good choice for educational and illustrative purposes. In machine learning and data mining, the Iris dataset is used to illustrate clustering, classification, and pattern recognition: k-means clustering can group the samples into clusters, and classification algorithms such as logistic regression, decision trees, and support vector machines are also applied to it extensively.
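The structure described above can be inspected with a short Python sketch. It assumes the copy of the dataset bundled with scikit-learn (load_iris), which may differ from the file in the author's repository; the variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of the Iris dataset.
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (number of samples, number of features)
print(iris.feature_names)  # the four measurements, in centimeters
print(iris.target_names)   # species names behind the labels 0, 1, 2

# Count how many samples carry each class label.
class_counts = np.bincount(y)
print(class_counts)
```

Each row of X holds the four measurements for one flower, and the matching entry of y holds its class label.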

Methods

Many algorithms can be applied to classify the Iris dataset, including K-Nearest Neighbors (KNN), decision trees, and logistic regression. In this article, logistic regression is the method used. Logistic regression is a statistical model primarily used for binary classification problems. It uses the logistic function to model the probability that an observation belongs to a category as a function of the predictor variables.

Although logistic regression is mainly applied to binary problems, it can be extended to multiclass classification as well. The model estimates probabilities using a logistic function, and these probabilities are then used for class predictions. The model coefficients are estimated from the training data by maximum likelihood estimation, which selects the coefficient values under which the observed training labels are most likely. During prediction, the model computes the weighted sum of the input features for each instance and passes it through the logistic function, unlike linear regression, which outputs the weighted sum directly. The output therefore lies between 0 and 1, which helps determine the class.
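The prediction step for the binary case can be sketched in a few lines of Python. The weights, bias, and feature values below are hypothetical numbers chosen for illustration, not coefficients fitted by the author.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Weighted sum of the inputs, passed through the logistic function."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical coefficients and one hypothetical flower (four measurements in cm).
weights = [0.5, -1.0, 1.5, 2.0]
bias = -4.0
features = [5.1, 3.5, 1.4, 0.2]

p = predict_probability(features, weights, bias)
print(p)  # a probability strictly between 0 and 1
```

A linear regression model would return the weighted sum z itself; logistic regression returns sigmoid(z), which can be read as a probability.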

An advantage of logistic regression is that it outputs probabilities rather than only class predictions. These probabilities indicate how confident the model is in each prediction. Logistic regression is a simple yet powerful model that can also be trained on datasets with a large number of features.

When logistic regression is applied to the Iris dataset, it uses the four variables measured from each sample, the lengths and widths of the sepals and petals, to predict the species. The prediction is a label from 0 to 2, with each value representing one of the three species. In the multiclass setting, the model computes a score for each class and assigns the instance to the class with the highest score.
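A minimal sketch of applying a multiclass logistic regression model to the four Iris features, assuming scikit-learn's LogisticRegression and its bundled dataset. The author's actual code is in the linked repository and may differ; max_iter=1000 is an assumption made here so that the solver converges.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Assumption: default settings except for a larger iteration budget.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# One probability per class for each of the first five flowers.
probabilities = model.predict_proba(X[:5])
predicted = model.predict(X[:5])

print(np.round(probabilities, 3))
print(predicted)  # the label (0, 1, or 2) with the highest probability in each row
```

In this sketch each row of probabilities sums to 1, and the predicted label is the class whose probability is largest.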

Code

The code is written in Python.

For further reference, I have uploaded the dataset along with the code: https://github.com/minidu97/Iris-Dataset-Classification-Using-Logistic-Regression.git

Result Analysis

Logistic regression achieves fairly high accuracy on the Iris dataset for both the training and test sets. The training accuracy is approximately 79.51%, meaning that the model correctly predicted the class almost 80% of the time on the data it was trained on. The test accuracy is higher, at around 89.29%, which is a positive result: the test set was not used or seen during training, so this suggests that the model generalized well to unseen data, correctly predicting nearly 90% of the test cases.
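Training and test accuracy of this kind can be measured as in the sketch below. The split ratio, random seed, and model settings are assumptions made for illustration, so the numbers it prints will not match the figures reported above, which come from the author's own code.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Assumption: a 75/25 stratified split with a fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy is the fraction of samples whose predicted label equals the true label.
train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"Training accuracy: {train_accuracy:.2%}")
print(f"Test accuracy: {test_accuracy:.2%}")
```

The test samples are held out before fitting, so the test accuracy estimates how the model behaves on data it has not seen.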

The decision function scores give insight into the prediction the model makes for each class. Each row contains one score per class, computed as the dot product of the input values and the corresponding class coefficients (plus an intercept term), and the data point is assigned to the class with the highest score. Taken together, these scores give an idea of how confident the model is in each classification.
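The relationship between the scores and the coefficients can be checked with a short sketch, again assuming scikit-learn's LogisticRegression rather than the author's exact code. It recomputes the scores by hand as a dot product plus intercept and compares them with decision_function.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

samples = X[:3]

# Scores as returned by the library: one row per sample, one column per class.
scores = model.decision_function(samples)

# The same scores computed by hand: inputs times class coefficients, plus intercepts.
manual_scores = samples @ model.coef_.T + model.intercept_

print(np.round(scores, 3))
print(scores.argmax(axis=1))  # index of the highest-scoring class per sample
```

A large gap between the highest score and the others in a row suggests a more confident classification for that sample.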
