Clustering with K-Means

What is Clustering in Data Mining?

Clustering groups objects by their characteristics and similarities. In data mining, a clustering algorithm partitions the data into the groups best suited to the desired analysis. In a hard partitioning, each object either belongs to a cluster or it does not. In a soft partitioning, each object belongs to every cluster to some degree. Finer-grained schemes are also possible: objects may belong to several clusters, each object may be forced into exactly one cluster, or the clusters may be arranged in a hierarchical tree. Clustering can be implemented in different ways based on different models, and each model has its own algorithms, which differ in their properties and in the results they produce. A good clustering algorithm can identify clusters regardless of their shape. A clustering algorithm has three basic stages, shown below.

[Figure: the three basic stages of a clustering algorithm]

Methods of Clustering in Data Mining

The different methods of clustering in data mining are shown below:

[Figure: methods of clustering in data mining]

Clustering is an unsupervised learning method

Clustering is a powerful machine learning tool for detecting structure in datasets. Unlike supervised methods, it works on unlabeled data: datasets that have no outcome (target) variable and for which nothing is known in advance about the relationships between the observations.

Goal of Clustering

[Figure: goal of clustering]

Clustering Algorithm:

[Figures: the clustering algorithm]
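
As a rough illustration of how K-Means works, here is a minimal sketch of the standard algorithm (often called Lloyd's algorithm) in NumPy. It is not taken from the article's figures: the function name `kmeans`, its parameters, and the toy data are assumptions made for this example, and real projects would normally use a library implementation.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-Means on an (n_samples, n_features) array (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct points drawn from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return labels, centroids


# Toy data made up for this example: two blobs of 20 points each.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),
    rng.normal(5.0, 0.5, size=(20, 2)),
])
labels, centroids = kmeans(X, k=2)
print(centroids)
```

The result depends on the random initialisation, which is why library implementations usually restart from several initial centroid sets and keep the best run.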

Illustration of clustering using scikit-learn's built-in and well-known Iris dataset

[Figures: loading the Iris dataset]
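
Loading the dataset could look roughly like this. This is a sketch, not the article's original code; the variable names `X` and `y` are assumptions.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # 150 flowers x 4 measurements (sepal/petal length and width)
y = iris.target  # true species as integers 0, 1, 2 (not used for fitting)

print(iris.feature_names)
print(iris.target_names)
print(X.shape, y.shape)
```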

Building and running the Model

[Figure: building and running the model]

Here we defined 3 clusters.
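
A minimal sketch of this step with scikit-learn's `KMeans` follows. Only the choice of 3 clusters comes from the text; the `n_init` and `random_state` values, the use of unscaled features, and the variable names are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# n_clusters=3 follows the article; n_init and random_state are assumed here
# so that the run is repeatable.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = model.fit_predict(X)

print(cluster_labels[:10])
print(model.cluster_centers_)
```

The cluster ids in `cluster_labels` are arbitrary integers, so they need not line up with the Iris target codes.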

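Plotting the output of the model in scatter plots

[Figure: scatter plots of the Iris target classes and the cluster labels]

Here we have plotted the output as two scatter plots: the first is colored by the Iris target variable and the second by the clustering labels. The colors do not match between the two plots because K-Means numbers its clusters arbitrarily, so a cluster id need not equal the target code of the species it captures.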
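
The two plots could be produced along these lines. This is a sketch under assumptions: the choice of petal length and petal width as axes, the color map, and the `Agg` backend line (included only so the snippet runs without a display) are not from the original.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

fig, (ax_true, ax_pred) = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)

# Left: points colored by the true species.
ax_true.scatter(X[:, 2], X[:, 3], c=y, cmap="viridis", s=30)
ax_true.set_title("Iris target classes")

# Right: the same points colored by the K-Means cluster id.
ax_pred.scatter(X[:, 2], X[:, 3], c=cluster_labels, cmap="viridis", s=30)
ax_pred.set_title("K-Means cluster labels")

for ax in (ax_true, ax_pred):
    ax.set_xlabel(iris.feature_names[2])
ax_true.set_ylabel(iris.feature_names[3])
fig.tight_layout()
```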

Relabeling and regenerating the plots

[Figure: scatter plots after relabeling]

Now we can see that both scatter plots look similar.
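
One way to relabel is to pair each cluster with the species it overlaps most. The sketch below does this with SciPy's `linear_sum_assignment` on the class-by-cluster count table. This is a swapped-in technique, not necessarily what the article did (a hand-written mapping works just as well), and all variable names are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix

iris = load_iris()
X, y = iris.data, iris.target
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Rows are true classes, columns are cluster ids.
counts = confusion_matrix(y, cluster_labels)

# Choose the one-to-one pairing of classes and clusters with the most agreement.
class_idx, cluster_idx = linear_sum_assignment(counts, maximize=True)

mapping = np.empty(3, dtype=int)
mapping[cluster_idx] = class_idx  # mapping[cluster id] gives the class id
relabeled = mapping[cluster_labels]

print(mapping)
```

Plotting `relabeled` instead of `cluster_labels` then gives colors that are directly comparable with the target plot.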

Evaluation of the clustering model

[Figure: evaluation report for the clustering model]

Here, precision measures the model's relevancy (of the points assigned to a class, the share that truly belong to it) and recall measures the model's completeness (of the points that truly belong to a class, the share the model assigned to it). High precision together with high recall indicates highly accurate results.

Here we can see that cluster 0 has 100% precision and recall, so it is very well clustered. Clusters 1 and 2 also performed well, both reaching above 70%.

Overall, the model clustered the data with 83% accuracy.
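
A report of this kind can be produced with scikit-learn's `classification_report`, as sketched below. The preprocessing, seed, and relabeling approach are assumptions, so the numbers this snippet prints may differ from the figures quoted above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

iris = load_iris()
X, y = iris.data, iris.target
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Align the arbitrary cluster ids with the target codes before scoring
# (assumed approach: best one-to-one match on the count table).
class_idx, cluster_idx = linear_sum_assignment(
    confusion_matrix(y, cluster_labels), maximize=True
)
mapping = np.empty(3, dtype=int)
mapping[cluster_idx] = class_idx
relabeled = mapping[cluster_labels]

# Per-class precision and recall, plus overall accuracy.
report = classification_report(y, relabeled, target_names=iris.target_names)
accuracy = accuracy_score(y, relabeled)
print(report)
print(f"accuracy: {accuracy:.2f}")
```

The true labels are used here only to judge the clusters after the fact; K-Means itself never sees them.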

Strengths and weaknesses of K-Means

[Figure: strengths and weaknesses of K-Means]


You may also like to have a look

  1. Data Exploration using Pandas
  2. Data Visualization in Python (Different types of plots)
  3. Data Engineer Vs Data Analyst Vs Data Scientist
  4. Renewable Energy optimization with Big Data, Machine Learning, and Artificial Intelligence
  5. Data processing with Python
  6. Linear regression
  7. Multiple Regression
More articles by Angad Gupta, MIEEE, BITS-Pilani
