Clustering with K-Means
Angad Gupta, MIEEE, BITS-Pilani
Renewable Energy | Clean Tech | DR | VPP | DERMS | EV
What is Clustering in Data Mining?
Clustering is the grouping of objects based on their characteristics and similarities. In data mining, it partitions the data in the way best suited to the desired analysis, using a dedicated clustering algorithm. In hard partitioning, each object either belongs to a cluster or does not. In soft partitioning, by contrast, each object belongs to each cluster to a certain degree. More specific schemes are also possible: objects may belong to multiple clusters, every object may be forced into exactly one cluster, or hierarchical trees may be built over the group relationships. Clustering can be implemented in different ways based on various models. Each model has its own distinct algorithms, which differ in their properties and in the results they produce. A good clustering algorithm is able to identify clusters independent of their shape. There are 3 basic stages in a clustering algorithm, which are shown below.
Methods of Clustering in Data Mining
The different methods of clustering in data mining are explained below:
Clustering is an unsupervised learning method
Clustering is a powerful machine learning tool for detecting structure in datasets. Unlike supervised methods, clustering is an unsupervised method: it works on datasets in which there is no outcome (target) variable and nothing is known about the relationships between the observations, that is, on unlabeled data.
Goal of Clustering
Clustering Algorithm:
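As an illustration, the standard K-Means procedure (often called Lloyd's algorithm) alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The following is a minimal NumPy sketch of that idea, not the implementation used later in this article. The function name `kmeans`, its parameters, the random initialisation and the toy data are all assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) sketch; all names are illustrative."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy data (an assumption for this example): two well-separated 2-D blobs.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])
labels, centroids = kmeans(X, k=2)
```

In practice a library implementation such as scikit-learn's `KMeans` is preferable, since it adds better initialisation and multiple restarts.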
Illustration of clustering using scikit-learn's (sklearn) built-in and very famous Iris dataset
Building and running the Model
Here we define 3 clusters.
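A minimal sketch of this step with scikit-learn might look as follows. Only the choice of 3 clusters comes from the article; the variable names and the `n_init` and `random_state` values are assumptions added to keep the example reproducible.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # 150 samples x 4 features
y = iris.target  # species labels; NOT used when fitting the model

# n_clusters=3 follows the article; n_init and random_state are assumptions.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(X)

labels = model.labels_            # cluster id (0, 1 or 2) for each sample
centers = model.cluster_centers_  # one 4-dimensional centroid per cluster
```

Note that the model is fitted on `X` alone; the species labels in `y` are kept only for the later comparison.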
Plotting the output of Model in Scatter plots
Here we have plotted the output in 2 scatter plots: one is based on the Iris target variable and the second is based on the clustering labels. We can see that the labels are mismatched.
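The two plots can be sketched roughly as below. The choice of petal length and petal width as axes, the colour map and the figure size are assumptions. Because K-Means numbers its clusters arbitrarily, the colours in the second panel need not agree with the species colours in the first, which is the mismatch described above.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Petal length (column 2) against petal width (column 3); an assumed choice of axes.
fig, (ax_true, ax_pred) = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)

# Left panel: points coloured by the true species.
ax_true.scatter(X[:, 2], X[:, 3], c=y, cmap="viridis", s=25)
ax_true.set_title("Iris target (species)")

# Right panel: the same points coloured by K-Means cluster id.
ax_pred.scatter(X[:, 2], X[:, 3], c=labels, cmap="viridis", s=25)
ax_pred.set_title("K-Means cluster labels")

for ax in (ax_true, ax_pred):
    ax.set_xlabel(iris.feature_names[2])
    ax.set_ylabel(iris.feature_names[3])
fig.tight_layout()
```

Call `plt.show()` to display the figure interactively, or `fig.savefig("clusters.png")` to write it to a file.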
Relabeling and regenerating the plots
Now we can see that both scatter plots look similar.
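One way to do the relabeling, sketched here as an assumption rather than as the method used for the plots above, is to pair each cluster with the species it overlaps most, under the constraint that the pairing is one-to-one. The sketch uses SciPy's `linear_sum_assignment` on the confusion matrix; the variable names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix

iris = load_iris()
X, y = iris.data, iris.target
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Rows are true species, columns are cluster ids.
cm = confusion_matrix(y, labels)

# Pick the one-to-one pairing of species and clusters with the largest total overlap.
species_idx, cluster_idx = linear_sum_assignment(cm, maximize=True)

# lookup[c] is the species id assigned to cluster c.
lookup = np.empty(3, dtype=int)
lookup[cluster_idx] = species_idx

relabeled = lookup[labels]  # cluster ids translated into species ids
```

Plotting `relabeled` instead of `labels` gives the second scatter plot the same colour coding as the first. A fixed, hand-written mapping works too, but it has to be rechecked whenever the clustering is rerun with a different seed.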
Evaluation of the clustering model
Here, precision is a measure of the model's relevancy (the share of samples assigned to a class that truly belong to it) and recall is a measure of the model's completeness (the share of a class's true samples that the model found). High precision together with high recall indicates highly accurate model results.
Here we can see that cluster 0 has 100% precision and recall, so it is very well clustered. Clusters 1 and 2 also perform well, each scoring above 70%.
Overall, the model clusters the data with 83% accuracy.
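A report of this kind can be produced with scikit-learn's `classification_report` and `accuracy_score`, as in the sketch below. The preprocessing, the seed and the relabeling step are assumptions, so the exact figures it prints may differ from those quoted above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

iris = load_iris()
X, y = iris.data, iris.target
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Align cluster ids with species ids (assumed relabeling via best one-to-one overlap).
cm = confusion_matrix(y, labels)
species_idx, cluster_idx = linear_sum_assignment(cm, maximize=True)
lookup = np.empty(3, dtype=int)
lookup[cluster_idx] = species_idx
relabeled = lookup[labels]

# Per-class precision, recall and F1, plus the overall accuracy.
report = classification_report(y, relabeled, target_names=iris.target_names)
accuracy = accuracy_score(y, relabeled)

print(report)
print(f"Overall accuracy: {accuracy:.2f}")
```

Scoring a clustering this way is only possible because Iris ships with true species labels; on genuinely unlabeled data no such report can be computed.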
Strengths and weaknesses of K-Means