Machine Learning 8: 'Clustering Algorithms'

Last week, we explored classification and the Random Forest algorithm, which are part of Supervised Machine Learning, along with regression analysis and predictive modelling. There is another family of algorithms known as Unsupervised Machine Learning algorithms. This week, we will explore unsupervised Machine Learning algorithms such as clustering.

Supervised Learning

Machine learning can be categorized as supervised or unsupervised. Some well-known supervised machine learning algorithms are SVM (Support Vector Machine), Linear Regression, Neural Networks, and Naive Bayes. In supervised learning, the training data is labelled: each training example comes with the known value of the target variable that the model will later be asked to predict.

Unsupervised Learning

In unsupervised learning, the training data is unlabeled and the system tries to learn without a teacher. Some of the most important unsupervised techniques are clustering (for example, k-means) and association rule learning.

What Is Clustering?

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

Clustering is widely used in marketing to find naturally occurring groups of customers with similar characteristics, resulting in customer segmentation that more accurately depicts and predicts customer behavior, leading to more personalized sales and customer service efforts.

There are many clustering algorithms, each serving a specific purpose and having its own use cases. To explore clustering and its definition in more depth, here are a few links that you can go through as well.

What is Clustering in Data Mining?

Data Mining - Cluster Analysis

Clustering in Data Mining

Data Mining Concepts

How Businesses Can Use Clustering in Data Mining

Different clustering techniques work best for different types of data. Let’s assume that your data is numeric, continuous, and two-dimensional, as shown in the scatter plot below.


This other scatter plot, created from several "blobs" of different sizes and shapes, shows the clusters that exist in the data.


We will discuss two clustering algorithms: K-means and hierarchical clustering.


K-means

 


You might be wondering how to decide the value of K in the first step.

One method, called the Elbow method, can be used to decide on an optimal number of clusters. Here you run K-means clustering over a range of K values and plot the “percentage of variance explained” on the Y-axis against “K” on the X-axis, as shown in the figure below. In this example, adding more clusters beyond K = 3 barely changes the variance explained, so the curve bends (the "elbow") at 3 clusters.
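
The Elbow method can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the exact procedure behind the figure: the toy `points` data, the helper names (`kmeans`, `explained_variance_curve`), and the deterministic farthest-first initialization are all choices made for this example, and "percentage of variance explained" is taken here to mean 100 * (1 - within-cluster sum of squares / total sum of squares).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def farthest_first_init(points, k):
    """Assumed initialization: start from the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Plain K-means: alternate assignment and centroid update until stable."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to become empty.
            new_centroids.append(mean_point(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


def explained_variance_curve(points, k_values):
    """Percentage of variance explained for each K."""
    center = mean_point(points)
    total_ss = sum(sq_dist(p, center) for p in points)
    return {k: 100.0 * (1.0 - kmeans(points, k)[2] / total_ss) for k in k_values}


# Toy data (an assumption for this sketch): three well-separated blobs.
blob_centers = [(0, 0), (10, 0), (5, 9)]
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

curve = explained_variance_curve(points, range(1, 7))
for k, pct in curve.items():
    print(f"K={k}: {pct:.1f}% of variance explained")
```

On this toy data the curve rises steeply up to K = 3 and then flattens, which is the bend the Elbow method looks for. In practice you would more likely use scikit-learn's `KMeans` and read the within-cluster sum of squares from its `inertia_` attribute rather than hand-rolling the loop.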


Here is another link for you to explore the same.



Hierarchical Clustering

Unlike K-means clustering, hierarchical clustering starts by treating every data point as its own cluster. It then builds the hierarchy by repeatedly finding the two nearest clusters and merging them into one, as shown in the dendrogram below.
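
The merging procedure described above can be sketched as follows. This is a minimal illustration, not a production implementation: the article does not say how the distance between two clusters is measured, so single linkage (the distance between the closest pair of members) is assumed here, and the `points` data and function names are made up for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(cluster_a, cluster_b, points):
    """Assumed linkage: distance between the closest pair of members."""
    return min(
        euclidean(points[i], points[j]) for i in cluster_a for j in cluster_b
    )


def agglomerative(points):
    """Start with one cluster per point and merge the two nearest clusters
    until a single cluster remains. Returns the merge history, which is the
    information a dendrogram draws."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Brute-force search for the closest pair of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges


# Toy data (an assumption for this sketch): two tight pairs and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (12.0, 0.0)]

for left, right, dist in agglomerative(points):
    print(f"merge {left} + {right} at distance {dist:.2f}")
```

Each printed merge corresponds to one junction in a dendrogram, with the merge distance as its height. The brute-force search is fine for a handful of points; for real datasets, SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` or scikit-learn's `AgglomerativeClustering` are the usual tools.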


More Algorithms to Learn

§ Mean-Shift Clustering

§ Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)

§ Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

More resources for this week:

§ The 5 Clustering Algorithms Data Scientists Need to Know

For this week's practice, implement all the clustering algorithms available in scikit-learn on these two Kaggle datasets:

§ Breast Cancer Wisconsin (Diagnostic) Data Set

§ World Happiness Report


Special thanks to Anuja Nagpal: Link - https://towardsdatascience.com/clustering-unsupervised-learning-788b215b074b
