Machine Learning 8: 'Clustering Algorithms'

Shivam Panchal

Data Scientist | Machine Learning Engineer

Published: June 7, 2018

Last week we explored classification and the Random Forest algorithm. Both belong to supervised machine learning, which also covers regression analysis and predictive modelling. The other major family is unsupervised machine learning. This week we will explore unsupervised algorithms, focusing on clustering.

Supervised Learning

Machine learning can be broadly categorized into supervised and unsupervised learning. Some well-known supervised algorithms are SVM (Support Vector Machine), Linear Regression, Neural Networks and Naive Bayes. In supervised learning the training data is labelled: we already know the value of the target variable for every training example, and the model learns to predict that value for new data.

Unsupervised Learning

In unsupervised learning, the training data is unlabelled and the system tries to learn without a trainer. Some of the most important unsupervised techniques are clustering (for example k-means) and association rule learning.
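To make the difference concrete, here is a minimal sketch assuming scikit-learn is installed. The synthetic dataset and the choice of LogisticRegression and KMeans are illustrative assumptions, not something prescribed by this series. The only point is that the supervised model receives the labels y, while the unsupervised one sees X alone.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): X holds the features,
# y holds the known group of each point.
X, y = make_blobs(n_samples=300, centers=3, random_state=42)

# Supervised: the labels y are passed to the model during training.
classifier = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: the model only sees X and has to find structure by itself.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(classifier.predict(X[:5]))  # predicted class labels
print(clusterer.labels_[:5])      # cluster ids; their numbering is arbitrary
```

Note that the cluster ids need not match the original label numbers, because the clustering model never saw them.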

What Is Clustering?

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

Clustering is widely used in marketing to find naturally occurring groups of customers with similar characteristics, resulting in customer segmentation that more accurately depicts and predicts customer behavior, leading to more personalized sales and customer service efforts.

There are many clustering algorithms, each serving a specific purpose and having its own use cases. To look at clustering and its definition in more depth, here are a few links that you can go through as well.

What is Clustering in Data Mining?

Data Mining - Cluster Analysis

Clustering in Data Mining

Data Mining Concepts

How Businesses Can Use Clustering in Data Mining

Different clustering techniques work best for different types of data. Let's assume that your data is numeric, continuous and two-dimensional, as shown in the scatter plot below.

This second scatter plot is created from several "blobs" of different sizes and shapes, and shows the clusters that exist in the data.
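Data of this kind can be generated synthetically. The following is a rough sketch using scikit-learn's make_blobs; the blob centers, sizes and spreads are made-up values chosen for illustration and are not the ones behind the original figure.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Three blobs of different sizes and spreads in two numeric dimensions.
# All numbers below are illustrative assumptions.
X, true_groups = make_blobs(
    n_samples=[100, 150, 50],          # points per blob
    centers=[[0, 0], [5, 5], [0, 6]],  # blob centers
    cluster_std=[0.5, 1.5, 0.8],       # spread of each blob
    random_state=0,
)

print(X.shape)                   # (number of points, number of features)
print(np.bincount(true_groups))  # number of points in each blob
```

Plotting X[:, 0] against X[:, 1] with any plotting library gives a scatter plot of the blobs.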

We will discuss two clustering algorithms: K-means and hierarchical clustering.

K-means

You might be wondering how to decide the value of K in the first step.

One method, called the Elbow method, can be used to decide on an optimal number of clusters. You run K-means clustering on a range of K values and plot the “percentage of variance explained” on the Y-axis against K on the X-axis, as shown in the figure below. In that figure, adding more clusters beyond 3 barely changes the variance explained, so K = 3 is the elbow.
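The Elbow method can be sketched in a few lines. This is one possible reading of "percentage of variance explained", namely 1 minus the within-cluster sum of squares divided by the total sum of squares. The synthetic three-blob dataset and the K range of 1 to 8 are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three groups (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Total sum of squared distances to the overall mean.
total_ss = ((X - X.mean(axis=0)) ** 2).sum()

explained = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_ is the within-cluster sum of squares; the remainder of
    # total_ss is the part "explained" by the clustering.
    explained[k] = 100.0 * (1.0 - model.inertia_ / total_ss)

for k, pct in explained.items():
    print(f"K={k}: {pct:.1f}% of variance explained")
```

Reading the printed values from top to bottom, the elbow is the K after which the percentage stops improving noticeably.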

Here is another link for you to explore the same.

Hierarchical Clustering

Unlike K-means clustering, hierarchical clustering starts by treating every data point as its own cluster. It then builds the hierarchy by repeatedly merging the two nearest clusters into one, as shown in the dendrogram below.
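The bottom-up merging can be tried out with SciPy. This is a minimal sketch on six made-up points; the "ward" linkage is an assumption, since the text does not say how the distance between clusters should be measured.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Six 2-D points forming two obvious groups (made-up illustration data).
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# Each row of `merges` records one merge step: the two clusters joined,
# the distance between them, and the size of the resulting cluster.
merges = linkage(points, method="ward")

# Cut the hierarchy so that at most two clusters remain.
labels = fcluster(merges, t=2, criterion="maxclust")
print(labels)

# Compute the dendrogram layout; no_plot=True returns the structure
# without drawing, so matplotlib is not required.
tree = dendrogram(merges, no_plot=True)
print(tree["ivl"])  # order of the leaves along the dendrogram's axis
```

With n points there are always n - 1 merge steps, which is why the hierarchy can be cut at any number of clusters after it has been built once.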

More Algorithms to Learn

§ Mean-Shift Clustering

§ Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)

§ Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
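As a starting point, all three algorithms have implementations in scikit-learn. The sketch below runs them on the same synthetic data; the dataset and the parameter values (eps, min_samples, n_components) are illustrative assumptions and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, MeanShift
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three compact, well-separated blobs (made-up illustration data).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 6], [0, 6]],
                  cluster_std=0.6, random_state=0)

# Mean-shift: looks for dense regions; the number of clusters is not
# specified up front.
mean_shift_labels = MeanShift().fit_predict(X)

# EM with a Gaussian mixture: needs the number of components and gives
# soft assignments (a probability per component for each point).
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
gmm_labels = gmm.predict(X)
gmm_probabilities = gmm.predict_proba(X)

# DBSCAN: density-based; points in sparse regions get the label -1 (noise).
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("mean-shift clusters:", len(set(mean_shift_labels)))
print("GMM clusters:", len(set(gmm_labels)))
print("DBSCAN clusters (excluding noise):", len(set(dbscan_labels) - {-1}))
```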

More resources for this week:

§ The 5 Clustering Algorithms Data Scientists Need to Know

As for practice this week, implement all the clustering algorithms available in scikit-learn on these two Kaggle datasets:

§ Breast Cancer Wisconsin (Diagnostic) Data Set

§ World Happiness Report
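One possible way to get started on the first dataset is sketched below. It assumes that the copy of the Breast Cancer Wisconsin (Diagnostic) data bundled with scikit-learn (load_breast_cancer) can stand in for the Kaggle download. The feature scaling and the choice of two clusters are also assumptions, not requirements of the exercise.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Bundled copy of the dataset (assumed equivalent to the Kaggle version).
data = load_breast_cancer()

# Scale the features so that no single measurement dominates the distances.
X = StandardScaler().fit_transform(data.data)

# Cluster without looking at the diagnosis labels.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# The diagnosis labels are used only afterwards, to see how well the
# clusters line up with them.
score = adjusted_rand_score(data.target, labels)
print(X.shape)
print(f"Adjusted Rand index vs. diagnosis: {score:.2f}")
```

Swapping KMeans for any other scikit-learn clustering estimator is a one-line change, which makes it easy to compare algorithms on the same data.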

Special thanks to Anuja Nagpal: Link - https://towardsdatascience.com/clustering-unsupervised-learning-788b215b074b

Chris Surdak

Chris Surdak: Digital Transformation, Artificial Intelligence, Cybersecurity and Blockchain Executive

6 years ago

Fabulous mathematics... but... as Forrest Gump used to say, “stupid is as stupid does.” What few in #RPA or #AI care to discuss is the fact that crappy inputs lead to horrendous results. Automation just gets you there faster.

Arturo I.

Technical Project Manager

6 years ago

Did you learn the k-means? :P

