K-Means Clustering

Jagadananda Saint

SDE 3 @ Mareana, Inc. | Python

Published: February 16, 2020

Clustering is one of the most common exploratory data analysis techniques used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different.

The K-means algorithm is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while keeping the clusters as different (far apart) as possible.

It assigns data points to clusters such that the sum of the squared distances between the data points and their cluster’s centroid (the arithmetic mean of all the data points that belong to that cluster) is minimized. The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster.
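As a rough illustration of the quantity being minimized, the sketch below computes the within-cluster sum of squared distances for a given assignment of points to clusters. The function names (`centroid`, `within_cluster_ss`) and the four sample points are made up for this example; they are not from any library.

```python
def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its own cluster's centroid."""
    # Group the points by their cluster label.
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    total = 0.0
    for members in clusters.values():
        c = centroid(members)
        # Add the squared Euclidean distance of every member to the centroid.
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in members)
    return total


# Hypothetical data: two obvious groups in 2-D.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]

good = within_cluster_ss(points, [0, 0, 1, 1])  # nearby points grouped together
bad = within_cluster_ss(points, [0, 1, 0, 1])   # distant points grouped together
print(good, bad)  # 1.0 99.0
```

The assignment that groups nearby points gives a far smaller value, which is the sense in which less within-cluster variation means more homogeneous clusters.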

The K-means algorithm works as follows:

Specify the number of clusters K.
Initialize the centroids by shuffling the dataset and then randomly selecting K data points, without replacement, as the initial centroids.
Repeat the following two steps until the centroids no longer change, i.e. the assignment of data points to clusters stops changing:
Compute the squared distance between each data point and every centroid, and assign each data point to the closest cluster (centroid).
Recompute each cluster’s centroid as the average of all data points that belong to that cluster.
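The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the names `kmeans`, `squared_distance`, and `mean_point`, the `seed` and `max_iter` parameters, and the sample data are assumptions made for this example. Keeping the old centroid when a cluster ends up empty is also a choice made here, not something the steps prescribe.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Initialization: pick K distinct data points (no replacement) as centroids.
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop when the assignment of points to clusters no longer changes.
        if new_labels == labels:
            break
        labels = new_labels

        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)

    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

Which cluster receives label 0 and which receives label 1 depends on the random initialization, so only the grouping is meaningful, not the label values themselves.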

In this series, you will learn how to create a K-means clustering model and use it to create clusters for analyzing your data. I hope you enjoyed this article. If so, share it with others; for more content like this, you can connect with me on:

YouTube: https://www.youtube.com/channel/UCmF8qppe02J1ot4Jfwl_lFg

LinkedIn: https://www.dhirubhai.net/in/jagwithyou/

Medium: https://medium.com/@jagwithyou

GitHub: https://github.com/explorewithjag
