K-Means Clustering

Anuj Ramola

DevSecOps Kubernetes | Docker | Terraform | Ansible | Prometheus | Grafana | CI/CD

Published: August 21, 2021

The k-means algorithm is a clustering algorithm. That means that you have a bunch of points in some space, and you want to guess what groups they seem to be in. For example, say we have these points:

  o
o oo
  o o

                 oo
              oo   o

As a human, you can easily look at those and say that the ones in the top left are a cluster and the ones in the bottom right are a cluster, but if there were lots more clusters, or if they overlapped, or if they were in a 3-dimensional or much higher dimensional space, it would be harder.

With the k-means algorithm, you have to tell it how many clusters to look for (that's the "k"), and you tell it some real data points (like those o's in the diagram above), and then it tries to guess a reasonable grouping of the points into k clusters.

Here's basically how it works:

1. Start out with k made-up points. These will be your cluster centers, and you'll move them based on where the actual points are. These first made-up points can be random, or you can have some clever way of choosing them.
2. For each of your cluster centers, find all the real data points that are closer to that center than to any of the other centers. Those points belong to that cluster (but this cluster might not be a very good guess yet).
3. For each of the clusters you made in step 2 (there will be k of them, one for each cluster center), look at the points in the cluster, and find the average of them. This is your new center for that cluster (and you can throw away the old center). This new center is probably a better guess, because it's based on the actual data points.
4. Repeat steps 2 and 3. You'll get a different grouping because your cluster centers moved. Keep repeating steps 2 and 3, and eventually the cluster centers will stop moving. At that point you have your guess about what the clusters are.
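
The four steps above can be sketched in plain Python. This is a minimal illustration, not a tuned implementation: the function names (`kmeans`, `closest_center`, `mean`), the `seed` and `max_iters` parameters, and the choice to start from k randomly picked data points are all assumptions made for this example. The coordinates are read off the diagram above as (column, row).

```python
import random


def closest_center(point, centers):
    """Return the index of the center nearest to point (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((p - c) ** 2 for p, c in zip(point, center))
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i]))


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    # Step 1: pick k starting centers (here: k distinct data points, chosen at random).
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):
        # Step 2: assign every point to its closest center.
        labels = [closest_center(p, centers) for p in points]
        # Step 3: move each center to the average of the points assigned to it.
        new_centers = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # An empty cluster keeps its old center (one simple way to handle it).
            new_centers.append(mean(members) if members else centers[i])
        # Step 4: repeat until the centers stop moving.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# The points from the diagram, as (column, row) pairs.
points = [(2, 0), (0, 1), (2, 1), (3, 1), (2, 2), (4, 2),
          (17, 4), (18, 4), (14, 5), (15, 5), (19, 5)]

centers, labels = kmeans(points, k=2)
for point, label in zip(points, labels):
    print(point, "-> cluster", label)
```

With k=2 the first six points (top left) should end up with one label and the last five (bottom right) with the other. Which group gets label 0 depends on the random starting centers.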

Depending on what points you started with, you might end up with a different guess for the clusters than if you had had different starting points. But you will always converge to something - there will never be a case where the cluster centers keep moving and never stop.
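
To see how the starting points can change the answer, here is a small sketch with four points at the corners of a wide rectangle. The helper `run_kmeans`, the hand-picked starting centers, and the "cost" it reports (the sum of squared distances from each point to its cluster center) are assumptions for this illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def run_kmeans(points, centers, max_iters=100):
    """Repeat the assign/average steps from the given starting centers until they stop moving."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iters):
        # Assign each point to its closest center.
        labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        # Move each center to the average of its points (empty clusters keep their center).
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(sq_dist(p, centers[label]) for p, label in zip(points, labels))
    return centers, labels, cost


# Corners of a rectangle that is 4 wide and 1 tall.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one center on the left side, one on the right side.
centers_a, labels_a, cost_a = run_kmeans(points, [(0, 0.5), (4, 0.5)])
# Start B: one center on the bottom edge, one on the top edge.
centers_b, labels_b, cost_b = run_kmeans(points, [(2, 0), (2, 1)])

print("start A:", labels_a, "cost", cost_a)
print("start B:", labels_b, "cost", cost_b)
```

Both runs stop moving, but start A groups the points left/right while start B groups them bottom/top, which has a much higher cost. A common way to cope with this is to run the algorithm from several different starting points and keep the result with the lowest cost.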

One limitation of the k-means algorithm is that it doesn't work well if the real clusters are very different sizes (some small ones and some big ones), or if they aren't very circular (for example if they're long and skinny).
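
A rough way to see the "long and skinny" problem: k-means favors groupings where points are close to their cluster's average, measured by the sum of squared distances. The sketch below uses made-up toy data (two horizontal rows of ten points each) and a hypothetical helper `cost` to compare the intended grouping with a grouping that cuts both rows in half.

```python
def cost(groups):
    """Sum of squared distances from each point to the average of its group."""
    total = 0.0
    for group in groups:
        center = tuple(sum(c) / len(group) for c in zip(*group))
        total += sum(sum((x - m) ** 2 for x, m in zip(p, center)) for p in group)
    return total


# Two long, skinny "real" clusters: a bottom row (y=0) and a top row (y=2).
bottom_row = [(x, 0) for x in range(10)]
top_row = [(x, 2) for x in range(10)]
points = bottom_row + top_row

# Grouping 1: the rows themselves (what a human would probably pick).
rows_cost = cost([bottom_row, top_row])

# Grouping 2: cut both rows down the middle into a left half and a right half.
left_half = [p for p in points if p[0] < 5]
right_half = [p for p in points if p[0] >= 5]
halves_cost = cost([left_half, right_half])

print("rows:  ", rows_cost)
print("halves:", halves_cost)
```

The left/right split scores better (a lower cost) than the two rows, so by this measure k-means prefers the rounder grouping over the intended one.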

Applications of k-means clustering:

Customer Segmentation: Subdivision of customers into groups/segments such that each customer segment consists of customers with similar market characteristics: pricing, loyalty, spending behaviors, etc. Segmentation variables could include, e.g., number of items bought on sale, average transaction value, and total number of transactions. Customer segmentation allows businesses to customize marketing programs to suit each customer segment.
Anomaly or Fraud Detection:
  Separate valid activity groups from bots.
  Detect fraudulent claims.
Inventory Categorization based on sales or other manufacturing metrics.
Creating Newsfeeds: K-means can be used to cluster articles by their similarity; it can separate documents into disjoint clusters.
Cloud Computing Environment: Clustered storage to increase performance, capacity, or reliability. Clustering distributes workloads to each server, manages the transfer of workloads between servers, and provides access to all files from any server regardless of the physical location of the file.
Environmental Risks: K-means can be used to analyze environmental risk in an area, for example environmental risk zoning of a chemical industrial area.
Pattern Recognition in Images: For example, to automatically detect infected fruits or for segmentation of blood cells for leukemia detection.

Conclusion: Clustering algorithms like k-means are popular in almost every domain. They have many applications, such as market segmentation, image segmentation, identifying crime localities, and recommendation engines.

Thank you

Hope this helps! :)
