K-Means Clustering

The k-means algorithm is a clustering algorithm. That means that you have a bunch of points in some space, and you want to guess what groups they seem to be in. For example, say we have these points:


  o
o oo
  o o

                 oo
              oo   o


As a human, you can easily look at those and say that the ones in the top left are a cluster and the ones in the bottom right are a cluster, but if there were lots more clusters, or if they overlapped, or if they were in a 3-dimensional or much higher dimensional space, it would be harder.

With the k-means algorithm, you have to tell it how many clusters to look for (that's the "k"), and you tell it some real data points (like those o's in the diagram above), and then it tries to guess a reasonable grouping of the points into k clusters.

Here's basically how it works:

  1. Start out with k made-up points. These will be your cluster centers, and you'll move them based on where the actual points are. These first made-up points can be random, or you can have some clever way of choosing them.
  2. For each of your cluster centers, find all the real data points that are closer to that center than to any of the other centers. Those points belong to that cluster (but this cluster might not be a very good guess yet).
  3. For each of the clusters you made in step 2 (there will be k of them, one for each cluster center), look at the points in the cluster, and find the average of them. This is your new center for that cluster (and you can throw away the old center). This new center is probably a better guess, because it's based on the actual data points.
  4. Repeat steps 2 and 3. The assignments will change because your cluster centers moved. Keep repeating steps 2 and 3, and eventually the cluster centers will stop moving. At that point you have your guess about what the clusters are.
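
The four steps above can be sketched in plain Python. This is a minimal illustration, not a tuned implementation: the function names (closest, mean, kmeans), the choice to start from k randomly picked data points, the rule that an empty cluster keeps its old center, and the coordinates (made up, loosely based on the diagram) are all assumptions for the example.

```python
import math
import random

def closest(point, centers):
    """Index of the center nearest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    # Step 1: pick k of the data points at random as starting centers
    # (one simple choice among many).
    centers = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        labels = [closest(p, centers) for p in points]
        # Step 3: move each center to the average of its points.
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # An empty cluster keeps its old center (an arbitrary choice here).
            new_centers.append(mean(members) if members else centers[i])
        # Step 4: stop once no center moved.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [closest(p, centers) for p in points]

# Made-up (x, y) coordinates, loosely based on the diagram above.
points = [(2, 0), (0, 1), (2, 1), (3, 1), (2, 2), (4, 2),
          (17, 4), (18, 4), (14, 5), (15, 5), (19, 5)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

The labels list holds one cluster number per point, in the same order as points.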

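Depending on which starting points you chose, you might end up with a different guess for the clusters than you would have with other starting points. But you will always converge to something: there are only finitely many ways to group the points, and a round of steps 2 and 3 never makes the grouping worse (measured by the total squared distance from each point to its center), so the cluster centers cannot keep moving forever.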
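
To make the dependence on starting points concrete, here is a small sketch with made-up data: the four corners of a wide rectangle. The helper names run_kmeans and cost are hypothetical, and the starting centers are passed in by hand so the result is repeatable. Both runs stop, but at different groupings, and one is much worse than the other.

```python
import math

def run_kmeans(points, centers, max_iter=100):
    """Run the assign/average loop from the given starting centers."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # New center = average of the group; an empty group keeps its old center.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

# Four corners of a wide rectangle (made-up data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good = run_kmeans(points, [(0, 0), (4, 1)])  # starts on the left and the right
bad = run_kmeans(points, [(2, 0), (2, 1)])   # starts at the bottom and the top

print(cost(points, good))  # 1.0
print(cost(points, bad))   # 16.0
```

A common way to cope with this is to run the algorithm from several different starting points and keep the result with the lowest cost.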

One limitation of the k-means algorithm is that it doesn't work well if the real clusters are very different sizes (some small ones and some big ones), or if they aren't very circular (for example if they're long and skinny).
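
The size problem shows up even in one dimension. The sketch below uses made-up numbers: a wide cluster spread from 0 to 10 and a tight cluster near 12. The function name kmeans_1d is hypothetical, and the run starts from the true cluster averages, which is the most favorable start you could ask for. Even so, the tight cluster ends up taking the nearest points of the wide one, because each point simply goes to whichever center is closer.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Assign/average loop for numbers on a line, from given starting centers."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # New center = average of the group; an empty group keeps its old center.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups

big = [float(x) for x in range(11)]  # wide cluster: 0, 1, ..., 10
small = [12.0, 12.2, 12.4]           # tight cluster nearby

# Start from the true cluster averages.
start = [sum(big) / len(big), sum(small) / len(small)]
centers, groups = kmeans_1d(big + small, start)

# Points of the wide cluster that ended up grouped with the tight one.
stolen = [v for v in groups[1] if v in big]
print(stolen)  # [8.0, 9.0, 10.0]
```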

Applications of k-means clustering:

  • Customer Segmentation: Subdivision of customers into groups/segments such that each customer segment consists of customers with similar market characteristics: pricing, loyalty, spending behaviors, etc. Segmentation variables could include, e.g., number of items bought on sale, average transaction value, and total number of transactions. Customer segmentation allows a business to customize marketing programs to suit each of its customer segments.
  • Anomaly or Fraud Detection: Separating valid activity groups from bots, and detecting fraudulent claims.
  • Inventory Categorization based on sales or other manufacturing metrics
  • Creating NewsFeeds: K-Means can be used to cluster articles by their similarity — it can separate documents into disjoint clusters.
  • Cloud Computing Environment: Clustered storage to increase performance, capacity, or reliability — clustering distributes work loads to each server, manages the transfer of workloads between servers, and provides access to all files from any server regardless of the physical location of the file.
  • Environmental risks: K-means can be used to analyze environmental risk in an area — environmental risk zoning of a chemical industrial area.
  • Pattern Recognition in images: For example, to automatically detect infected fruits or for segmentation of blood cells for leukemia detection.
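
As a rough illustration of the customer segmentation case, the sketch below clusters six made-up customers on the three variables mentioned above. Everything specific here is an assumption for the example: the customer data, the helper names standardize and kmeans, the hand-picked starting centers, and the decision to rescale each variable first (without rescaling, the variable with the largest numbers, here average transaction value, would dominate the distances).

```python
import math
import statistics

# Made-up customers:
# (items bought on sale, average transaction value, number of transactions)
customers = {
    "c1": (12, 20.0, 30),
    "c2": (10, 25.0, 28),
    "c3": (11, 22.0, 35),
    "c4": (1, 180.0, 4),
    "c5": (2, 210.0, 5),
    "c6": (0, 195.0, 3),
}

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stdevs = [statistics.pstdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stdevs)) for row in rows]

def kmeans(points, centers, max_iter=100):
    """Assign/average loop from the given starting centers; returns one label per point."""
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # An empty cluster keeps its old center.
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels

names = list(customers)
scaled = standardize(list(customers.values()))
# Starting centers: the first and last customer (an arbitrary choice).
labels = kmeans(scaled, [scaled[0], scaled[-1]])

segments = {}
for name, label in zip(names, labels):
    segments.setdefault(label, []).append(name)
print(segments)  # {0: ['c1', 'c2', 'c3'], 1: ['c4', 'c5', 'c6']}
```

Here segment 0 looks like frequent bargain shoppers and segment 1 like occasional big spenders, but naming the segments is up to whoever reads the result; the algorithm only returns the grouping.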

Conclusion: Clustering algorithms like K-Means are popular in almost every domain. K-Means has many applications, such as market segmentation, image segmentation, identifying crime localities, and recommendation engines.

Thank you

Hope this helps! :)
