Cluster Analysis
Data Mining (KDD)

What is Cluster Analysis?

Cluster analysis finds groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.

  • Based on information found in the data that describes the objects and their relationships.
  • Also known as unsupervised classification.

Many applications

  • Understanding: Group related documents for browsing, or group genes and proteins that have similar functionality.
  • Summarization: Reduce the size of large data sets.

Web documents, for example, are divided into groups based on a similarity metric.

  • The most common similarity metric is the dot product between two document vectors.
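The dot-product similarity mentioned above can be illustrated with a minimal sketch. The sample documents, the whitespace tokenization, and the helper names `term_vector` and `dot` are assumptions made for illustration only; real systems typically weight and normalize the vectors before comparing them.

```python
from collections import Counter

def term_vector(text, vocabulary):
    """Represent a document as raw term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical documents, not taken from the article.
docs = {
    "d1": "cluster analysis groups similar objects",
    "d2": "cluster analysis finds groups in data",
    "d3": "stock prices fell sharply today",
}

# Shared vocabulary: every distinct word across all documents.
vocabulary = sorted({w for text in docs.values() for w in text.lower().split()})
vectors = {name: term_vector(text, vocabulary) for name, text in docs.items()}

sim_12 = dot(vectors["d1"], vectors["d2"])  # documents with shared terms
sim_13 = dot(vectors["d1"], vectors["d3"])  # documents with no shared terms
print(sim_12, sim_13)
```

With raw counts, the dot product grows with the number of shared terms, so d1 and d2 score higher than d1 and d3 and would tend to land in the same group.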


What is not Cluster Analysis?

Supervised classification.

  • Class label information is available.

Simple segmentation.

  • Dividing students into different registration groups alphabetically, by last name.

Results of a query.

  • Groupings are a result of an external specification.

Graph partitioning.

  • Graph partitioning and clustering share some mutual relevance and synergy, but the two areas are not identical.


Types of Clusterings

A clustering is a set of clusters.

One important distinction is between hierarchical and partitional sets of clusters.

Partitional Clustering

  • A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.

Hierarchical Clustering

  • A set of nested clusters organized as a hierarchical tree.
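The difference between the two kinds of clustering can be sketched in a few lines. This is an illustrative example under stated assumptions: the 1-D points are made up, `is_partition` and `single_link_tree` are hypothetical helper names, and single-link merging is just one of several ways to build a hierarchy.

```python
def is_partition(clusters, objects):
    """True if every object appears in exactly one cluster."""
    seen = [x for cluster in clusters for x in cluster]
    return sorted(seen) == sorted(objects)

def single_link_tree(points):
    """Agglomerative (single-link) clustering of 1-D points.

    Returns the hierarchy as nested tuples; each tuple is one merge.
    """
    # Each entry is (subtree, member points); start from singletons.
    clusters = [(p, [p]) for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(abs(a - b)
                        for a in clusters[i][1] for b in clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical data for illustration.
points = [1.0, 1.5, 5.0, 5.2, 9.0]

# Partitional: a flat list of non-overlapping subsets.
partitional = [[1.0, 1.5], [5.0, 5.2], [9.0]]
print(is_partition(partitional, points))

# Hierarchical: nested clusters organized as a tree.
tree = single_link_tree(points)
print(tree)
```

A partitional result is a flat list that passes the exactly-one-subset check, while the hierarchical result keeps every merge, so cutting the tree at different depths yields different flat clusterings.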
DIAW Serigne
Data Engineer, Data Scientist at Business and Decision
Paper: https://www.ieee.org.ar/downloads/Srivastava-tut-pres.pdf

