Data distributions where K-means clustering fails; can DBSCAN be a solution? Examples with R, Python and Spark

For K-means clustering to work well, each cluster should be roughly spherical, all variables should have similar variance, and the clusters should contain roughly equal numbers of observations. Can DBSCAN be a solution for datasets that lack these properties?
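
To make the question concrete, here is a minimal sketch that is not taken from the linked article. It builds two concentric rings, a shape that violates the "spherical cluster" assumption, and runs a bare-bones K-means and a bare-bones DBSCAN on them. The ring radii, `eps=0.6`, `min_pts=4`, the starting centroids and the helper names (`ring`, `kmeans`, `dbscan`, `agreement`) are all illustrative assumptions; in practice one would more likely reach for a library implementation such as scikit-learn's `KMeans` and `DBSCAN`.

```python
import math

def ring(radius, n):
    """Return n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def kmeans(points, centroids, iterations=50):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels

def dbscan(points, eps, min_pts):
    """Bare-bones DBSCAN; returns a cluster index per point, -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

def agreement(labels, truth):
    """Share of points matching the truth (two-cluster case only),
    taking the better of the two possible label mappings."""
    same = sum(a == b for a, b in zip(labels, truth)) / len(truth)
    return max(same, 1 - same)

n = 100
points = ring(1.0, n) + ring(4.0, n)  # inner ring, then outer ring
truth = [0] * n + [1] * n

# Assumed starting centroids: one point from each ring.
kmeans_labels = kmeans(points, centroids=[points[0], points[n]])
dbscan_labels = dbscan(points, eps=0.6, min_pts=4)

kmeans_score = agreement(kmeans_labels, truth)
dbscan_score = agreement(dbscan_labels, truth)
print(f"K-means agreement with the true rings: {kmeans_score:.2f}")
print(f"DBSCAN agreement with the true rings:  {dbscan_score:.2f}")
```

K-means can only cut the plane with a straight boundary between its two centroids, so it cannot separate one ring from the other, while DBSCAN follows the dense chain of neighbours around each ring. The catch is that `eps` and `min_pts` have to suit the data's density, which this toy example sets by hand.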

Let's see examples with R, Python and Spark.

Article is available here


Nice. How would it compare with IsoData/IsoClus, in which the algorithm adaptively adjusts the final number of clusters and one can specify heuristics such as the minimum number of points needed to establish a cluster, the maximum standard deviation, etc.?

Debjani Ghatak

Lecturer, Loyola University Chicago

6 years ago

Interesting and useful.

Amin Dezfuli

NASA Climate Scientist | Consultant | Academic | Communicator | Children’s Author

6 years ago

Interesting! I'd be curious to see the comparison with hierarchical clustering methods, particularly for applications like climate regionalization.

Mulugeta Gebregziabher

Professor at Medical University of South Carolina

6 years ago

Excellent demonstration! How would these procedures perform when all the items/variables are categorical (e.g., binary)? Looking forward to reading your article closely!

More articles by Fisseha Berhane, PhD