Data distributions where K-means clustering fails; can DBSCAN be a solution? Examples with R, Python and Spark
For K-means clustering to work well, each cluster should be approximately spherical (similar spread in every direction), all variables should have similar variance, and each cluster should contain a roughly equal number of observations. Can DBSCAN be a solution for datasets that do not have these properties?
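To make the question concrete, here is a minimal pure-Python sketch. It is not the article's code. The dataset (two noise-free concentric rings), the helper names `ring`, `dbscan`, `kmeans` and `same_partition`, and the parameter values `eps=0.6` and `min_pts=4` are all assumptions chosen for illustration. The rings are deliberately non-spherical as clusters and unequal in size, which is the situation described above.

```python
import math
from collections import deque


def ring(radius, n):
    """Return n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN sketch; returns one label per point, -1 means noise."""
    def neighbors(i):
        # The neighborhood includes the point itself.
        return [j for j, q in enumerate(points) if distance(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # only core points keep expanding the cluster
                queue.extend(nbrs)
    return labels


def kmeans(points, centroids, iters=100):
    """Plain Lloyd's algorithm started from the given centroids."""
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda c: distance(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels


def same_partition(a, b):
    """True if two labelings group the points identically (label names ignored)."""
    return len(set(zip(a, b))) == len(set(a)) == len(set(b))


inner, outer = ring(1.0, 40), ring(5.0, 120)
points = inner + outer
truth = [0] * len(inner) + [1] * len(outer)

db_labels = dbscan(points, eps=0.6, min_pts=4)
km_labels = kmeans(points, centroids=[points[0], points[-1]])

print("DBSCAN recovers the rings:", same_partition(db_labels, truth))
print("K-means recovers the rings:", same_partition(km_labels, truth))
```

On this toy data DBSCAN separates the two rings, while K-means with two centroids cannot: its decision boundary is a straight line, and no straight line separates one ring from another that encloses it. Real data with noise and varying density would need `eps` and `min_pts` tuned, so treat the result as an illustration rather than a benchmark.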
Let's see examples with R, Python and Spark.
Article is available here
Associate Branch Head
6 years ago: Nice. How would it compare with IsoData/IsoClus, in which the algorithm adaptively adjusts the final number of clusters and one can specify heuristics such as the minimum number of points to establish a cluster, the maximum standard deviation, etc.?
Lecturer, Loyola University Chicago
6 years ago: Interesting and useful.
NASA Climate Scientist | Consultant | Academic | Communicator | Children’s Author
6 years ago: Interesting! I'd be curious to see the comparison with hierarchical clustering methods, particularly for applications like climate regionalization.
Professor at Medical University of South Carolina
6 years ago: Excellent demonstration! How would these procedures perform when all the items/variables are categorical (e.g., binary)? Looking forward to reading your article closely!