Unsupervised does not mean automatic

A common misconception among business people (and maybe even some CS folks) is that clustering is somehow "automatic learning". In reality, clustering typically requires one to set hyperparameters such as the number of clusters or the maximum/minimum distance between clusters. These are typically tuned by hand, so the process is not "automatic learning".
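To make this concrete, here is a minimal sketch in plain Python. It is not any particular library's algorithm; the function `cluster_1d`, the parameter `max_gap`, and the data points are all made up for illustration. It groups one-dimensional points by starting a new cluster whenever the gap to the previous point exceeds a hand-picked distance threshold.

```python
def cluster_1d(points, max_gap):
    """Group 1-D points into clusters.

    A new cluster starts whenever the gap to the previous (sorted) point
    is larger than max_gap. max_gap is a hyperparameter chosen by a human;
    nothing in the data tells us its "correct" value.
    """
    clusters = []
    for p in sorted(points):
        if clusters and p - clusters[-1][-1] <= max_gap:
            clusters[-1].append(p)   # close enough: join the current cluster
        else:
            clusters.append([p])     # too far away: open a new cluster
    return clusters


# Hypothetical, unlabeled data.
points = [1.0, 1.2, 1.9, 5.0, 5.3, 9.0]

# Same data, three different human choices of the threshold.
for max_gap in (0.5, 1.0, 4.0):
    result = cluster_1d(points, max_gap)
    print(f"max_gap={max_gap}: {len(result)} clusters -> {result}")
```

The same unlabeled data yields four, three, or one cluster depending only on the threshold a person picked, which is exactly the kind of manual decision described above.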

Clustering is nevertheless considered "unsupervised learning", because we don't use labeled data; instead, we try to create the labels by finding similarities between the data points. So the key thing is this: "unsupervised learning" is not "automatic learning". It's just learning from data that is not labeled [1]. And this learning typically requires manual, human-made decisions to set the "correct" hyperparameters.

Sorry to burst your bubble: there is no "intelligence" that will magically solve all your problems!

Reference

[1] Adi Bronshtein: "Clustering is considered unsupervised learning, because there’s no labeled target variable in clustering. Clustering algorithms try to, well, cluster data points into similar groups (or… clusters) based on different characteristics of the data. In supervised learning, we have a labeled target variable we’re trying to predict, estimate (regression) or classify (classification)."
