K-Nearest Neighbors Algorithm: A technical application of the phrase "Birds of a feather flock together"
Dr. Sheetal Sippy
Program Strategy-Life Sciences and Health Care, Deloitte India (Offices of the US)
We often group individuals together because they share certain traits or live in the same vicinity. The KNN model works just like this common intuition. It is a supervised machine learning algorithm often used in classification problems: it classifies a data point based on how its neighboring data points are classified. Pause! Let us unpack that.
Breaking it down
A supervised learning algorithm depends on labeled input data to learn a function that produces an output when given new, unlabeled data. An unsupervised machine learning algorithm, by contrast, takes input data without any labels and relies on the underlying structure of the data to generate insights.
K Nearest Neighbors
The KNN algorithm classifies data points based on how the neighboring data is classified. It is simple to implement and in many cases serves as a predecessor to, or benchmark for, more complicated classifiers like Artificial Neural Networks (ANN) and Support Vector Machines (SVM). KNN is a lazy learning algorithm, i.e., it memorizes the training dataset and has no explicit training phase: predictions are drawn directly from the stored historical examples, which is why KNN is also known as a "case-based learning algorithm." KNN grew out of research conducted for the armed forces: Evelyn Fix and Joseph Hodges of the USAF School of Aviation Medicine introduced the algorithm in a 1951 technical report, proposing a nonparametric approach to classification that relies on the "distance" between points or distributions.
To understand how this works, let us take a trip to Karen's new pet store, Pawsome. Since it is a new store, Karen is trying to engage her customers through several promotional activities. Anticipating high footfall over the weekend, she lays out dog treats in the form of a puzzle.
The treats were kept on display in the following manner, with some of them hidden under a cloth. She is willing to offer a discounted price on the treats to anyone who can guess which shape category the hidden treats belong to. Take a minute and see if you get them all right. If you are unable to classify one, mark it as "Unsure."
Once you’ve predicted, let’s cross-check it with that of below:
1 & 2: Circular treats
3: Unsure if it is a circular treat or a bone-shaped treat
4 & 5: Bone-shaped treats
6 & 7: Unsure if they are bone-shaped treats or heart-shaped treats
8: Heart-shaped treat
If you have guessed it right, you have implemented KNN!
In the image, we can see that similar treats are arranged together. Treats 1 and 2 can easily be classified as circular treats: they are completely surrounded by circular treats, so there is a high probability that the hidden ones are circular too. The same logic applies to 4, 5, and 8. One can therefore conclude that a hidden treat will most likely be the same type as its neighbors. Classifying 3 is trickier, as its neighbors include both circular and bone-shaped treats; there is a similar dilemma for 6 and 7, which could each be either bone-shaped or heart-shaped. From this, it is safe to say that the KNN algorithm predicts the label of a new data point (the hidden treat) based on the labels of its neighbors (circular, bone-shaped, or heart-shaped treats).
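To make this intuition concrete, here is a minimal sketch of the idea in Python. The coordinates and labels below are invented purely to mimic the treat layout; they are not the actual display.

```python
from collections import Counter
import math

# Hypothetical (x, y) positions of the visible treats and their labels,
# loosely mirroring the store display described above.
treats = [
    ((1.0, 4.0), "circular"), ((1.5, 3.5), "circular"), ((2.0, 4.2), "circular"),
    ((4.0, 2.0), "bone"),     ((4.5, 2.5), "bone"),     ((5.0, 1.8), "bone"),
    ((7.0, 4.0), "heart"),    ((7.5, 3.5), "heart"),    ((8.0, 4.2), "heart"),
]

def predict(hidden_point, k=3):
    """Label a hidden treat by majority vote among its k nearest visible treats."""
    # Sort the labeled treats by Euclidean distance from the hidden point.
    by_distance = sorted(treats, key=lambda t: math.dist(hidden_point, t[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]

print(predict((1.2, 3.8)))  # surrounded by circular treats -> "circular"
print(predict((5.8, 3.0)))  # between the bone and heart clusters -> "bone", a closer call
```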
Understanding “K”
K is the number of nearest neighbors considered when predicting: their labels decide the label assigned to the point in question. For example, if K = 5, we consider the 5 nearest points and use the majority label among those 5 points as the predicted label.
Considering the above example, the aim now is to assign a label to the hidden point 7.
If we consider K = 3, the three nearest neighbors of point 7 are 2 heart-shaped treats and 1 bone-shaped treat, so the majority vote says "heart-shaped." If we consider K = 5, 3 out of the 5 nearest neighbors are bone-shaped treats, so the vote flips to "bone-shaped." But how do we decide which points count as "nearest" in the first place?
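In code, the flip looks like this; the neighbor labels below are hypothetical, ordered nearest-first to match the counts just described:

```python
from collections import Counter

# Point 7's neighbors, nearest first: 2 heart-shaped, then 3 bone-shaped treats.
neighbors = ["heart", "heart", "bone", "bone", "bone"]

for k in (3, 5):
    majority = Counter(neighbors[:k]).most_common(1)[0][0]
    print(f"K={k}: predicted label is {majority!r}")
# K=3 -> 'heart' (2 of 3 votes); K=5 -> 'bone' (3 of 5 votes)
```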
To classify a hidden data point and assign it a label, the distance between the hidden point and the neighboring points is calculated using a mathematical distance function such as Euclidean distance (the most common metric), Chebyshev distance, or Manhattan distance; the K points with the smallest distances are the "nearest neighbors."
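These three metrics differ only in how they combine the coordinate-wise differences between two points, as a plain-Python sketch makes clear:

```python
def euclidean(p, q):
    # Straight-line distance: square root of the sum of squared differences.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    # Grid distance: sum of the absolute differences along each axis.
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    # The single largest difference along any one axis.
    return max(abs(a - b) for a, b in zip(p, q))

p, q = (1, 2), (4, 6)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
print(chebyshev(p, q))  # 4
```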
It is now clear that the classification varies with the value of K, so choosing an appropriate K is an important step when working with this algorithm. This process is known as "parameter tuning." The right K depends on the individual dataset, and the best method of selecting it is to try different values of K and validate the outcomes. A value that is too small increases the probability of overfitting the model, while a large value makes the process computationally expensive.
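One common way to "try different values of K" is cross-validation. Below is a minimal sketch using scikit-learn, with the built-in Iris dataset standing in for whatever labeled data is actually at hand:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate K with 5-fold cross-validation and keep the best.
scores = {}
for k in range(1, 16):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"Best K = {best_k} (cross-validated accuracy {scores[best_k]:.3f})")
```

Odd values of K are often preferred for binary problems, since they avoid tied votes.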
Some applications of the KNN Model
1. KNN is widely used by e-commerce and OTT platforms for recommending products, media to consume, and advertisements. For instance, if one purchases a smartphone from Amazon, recommendations for mobile accessories like covers and earphones start surfacing.
2. KNN techniques are often used for theft prevention in modern retail. KNN-based pattern recognition makes it easier to scan for and detect packages hidden at the bottom of a shopping cart at check-out. If a detected object matches an item in the existing database, the price of the spotted product is added to the customer's bill.
3. Another relevant use of this algorithm in the retail industry is identifying patterns in credit card usage. Most transaction-scrutinizing software uses KNN to detect unusual or suspicious activity.
4. More advanced applications of KNN include handwriting detection and voice and image recognition.
Applications of KNN in Healthcare
Medical data is extremely rich and contains multiple features, and these records are huge resource banks for medical research. Medical data holds patterns and relationships that can help enhance the accuracy of diagnostic processes, and research studies around the globe are classifying medical data using KNN-based algorithms. The algorithm can be used to classify and predict the diagnosis of diseases that present similar symptoms, and multiple hypotheses have been tested on variables such as allergies, age, blood pressure, diabetes, and cholesterol, based on historical data.
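Purely as an illustration of how such a study might frame the problem, the sketch below fits a KNN classifier to synthetic patient records; the features and the "diagnosis" label are fabricated for demonstration, not drawn from any real medical dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Fabricated patient features: [age, systolic BP, cholesterol, blood sugar].
rng = np.random.default_rng(0)
X = rng.normal(loc=[50, 130, 200, 100], scale=[12, 15, 30, 20], size=(200, 4))
y = (X[:, 3] + 0.5 * X[:, 1] > 170).astype(int)  # toy "diagnosis" label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardizing features matters for KNN: otherwise large-valued features
# (e.g., cholesterol) dominate the distance calculation.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```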
Advantages and Limitations of the algorithm
To summarize, the K-nearest neighbors algorithm is a simple classification technique with a wide array of applications. Despite its simplicity, it can produce competitive results and can also be used for regression. Classification is based on stored instances, so there is no need to build an abstract model from a training dataset. The classification process can, however, be computationally expensive, which leaves room for improvement and modification.
Thank you for reading this piece; feel free to drop a comment or reach out to me at [email protected] :)
References
1. Gupta, S. (2019, May 29). KNN Machine Learning Algorithm Explained. Retrieved October 19, 2020, from https://in.springboard.com/blog/knn-machine-learning-algorithm-explained/
2. Euclidean vs Chebyshev vs Manhattan Distance. (2012, May 22). Retrieved from https://lyfat.wordpress.com/2012/05/22/euclidean-vs-chebyshev-vs-manhattan-distance/
3. M’Haimdat, O. (2020, May 12). Understand the Fundamentals of the K-Nearest Neighbors (KNN) Algorithm. Retrieved October 19, 2020, from https://heartbeat.fritz.ai/understand-the-fundamentals-of-the-k-nearest-neighbors-knn-algorithm-533dc0c2f45a
4. Medical Health Big Data Classification Based on KNN Classification Algorithm. (n.d.). Retrieved October 19, 2020, from https://ieeexplore.ieee.org/document/8911389
5. Schott, M. (2020, February 27). K-Nearest Neighbors (KNN) Algorithm for Machine Learning. Retrieved October 19, 2020, from https://medium.com/capital-one-tech/k-nearest-neighbors-knn-algorithm-for-machine-learning-e883219c8f26