KNN - K Nearest Neighbour

Ashutosh Chaudhary

Senior Manager Applications

Published: May 9, 2020

What's the need:

KNN is a supervised learning algorithm. It is useful when the boundary between classes is non-linear, so a linear classifier such as Logistic Regression does not fit well. The need for it arises because in some cases (as shown in the diagram) the classification cannot be achieved by a single straight line.

How does the algorithm work:

KNN (K Nearest Neighbours) is one of the simplest supervised machine learning algorithms and is mainly used for classification. It classifies a data point based on how its neighbours are classified. It works on the 'distance' principle.

KNN is based on feature similarity. The 'K' refers to the number of neighbours we include in the majority vote.

In the diagram, if we choose K = 3, then the '?' will be labelled as Square, since squares are in the majority inside the K = 3 circle. If we choose K = 7, then the '?' will be labelled as Triangle, since triangles are in the majority inside the K = 7 circle.
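The majority vote can be sketched in a few lines of Python. This is only an illustration: the coordinates are made up to mimic the diagram (they are not taken from it), and the function name knn_predict is hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label.
    distances = [
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most frequent label among them wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up points: two squares sit close to the query, the triangles
# are further away but more numerous.
points = [(1.0, 0.0), (0.0, 1.2), (3.0, 0.0),
          (1.5, 0.0), (0.0, 2.0), (-2.2, 0.0), (0.0, -2.5)]
labels = ["Square", "Square", "Square",
          "Triangle", "Triangle", "Triangle", "Triangle"]
query = (0.0, 0.0)

label_k3 = knn_predict(points, labels, query, k=3)
label_k7 = knn_predict(points, labels, query, k=7)
print(label_k3, label_k7)
```

With these assumed points, the three nearest neighbours are two squares and one triangle, while all seven neighbours contain four triangles and three squares, so the predicted label flips as K grows.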

The distance between two points is calculated with Euclidean distance for continuous variables. For categorical variables, Hamming distance is used.
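As a rough illustration of the two distance measures, here is a minimal Python sketch. The function names and the sample values are assumptions made for the example.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences (continuous features)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Number of positions where the values differ (categorical features)."""
    return sum(x != y for x, y in zip(a, b))

# Continuous example: a 3-4-5 right triangle.
d_continuous = euclidean_distance((0, 0), (3, 4))

# Categorical example: the two records differ only in the second attribute.
d_categorical = hamming_distance(("red", "small", "round"),
                                 ("red", "large", "round"))
print(d_continuous, d_categorical)
```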

To choose K (for a starting point):

Take the square root of n, where n is the total number of data points
Take an odd value of K, which avoids tied votes when there are two classes

When choosing K, keep in mind that if K is too small, the neighbourhood is sensitive to noise points, and if K is too large, the neighbourhood may include points from other classes.
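The starting-point rule for K can be written as a small helper. This is a sketch under assumptions: the name starting_k is hypothetical, and bumping an even result up (rather than down) to the next odd number is an arbitrary choice, not something the rule prescribes.

```python
import math

def starting_k(n_points):
    """Rule-of-thumb starting K: sqrt(n), nudged to an odd number."""
    k = round(math.sqrt(n_points))
    if k % 2 == 0:
        k += 1  # assumption: move even values up to the next odd number
    return max(k, 1)

k_for_100 = starting_k(100)  # sqrt(100) = 10, even, so bumped to 11
k_for_50 = starting_k(50)    # sqrt(50) is about 7.07, rounds to 7
print(k_for_100, k_for_50)
```

The result is only a first guess; the final K is still picked by comparing model performance across several values.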

Feature Scaling

Feature scaling is of prime importance to ensure that one feature doesn't overshadow another. The features fed to any algorithm that relies on distance have to be scaled, and KNN is no exception.

For example, suppose there are three variables: Age (10-100 years), Weight (10-120 kg) and Salary (3,00,000 - 30,00,000 INR). In this case, the distances will be dominated by the feature with the largest range, i.e. Salary. To avoid the resulting misclassification, we should normalize the feature variables.
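One common way to normalize is min-max scaling, which maps every feature to the range 0 to 1. The sketch below is illustrative: the three rows are made-up values inside the ranges quoted above, and min_max_scale is a hypothetical helper name.

```python
def min_max_scale(rows):
    """Rescale each column to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]

# Made-up rows: (age in years, weight in kg, salary in INR).
people = [
    [10, 10, 300_000],
    [55, 65, 1_650_000],
    [100, 120, 3_000_000],
]
scaled = min_max_scale(people)
for row in scaled:
    print(row)
```

After scaling, a difference of 45 years in age counts as much as a difference of 13,50,000 INR in salary, so no single feature dominates the distance.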

Algorithm in R:

For KNN, install the package "class" in R. Its knn() function takes the training set, the test set and the training labels together, so we can train and test in the same line.

Steps to be followed:

1. Split the data into Test and Train

2. Feature Scale

3. Fit KNN to the training dataset and predict the test set

4. Model Evaluation - Choosing the Right K (Parameter Tuning)
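The four steps above can be sketched end to end. Note that this sketch is in plain Python rather than R, purely to keep it self-contained; it is not the R workflow itself. The data set, the class labels "A" and "B", the 70/30 split, the random seed and all helper names are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def min_max_fit(rows):
    """Learn per-feature (min, max) from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def min_max_apply(rows, ranges):
    """Rescale rows with ranges learned from the training set."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Made-up, well-separated data: (age, salary) with class "A" or "B".
data = [((20 + i, 300_000 + 20_000 * i), "A") for i in range(10)]
data += [((60 + i, 2_500_000 + 50_000 * i), "B") for i in range(10)]

# 1. Split the data into test and train (70/30, fixed seed for repeatability).
random.Random(42).shuffle(data)
cut = len(data) * 7 // 10
train, test = data[:cut], data[cut:]
train_x = [features for features, _ in train]
train_y = [label for _, label in train]
test_x = [features for features, _ in test]
test_y = [label for _, label in test]

# 2. Feature scale, using ranges learned from the training set only.
ranges = min_max_fit(train_x)
train_scaled = min_max_apply(train_x, ranges)
test_scaled = min_max_apply(test_x, ranges)

# 3 and 4. Predict the test set for several K values and compare accuracy.
accuracy_by_k = {}
for k in (1, 3, 5):
    predictions = [knn_predict(train_scaled, train_y, q, k) for q in test_scaled]
    correct = sum(p == t for p, t in zip(predictions, test_y))
    accuracy_by_k[k] = correct / len(test_y)

print(accuracy_by_k)
```

Because the two made-up classes are far apart, every K tried here classifies the test set perfectly; on real data the accuracies would differ, and that comparison is what guides the choice of K.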

Challenges with KNN

Scaling issues, which can be overcome by feature scaling
Choosing the right K (mostly take an odd number for the value of K)
