k-nearest neighbors algorithm

Kaushik das

Frontend Developer | React.js, Redux, Tailwind CSS, Jest | Built Scalable Interfaces for Healthcare & Automotive (Toyota) | 30% Faster Load Times | Open to Remote & Onsite Roles

Published: May 17, 2020

In this article we will learn about the k-nearest neighbors algorithm. The contents are listed below:

1.What is the k-nearest neighbors algorithm?

2.Why should we use this algorithm?

3.How to use this algorithm?

1.What is the k-nearest neighbors algorithm?

The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near each other.

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors. (Source: Wikipedia)
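The two output rules described above can be sketched in a few lines. This is a minimal illustration that assumes the k nearest neighbors have already been found and only their labels or values are available; the function names `knn_classify_output` and `knn_regress_output` and the sample data are made up for this example.

```python
from collections import Counter
from statistics import mean

def knn_classify_output(neighbor_labels):
    """Classification: return the most common label among the neighbors (plurality vote)."""
    return Counter(neighbor_labels).most_common(1)[0][0]

def knn_regress_output(neighbor_values):
    """Regression: return the average of the neighbors' values."""
    return mean(neighbor_values)

# Hypothetical labels and values of the k = 3 nearest neighbors
labels = ["pass", "fail", "pass"]
values = [70.0, 80.0, 90.0]

vote = knn_classify_output(labels)
avg = knn_regress_output(values)
print(vote, avg)
```

With k = 1 both functions simply return the label or value of the single nearest neighbor.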

2.Why should we use this algorithm?

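*K-NN is intuitive and simple

*K-NN makes no assumptions about the underlying data distribution (it is non-parametric)

*It adapts as new training data is added, because there is no separate training phase

*Very easy to implement for multi-class problems

*Can be used for both classification and regression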

3.How to use this algorithm?

We will walk through three simple steps for using this algorithm, with the help of an example.

The example predicts whether a student passes or fails an exam, using student data from two subjects, CS and Maths.

1.Calculate Euclidean Distance

2.Get NN

3.Make Prediction

1.Calculate Euclidean Distance

In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space. (Source: Wikipedia)

Formula:
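For two points p and q with n features each, the standard definition of the Euclidean distance can be written as follows. The symbols p, q, and n are notation chosen here for illustration, and the numeric example uses the made-up points (0, 0) and (3, 4).

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}

% Worked example with p = (0, 0) and q = (3, 4):
d(p, q) = \sqrt{(0 - 3)^2 + (0 - 4)^2} = \sqrt{9 + 16} = \sqrt{25} = 5
```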

We will use the following data set, where each row holds two feature values followed by a class label (0 or 1):

dataset = [
    [2.7810836, 2.550537003, 0],
    [1.465489372, 2.362125076, 0],
    [3.396561688, 4.400293529, 0],
    [1.38807019, 1.850220317, 0],
    [3.06407232, 3.005305973, 0],
    [7.627531214, 2.759262235, 1],
    [5.332441248, 2.088626775, 1],
    [6.922596716, 1.77106367, 1],
    [8.675418651, -0.242068655, 1],
    [7.673756466, 3.508563011, 1]
]

Define a function to calculate the Euclidean distance:

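from math import sqrt

def Euclidean_distance(row1, row2):
    # Sum the squared differences over the feature columns;
    # the last element of each row is the class label, so it is skipped.
    distance = 0
    for i in range(len(row1)-1):
        distance += (row1[i] - row2[i])**2
    return sqrt(distance)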


test = [8.675418651, 2.088626775,1]
for i in dataset:
    dis = Euclidean_distance(test, i)
    print(dis)

Output

5.912406172801237
7.215114796649554
5.762823441453197
7.291247165694985
5.68572848441368
1.244114143007722
3.342977402999999
1.7813565228427415
2.33069543
1.7376841045382272
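As an alternative sketch, the same ten distances can be computed without an explicit loop by using NumPy broadcasting and `np.linalg.norm`. This is not part of the original walkthrough; it assumes NumPy is installed and that the first two columns are the features.

```python
import numpy as np

# First two columns are the features, the third column is the class label.
dataset = np.array([
    [2.7810836, 2.550537003, 0],
    [1.465489372, 2.362125076, 0],
    [3.396561688, 4.400293529, 0],
    [1.38807019, 1.850220317, 0],
    [3.06407232, 3.005305973, 0],
    [7.627531214, 2.759262235, 1],
    [5.332441248, 2.088626775, 1],
    [6.922596716, 1.77106367, 1],
    [8.675418651, -0.242068655, 1],
    [7.673756466, 3.508563011, 1],
])
test = np.array([8.675418651, 2.088626775, 1])

# Subtract the test features from every row (broadcasting),
# then take the Euclidean (L2) norm of each row of differences.
distances = np.linalg.norm(dataset[:, :2] - test[:2], axis=1)
print(distances)
```

The values should agree with the loop-based output above up to floating-point rounding.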

2.Get NN(Nearest Neighbor)

In this step we find the K nearest neighbors of a test point. K-NN has no separate training phase; the "model" is simply the stored training data.

Function

import numpy as np

def Get_Neighbors(train, test_row, num):
    """
    Return the num rows of train that are closest to test_row.

    1. train holds the training data points (10 in this example).
    2. test_row is a single point.
    3. num is the number of neighbors to return:
         a. compute one distance per training point,
         b. sort the training points by that distance,
         c. keep the first num points.
    """
    distance = []
    data = []
    for i in train:
        dist = Euclidean_distance(test_row, i)
        distance.append(dist)
        data.append(i)
    distance = np.array(distance)
    data = np.array(data)
    # Indices that sort the distances in ascending order
    index_dist = distance.argsort()
    # Rearrange the data according to those indices
    data = data[index_dist]
    # Slice off the num nearest rows
    neighbors = data[:num]
    return neighbors

Now we will call the function

Get_Neighbors(dataset, test, 4)

output

array([[ 7.62753121,  2.75926224,  1.        ],
       [ 7.67375647,  3.50856301,  1.        ],
       [ 6.92259672,  1.77106367,  1.        ],
       [ 8.67541865, -0.24206865,  1.        ]])
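The key step in the neighbor search is `argsort`, which returns the indices that would sort an array rather than the sorted values themselves. A small sketch with made-up distances shows how those indices pick out the nearest points:

```python
import numpy as np

# Hypothetical distances from a test point to four training points
distances = np.array([5.9, 1.2, 3.3, 1.7])

# Indices that would sort the distances in ascending order
order = distances.argsort()
print(order)                    # [1 3 2 0]

# The first two indices identify the two nearest training points
nearest_two = order[:2]
print(distances[nearest_two])   # [1.2 1.7]
```

Indexing the training data with the same `order` array keeps each row paired with its distance, which is why the rows come back ordered from nearest to farthest.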

3.Make Prediction

In this step we define another function that predicts the class of a test point by taking a vote among its nearest neighbors.

def predict_classification(train, test_row, num):
    Neighbors = Get_Neighbors(train, test_row, num)
    # Collect the class label (last column) of each neighbor
    Classes = []
    for i in Neighbors:
        Classes.append(i[-1])
    # The most frequent label wins the vote
    prediction = max(Classes, key=Classes.count)
    return prediction

We will call the function

predict_classification(dataset, test, 6)

output

1.0

Verifying the result

prediction = predict_classification(dataset, test, 4)

print("We expected {}, Got {}".format(test[-1], prediction))

output:

We expected 1, Got 1.0
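For comparison, the three steps can be condensed into one self-contained sketch using only the standard library. This is an illustrative variant, not the code used above: the function name `predict` and the point `new_point` are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

dataset = [
    [2.7810836, 2.550537003, 0],
    [1.465489372, 2.362125076, 0],
    [3.396561688, 4.400293529, 0],
    [1.38807019, 1.850220317, 0],
    [3.06407232, 3.005305973, 0],
    [7.627531214, 2.759262235, 1],
    [5.332441248, 2.088626775, 1],
    [6.922596716, 1.77106367, 1],
    [8.675418651, -0.242068655, 1],
    [7.673756466, 3.508563011, 1],
]

def predict(train, point, k):
    # Steps 1 and 2: sort the training rows by their distance to the point
    # (features only, the label in the last column is excluded) and keep k rows.
    nearest = sorted(train, key=lambda row: dist(row[:-1], point))[:k]
    # Step 3: plurality vote over the neighbors' class labels.
    return Counter(row[-1] for row in nearest).most_common(1)[0][0]

new_point = [1.5, 2.0]  # hypothetical unlabeled point near the class-0 rows
result = predict(dataset, new_point, 3)
print(result)
```

Here the point passed to `predict` carries only features, so no label column has to be skipped when measuring distance.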

Sources:

Images are taken from:
https://www.techsimplus.com/
shorturl.at/EQ129
