k-nearest neighbors algorithm


In this article we will learn about the k-nearest neighbors algorithm. The contents are as follows:

1. What is the k-nearest neighbors algorithm?

2. Why should we use this algorithm?

3. How do we use this algorithm?

1. What is the k-nearest neighbors algorithm?

The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near to each other.

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

  • In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
  • In k-NN regression, the output is the property value for the object. This value is the average of the values of the k nearest neighbors. (Source: Wikipedia)
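
The two output rules above can be sketched in a few lines. This is only an illustration: the function names `knn_classify` and `knn_regress` and the sample neighbor labels and values are hypothetical, and the sketch assumes the k nearest neighbors have already been found.

```python
from collections import Counter
from statistics import mean

def knn_classify(neighbor_labels):
    """Return the most common label among the k nearest neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]

def knn_regress(neighbor_values):
    """Return the average of the k nearest neighbors' values."""
    return mean(neighbor_values)

# Hypothetical neighbors of one query point, with k = 3
labels = ["pass", "fail", "pass"]
values = [70.0, 80.0, 90.0]

print(knn_classify(labels))  # pass
print(knn_regress(values))   # 80.0
```

With k = 1 both functions simply return the label or value of the single nearest neighbor.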

2. Why should we use this algorithm?

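*K-NN is intuitive and simple

*K-NN makes no assumptions about the underlying data distribution (it is non-parametric)

*It adapts as new data arrives, because new training examples can be added without retraining a model

*It is very easy to implement for multi-class problems

*It can be used for both classification and regression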

3. How do we use this algorithm?

We will walk through three simple steps for using this algorithm, with the help of an example.

The example predicts whether a student passes or fails an exam from the student's data in the CS and Maths subjects.


1. Calculate the Euclidean distance

2. Get the nearest neighbors (NN)

3. Make a prediction

1. Calculate Euclidean Distance

In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space. (Source: Wikipedia)

Formula:

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2)
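
As a quick check of the straight-line distance described above, here is a minimal sketch on two illustrative points, p and q, which are not part of the article's data. It computes the distance by hand and compares it with `math.dist` from the Python standard library (available from Python 3.8).

```python
from math import dist, sqrt

# Two illustrative points (assumed for this sketch)
p = [0.0, 0.0]
q = [3.0, 4.0]

# Sum the squared coordinate differences, then take the square root
manual = sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(manual)      # 5.0
print(dist(p, q))  # 5.0
```

Both values agree because the points form a 3-4-5 right triangle.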


We will use the following dataset. Each row holds two feature values followed by a class label (0 or 1).

dataset = [
    [2.7810836,2.550537003,0],
    [1.465489372,2.362125076,0],
    [3.396561688,4.400293529,0],
    [1.38807019,1.850220317,0],
    [3.06407232,3.005305973,0],
    [7.627531214,2.759262235,1],
    [5.332441248,2.088626775,1],
    [6.922596716,1.77106367,1],
    [8.675418651,-0.242068655,1],
    [7.673756466,3.508563011,1]
]


Define a function to calculate the Euclidean distance

from math import sqrt

def Euclidean_distance(row1, row2):
    distance = 0
    # The last column is the class label, so it is skipped
    for i in range(len(row1)-1):
        distance += (row1[i] - row2[i])**2
    return sqrt(distance)



test = [8.675418651, 2.088626775,1]
for i in dataset:
    dis = Euclidean_distance(test, i)
    print(dis)
    

Output

5.912406172801237
7.215114796649554
5.762823441453197
7.291247165694985
5.68572848441368
1.244114143007722
3.342977402999999
1.7813565228427415
2.33069543
1.7376841045382272

2. Get NN (Nearest Neighbors)

In this step we will find the K nearest neighbors of a test row. K-NN has no separate training phase; the training data itself is searched for the closest rows.

Function

import numpy as np

def Get_Neighbors(train, test_row, num):
    """
    Return the num rows of train that are closest to test_row.

    1. train holds the training rows (10 data points in this example).
    2. test_row holds a single point.
    3. num is the number of neighbors to return.
                a. Compute the distance from test_row to every training row.
                b. Sort the rows by distance, nearest first.
                c. Keep the first num rows.
    """
    distance = []
    data = []
    for i in train:
        dist = Euclidean_distance(test_row, i)
        distance.append(dist)
        data.append(i)
    distance = np.array(distance)
    data = np.array(data)
    # Indices that sort the distances in ascending order
    index_dist = distance.argsort()
    # Rearrange the rows according to those indices
    data = data[index_dist]
    # Keep the num nearest rows
    neighbors = data[:num]

    return neighbors

Now we will call the function

Get_Neighbors(dataset, test, 4)


Output

array([[ 7.62753121,  2.75926224,  1.        ],
       [ 7.67375647,  3.50856301,  1.        ],
       [ 6.92259672,  1.77106367,  1.        ],
       [ 8.67541865, -0.24206865,  1.        ]])
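
The same neighbor selection can also be sketched without NumPy. The variant below is only an illustration, not the article's method: the name `get_neighbors_sorted` is hypothetical, and it assumes Python 3.8+ for `math.dist`. It sorts the rows by their distance to the test row and keeps the first `num`.

```python
from math import dist

dataset = [
    [2.7810836, 2.550537003, 0],
    [1.465489372, 2.362125076, 0],
    [3.396561688, 4.400293529, 0],
    [1.38807019, 1.850220317, 0],
    [3.06407232, 3.005305973, 0],
    [7.627531214, 2.759262235, 1],
    [5.332441248, 2.088626775, 1],
    [6.922596716, 1.77106367, 1],
    [8.675418651, -0.242068655, 1],
    [7.673756466, 3.508563011, 1],
]
test = [8.675418651, 2.088626775, 1]

def get_neighbors_sorted(train, test_row, num):
    """Hypothetical pure-Python variant: sort rows by distance to test_row."""
    # Compare the feature columns only; the last column is the class label
    by_distance = sorted(train, key=lambda row: dist(row[:-1], test_row[:-1]))
    return by_distance[:num]

for row in get_neighbors_sorted(dataset, test, 4):
    print(row)
```

This returns plain Python lists rather than a NumPy array, so the class labels stay integers instead of becoming floats.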

3. Make Prediction

In this step we will define another function that predicts the class of a test row by majority vote among its nearest neighbors

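def predict_classification(train, test_row, num):
    Neighbors = Get_Neighbors(train, test_row, num)
    # Collect the class label (last column) of each neighbor
    Classes = []
    for i in Neighbors:
        Classes.append(i[-1])
    # Majority vote: the label that occurs most often wins
    prediction = max(Classes, key=Classes.count)

    return prediction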

We will call the function

predict_classification(dataset, test, 6)

output

1.0

Verifying the result

prediction = predict_classification(dataset, test, 4)
print("We expected {}, Got {}".format(test[-1], prediction))

output:

We expected 1, Got 1.0
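
A single test row is a thin check. One possible way to verify the approach more broadly is leave-one-out evaluation: hold out each row in turn, predict its class from the remaining rows, and count the hits. The sketch below is an assumption-laden illustration rather than part of the original walkthrough: the helpers `predict` and `leave_one_out_accuracy` are hypothetical names, k = 3 is an arbitrary choice, and `math.dist` requires Python 3.8+.

```python
from collections import Counter
from math import dist

dataset = [
    [2.7810836, 2.550537003, 0],
    [1.465489372, 2.362125076, 0],
    [3.396561688, 4.400293529, 0],
    [1.38807019, 1.850220317, 0],
    [3.06407232, 3.005305973, 0],
    [7.627531214, 2.759262235, 1],
    [5.332441248, 2.088626775, 1],
    [6.922596716, 1.77106367, 1],
    [8.675418651, -0.242068655, 1],
    [7.673756466, 3.508563011, 1],
]

def predict(train, test_row, k):
    """Majority vote among the k training rows nearest to test_row."""
    nearest = sorted(train, key=lambda row: dist(row[:-1], test_row[:-1]))[:k]
    return Counter(row[-1] for row in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Fraction of rows predicted correctly when each is held out in turn."""
    correct = 0
    for i, row in enumerate(data):
        train = data[:i] + data[i + 1:]  # every row except the held-out one
        if predict(train, row, k) == row[-1]:
            correct += 1
    return correct / len(data)

print(leave_one_out_accuracy(dataset, 3))
```

Because the two classes in this small dataset are well separated, a high accuracy is expected; on real data the same loop can be used to compare different values of k.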

