KNN Algorithm
Kiruthika Subramani
Innovating AI for a Better Tomorrow | AI Engineer | Google Developer Expert | Author | IBM Dual Champion | 200+ Global AI Talks | Master's Student at MILA
Hello all!
Welcome to the sixth week of our "Cup of coffee with an Algorithm in ML" series! This week, we're excited to dive into the K-Nearest Neighbors (KNN) algorithm!
KNN is a type of machine learning algorithm that can be used for classification or regression tasks.
The KNN algorithm assumes that similar things exist in close proximity: if two objects share similar features, they are likely to sit near each other in the dataset's feature space.
What is K?
"k" is a user-defined hyperparameter that determines how many neighbors to consider.
Imagine you have a dataset of roses with their petal lengths and widths, as well as their colors (red, pink, or white). You want to use KNN to predict the color of a new rose based on its petal length and width.
To use KNN, you need to choose a value for k, which represents the number of nearest neighbors to consider. For example, let's say you choose k=5.
Now, suppose you have a new rose with a petal length of 4.8 cm and a petal width of 1.5 cm. To predict its color using KNN, you need to find the five nearest neighbors to this new rose based on their petal lengths and widths.
You can calculate the distance between the new rose and all the roses in your dataset using a distance metric, such as Euclidean distance. The five roses that are closest to the new rose (i.e., have the shortest distance) are considered its nearest neighbors.
Let's say the five nearest neighbors turn out to be three pink roses and two red roses. Since the majority (three of the five) are pink, KNN predicts that the new rose is likely to be pink.
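The rose example above can be sketched in a few lines of plain Python. The petal measurements below are made-up illustrative values (not a real dataset), chosen so that the five closest roses are three pink and two red, matching the walkthrough:

```python
import math
from collections import Counter

# Hypothetical (petal length cm, petal width cm) -> color pairs.
# These numbers are illustrative only.
roses = [
    ((4.7, 1.4), "pink"),
    ((4.9, 1.5), "pink"),
    ((5.0, 1.6), "pink"),
    ((4.6, 1.3), "red"),
    ((5.1, 1.7), "red"),
    ((6.0, 2.2), "white"),
    ((5.9, 2.1), "white"),
]

new_rose = (4.8, 1.5)

def euclidean(a, b):
    # Straight-line distance between two (length, width) points.
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

# Sort every rose by its distance to the new rose and keep the 5 closest.
neighbors = sorted(roses, key=lambda r: euclidean(r[0], new_rose))[:5]

# Majority vote among the k neighbors decides the predicted color.
votes = Counter(color for _, color in neighbors)
prediction = votes.most_common(1)[0][0]
print(prediction)  # -> pink
```

This is the whole algorithm: measure distances, take the k closest points, and vote.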
Still not clear? Let's walk through the distance calculation.
How to calculate the distance in KNN?
The most common distance metric used in KNN is Euclidean distance. For example, suppose you have two data points A and B, where A has features (x1, y1) and B has features (x2, y2). The Euclidean distance between A and B can be calculated using the following formula:

d(A, B) = √((x2 − x1)² + (y2 − y1)²)
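A quick sanity check of the formula, using a classic 3-4-5 right triangle so the answer is exactly 5. Python's standard library also ships `math.dist`, which computes the same Euclidean distance:

```python
import math

a = (1.0, 2.0)
b = (4.0, 6.0)

# Manual formula: sqrt((x2 - x1)^2 + (y2 - y1)^2)
manual = math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)

# math.dist (Python 3.8+) computes the same thing for any dimension.
print(manual, math.dist(a, b))  # both print 5.0 (a 3-4-5 triangle)
```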
How to choose right value of K in KNN?
Choosing the right value of k in K-Nearest Neighbors (KNN) is an important step, as it can significantly affect the accuracy of the model. A few tips:

1. A small k (e.g., k=1) makes the model sensitive to noise and outliers; a large k smooths the decision boundary but can wash out real class differences.
2. For binary classification, prefer an odd k so majority votes can't tie.
3. A common rule of thumb is k ≈ √n, where n is the number of training samples.
4. The most reliable approach is to try several values of k and keep the one with the best cross-validation score.
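Here's a minimal sketch of the cross-validation approach, using scikit-learn's built-in Iris dataset as a stand-in (any labeled dataset works the same way). Each candidate k is scored with 5-fold cross-validation and the best one is kept:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate k with 5-fold cross-validation.
scores = {}
for k in range(1, 16, 2):  # odd values avoid ties in binary voting
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# Keep the k with the highest mean cross-validation accuracy.
best_k = max(scores, key=scores.get)
print("Best k:", best_k)
```

`GridSearchCV` automates the same loop if you prefer a one-liner search.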
Do you Know KNN is a Lazy Learner?
KNN is called a lazy learning algorithm because it doesn't build a model during training; it simply stores the entire dataset and defers all the work to prediction time, when it searches for the nearest neighbors of each query point.
Let us start the implementation
# Step 1: Create a Data frame
import pandas as pd
data = pd.DataFrame({
    'Age': [22, 25, 35, 28, 45, 33],
    'Income': [30000, 40000, 50000, 60000, 80000, 70000],
    'Education Level': ['High School', 'Some College', 'College', 'Graduate School', 'Graduate School', 'College'],
    'Buys New Car': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
})
# Step 2: Split the data
from sklearn.model_selection import train_test_split
X = pd.get_dummies(data.drop('Buys New Car', axis=1))
y = data['Buys New Car']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Preprocess the data
# KNN is distance-based, so in practice you should scale the features (e.g.,
# with StandardScaler) so a large-valued column like Income doesn't dominate
# Age in the distance. Scaling is skipped here to keep the example short.
# Step 4: Train the KNN algorithm
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Step 5: Test the KNN algorithm
y_pred = knn.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Step 6: Make predictions on new data
# The columns must match the one-hot encoded training columns
# (same names, same order as produced by pd.get_dummies above).
new_data = pd.DataFrame({
    'Age': [30, 40],
    'Income': [45000, 60000],
    'Education Level_College': [1, 0],
    'Education Level_Graduate School': [0, 1],
    'Education Level_High School': [0, 0],
    'Education Level_Some College': [0, 0]
})
new_data_pred = knn.predict(new_data)
print("New data predictions:", new_data_pred)
Eager to see the step-by-step explanation? Check out the Colab.
Finally, we did it! The sixth installment of our weekly "Cup of coffee with an Algorithm in ML" series has come to a close. But don't worry, I'll be back next week with another exciting algorithm to explore. So grab a cup of coffee and join us for another week :)
Stay tuned for updates on our next topic. See you soon!
Cheers,
Kiruthika