KNN Algorithm

Hello all!

Welcome to the sixth week of our "Cup of Coffee with an Algorithm in ML" series! This week, we're excited to dive into the K-Nearest Neighbors (KNN) algorithm!


KNN is a type of machine learning algorithm that can be used for classification or regression tasks.

The KNN algorithm assumes that similar things exist in close proximity. In other words, if two objects share similar features, they are likely to be neighbors in the dataset, so a new data point can be labeled by looking at the points closest to it.

What is K?

"k" is a user-defined hyperparameter that determines how many neighbors to consider.



Imagine you have a dataset of roses with their petal lengths and widths, as well as their colors (red, pink, or white). You want to use KNN to predict the color of a new rose based on its petal length and width.

To use KNN, you need to choose a value for k, which represents the number of nearest neighbors to consider. For example, let's say you choose k=5.

Now, suppose you have a new rose with a petal length of 4.8 cm and a petal width of 1.5 cm. To predict its color using KNN, you need to find the five nearest neighbors to this new rose based on their petal lengths and widths.

You can calculate the distance between the new rose and all the roses in your dataset using a distance metric, such as Euclidean distance. The five roses that are closest to the new rose (i.e., have the shortest distance) are considered its nearest neighbors.

Let's say the five nearest neighbors are:

  • Rose A with petal length of 4.5 cm, petal width of 1.6 cm, and color of pink
  • Rose B with petal length of 5.2 cm, petal width of 1.8 cm, and color of pink
  • Rose C with petal length of 4.9 cm, petal width of 1.7 cm, and color of pink
  • Rose D with petal length of 4.7 cm, petal width of 1.4 cm, and color of red
  • Rose E with petal length of 4.8 cm, petal width of 1.3 cm, and color of red

Since three of the five nearest neighbors are pink and two are red, we would predict that the new rose is likely to be pink.
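This five-neighbor vote can be sketched in a few lines of plain Python, using the neighbor values from the example and the majority vote as stated (three pink, two red):

```python
import math
from collections import Counter

# Neighbors from the example: (petal length, petal width, color)
roses = [
    (4.5, 1.6, "pink"),   # Rose A
    (5.2, 1.8, "pink"),   # Rose B
    (4.9, 1.7, "pink"),   # Rose C
    (4.7, 1.4, "red"),    # Rose D
    (4.8, 1.3, "red"),    # Rose E
]

new_rose = (4.8, 1.5)  # petal length, petal width of the new rose
k = 5

# Sort the roses by Euclidean distance to the new rose and keep the k closest
neighbors = sorted(roses, key=lambda r: math.dist(r[:2], new_rose))[:k]

# Majority vote among the k nearest neighbors
votes = Counter(color for _, _, color in neighbors)
prediction = votes.most_common(1)[0][0]
print(prediction)  # pink
```

`math.dist` (Python 3.8+) computes the Euclidean distance, and `Counter.most_common` implements the majority vote.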


Still not clear? Let's walk through it step by step:

  • In KNN, we assume that similar things are close to each other.
  • To determine which class a new data point belongs to (e.g., red or black), the first step is to choose the value of k, the number of neighbors to consider. For example, with k=3, the algorithm computes the distance between the new data point and every other point in the dataset and selects the three closest points as its neighbors.
  • The KNN algorithm then starts the voting process to determine the class of the new data point. With k=3, if two of the three neighbors are black, the algorithm classifies the new data point as black, since black is the majority class among its nearest neighbors.
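The voting step in the last bullet is just a count over the neighbors' labels. A minimal sketch for the k=3 case (two black neighbors, one red):

```python
from collections import Counter

# Labels of the 3 nearest neighbors from the example
neighbor_labels = ["black", "red", "black"]

# The predicted class is simply the most common label among the neighbors
votes = Counter(neighbor_labels)
prediction = votes.most_common(1)[0][0]
print(prediction)  # black
```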

How to calculate the distance in KNN?

The most common distance metric used in KNN is Euclidean distance. For example, suppose you have two data points A and B, where A has features (x1, y1) and B has features (x2, y2). The Euclidean distance between A and B is:

d(A, B) = √((x2 − x1)² + (y2 − y1)²)
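The formula maps directly to a couple of lines of Python (the point values here are just illustrative):

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two 2D points a = (x1, y1) and b = (x2, y2)."""
    return math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)

# Example: A = (1, 2) and B = (4, 6) form a 3-4-5 right triangle
print(euclidean_distance((1, 2), (4, 6)))  # 5.0
```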

How to choose the right value of k in KNN?



Choosing the right value of k in K-Nearest Neighbors (KNN) is an important step in the algorithm, as it can significantly affect the accuracy of the model. Here are a few tips on how to choose the right value of k:

  1. Consider the size of your dataset: If you have a small dataset, using a small value of k may be more appropriate. Conversely, if you have a large dataset, a larger value of k may work better.
  2. Choose an odd value for k: In a binary classification problem (i.e., two classes), an odd value of k prevents ties in the voting process. For example, if you choose k=2 and the two nearest neighbors have different classes, there is no clear majority, resulting in a tie.
  3. Use cross-validation: You can use cross-validation techniques such as k-fold cross-validation to evaluate the performance of the model with different values of k. This can help you determine which value of k provides the best accuracy.
  4. Experiment with different values of k: It's always a good idea to experiment with different values of k and evaluate the performance of the model with each value. You can start with a small value of k and gradually increase it to see how the performance changes.
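Tips 3 and 4 can be combined in a short sklearn sketch. This uses sklearn's built-in iris dataset purely for illustration, trying odd values of k with 5-fold cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate odd values of k with 5-fold cross-validation
scores = {}
for k in range(1, 16, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# Pick the k with the best mean cross-validated accuracy
best_k = max(scores, key=scores.get)
print("Best k:", best_k, "with accuracy:", round(scores[best_k], 3))
```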

Did you know KNN is a lazy learner?


KNN is a type of lazy learning algorithm because it doesn't build a model during training. Instead, it stores the entire dataset and defers all computation to prediction time, when it finds the nearest neighbors of the query point.

Let us start the implementation

# Step 1: Create a Data frame
import pandas as pd
data = pd.DataFrame({
    'Age': [22, 25, 35, 28, 45, 33],
    'Income': [30000, 40000, 50000, 60000, 80000, 70000],
    'Education Level': ['High School', 'Some College', 'College', 'Graduate School', 'Graduate School', 'College'],
    'Buys New Car': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
})


# Step 2: Split the data
from sklearn.model_selection import train_test_split
X = pd.get_dummies(data.drop('Buys New Car', axis=1))
y = data['Buys New Car']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Step 3: Preprocess the data
# No scaling applied here to keep the example minimal, but note that KNN is
# distance-based, so feature scaling (e.g., StandardScaler) is usually
# recommended in practice.
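Since KNN relies on distances, features on very different scales (like Age versus Income here) can dominate the distance calculation. If you do want to scale, a minimal sketch with sklearn's StandardScaler, using toy values mirroring the example:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frames mirroring the example's numeric features
X_train = pd.DataFrame({"Age": [22, 25, 35, 28], "Income": [30000, 40000, 50000, 60000]})
X_test = pd.DataFrame({"Age": [45, 33], "Income": [80000, 70000]})

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
print(X_train_scaled.mean(axis=0).round(6))     # each column now has ~zero mean
```

Fitting the scaler on the training set only, then reusing it for the test set, avoids leaking test-set statistics into training.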

# Step 4: Train the KNN algorithm
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)


# Step 5: Test the KNN algorithm
y_pred = knn.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


# Step 6: Make predictions on new data
import numpy as np
new_data = pd.DataFrame({
    'Age': [30, 40],
    'Income': [45000, 60000],
    'Education Level_College': [1, 0],
    'Education Level_Graduate School': [0, 1],
    'Education Level_High School': [0, 0],
    'Education Level_Some College': [0, 0]
})
new_data_pred = knn.predict(new_data)
print("New data predictions:", new_data_pred)
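To see what `KNeighborsClassifier` is doing under the hood, the same idea can be sketched from scratch in plain Python (the function name `knn_predict` is just illustrative):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Pair each training point with its label, sorted by distance to x_new
    by_distance = sorted(
        zip(X_train, y_train),
        key=lambda pair: math.dist(pair[0], x_new),
    )
    # Majority vote over the labels of the k nearest points
    top_k_labels = [label for _, label in by_distance[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]

# Toy usage: two well-separated clusters
X_train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X_train, y_train, (1.5, 1.5)))  # A
print(knn_predict(X_train, y_train, (8.5, 8.5)))  # B
```

Note how there is no training step at all, which is exactly why KNN is called a lazy learner.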



        

Eager to see a step-by-step explanation? Check it out on Colab.


Finally, we did it! The sixth week of our "Cup of Coffee with an Algorithm in ML" series has come to a close. But don't worry, I'll be back next week with more exciting algorithms to explore. So grab a cup of coffee and join us for another week :)

Stay tuned for updates on our next topic. See you soon! ??

Cheers,

Kiruthika
