KNN Algorithm

Hello all!

Welcome to the sixth week of our "Cup of Coffee with an Algorithm in ML" series! This week, we're excited to dive into the K-Nearest Neighbors (KNN) algorithm!


KNN is a type of machine learning algorithm that can be used for classification or regression tasks.

The KNN algorithm assumes that similar things exist in close proximity. In other words, if two objects share similar features, they are likely to be neighbors in the dataset, so a new data point can be labeled by looking at the points closest to it.

What is K?

"k" is a user-defined hyperparameter that determines how many neighbors to consider.



Imagine you have a dataset of roses with their petal lengths and widths, as well as their colors (red, pink, or white). You want to use KNN to predict the color of a new rose based on its petal length and width.

To use KNN, you need to choose a value for k, which represents the number of nearest neighbors to consider. For example, let's say you choose k=5.

Now, suppose you have a new rose with a petal length of 4.8 cm and a petal width of 1.5 cm. To predict its color using KNN, you need to find the five nearest neighbors to this new rose based on their petal lengths and widths.

You can calculate the distance between the new rose and all the roses in your dataset using a distance metric, such as Euclidean distance. The five roses that are closest to the new rose (i.e., have the shortest distance) are considered its nearest neighbors.

Let's say the five nearest neighbors are:

  • Rose A with petal length of 4.5 cm, petal width of 1.6 cm, and color of pink
  • Rose B with petal length of 5.2 cm, petal width of 1.8 cm, and color of pink
  • Rose C with petal length of 4.9 cm, petal width of 1.7 cm, and color of pink
  • Rose D with petal length of 4.7 cm, petal width of 1.4 cm, and color of red
  • Rose E with petal length of 4.8 cm, petal width of 1.3 cm, and color of red

Since three of the five nearest neighbors are pink and two are red, we would predict that the new rose is likely to be pink.
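This five-neighbor vote can be sketched in a few lines of plain Python, using the neighbor values from the example and the majority vote as stated (three pink, two red):

```python
import math
from collections import Counter

# Neighbors from the example: (petal length, petal width, color)
roses = [
    (4.5, 1.6, "pink"),   # Rose A
    (5.2, 1.8, "pink"),   # Rose B
    (4.9, 1.7, "pink"),   # Rose C
    (4.7, 1.4, "red"),    # Rose D
    (4.8, 1.3, "red"),    # Rose E
]

new_rose = (4.8, 1.5)  # petal length, petal width of the new rose
k = 5

# Sort the roses by Euclidean distance to the new rose and keep the k closest
neighbors = sorted(roses, key=lambda r: math.dist(r[:2], new_rose))[:k]

# Majority vote among the k nearest neighbors
votes = Counter(color for _, _, color in neighbors)
prediction = votes.most_common(1)[0][0]
print(prediction)  # pink
```

`math.dist` (Python 3.8+) computes the Euclidean distance, and `Counter.most_common` implements the majority vote.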


Still not clear? Let's walk through it step by step:

  • In KNN, we assume that similar things are close to each other.
  • To determine which class a new data point belongs to (e.g., red or black), the first step is to choose the value of k, the number of neighbors to consider. For example, with k=3, the algorithm computes the distance between the new data point and every other point in the dataset and selects the three closest points as its neighbors.
  • The KNN algorithm then starts the voting process to determine the class of the new data point. With k=3, if two of the three neighbors are black, the algorithm classifies the new data point as black, since black is the majority class among its nearest neighbors.
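The voting step in the last bullet is just a count over the neighbors' labels. A minimal sketch for the k=3 case (two black neighbors, one red):

```python
from collections import Counter

# Labels of the 3 nearest neighbors from the example
neighbor_labels = ["black", "red", "black"]

# The predicted class is simply the most common label among the neighbors
votes = Counter(neighbor_labels)
prediction = votes.most_common(1)[0][0]
print(prediction)  # black
```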

How to calculate the distance in KNN?

The most common distance metric used in KNN is Euclidean distance. For example, suppose you have two data points A and B, where A has features (x1, y1) and B has features (x2, y2). The Euclidean distance between A and B is:

d(A, B) = √((x2 − x1)² + (y2 − y1)²)
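The formula maps directly to a couple of lines of Python (the point values here are just illustrative):

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two 2D points a = (x1, y1) and b = (x2, y2)."""
    return math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)

# Example: A = (1, 2) and B = (4, 6) form a 3-4-5 right triangle
print(euclidean_distance((1, 2), (4, 6)))  # 5.0
```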

How to choose the right value of k in KNN?



Choosing the right value of k in K-Nearest Neighbors (KNN) is an important step in the algorithm, as it can significantly affect the accuracy of the model. Here are a few tips on how to choose the right value of k:

  1. Consider the size of your dataset: If you have a small dataset, using a small value of k may be more appropriate. Conversely, if you have a large dataset, a larger value of k may work better.
  2. Choose an odd value for k: In a binary classification problem (i.e., two classes), an odd value of k prevents ties in the voting process. For example, if you choose k=2 and the two nearest neighbors have different classes, there is no clear majority, resulting in a tie.
  3. Use cross-validation: You can use cross-validation techniques such as k-fold cross-validation to evaluate the performance of the model with different values of k. This can help you determine which value of k provides the best accuracy.
  4. Experiment with different values of k: It's always a good idea to experiment with different values of k and evaluate the performance of the model with each value. You can start with a small value of k and gradually increase it to see how the performance changes.
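Tips 3 and 4 can be combined in a short sklearn sketch. This uses sklearn's built-in iris dataset purely for illustration, trying odd values of k with 5-fold cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate odd values of k with 5-fold cross-validation
scores = {}
for k in range(1, 16, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

# Pick the k with the best mean cross-validated accuracy
best_k = max(scores, key=scores.get)
print("Best k:", best_k, "with accuracy:", round(scores[best_k], 3))
```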

Did you know KNN is a lazy learner?


KNN is a type of lazy learning algorithm because it doesn't build a model during training. Instead, it stores the entire dataset and defers all computation to prediction time, when it finds the nearest neighbors of the query point.

Let us start the implementation

# Step 1: Create a Data frame
import pandas as pd
data = pd.DataFrame({
    'Age': [22, 25, 35, 28, 45, 33],
    'Income': [30000, 40000, 50000, 60000, 80000, 70000],
    'Education Level': ['High School', 'Some College', 'College', 'Graduate School', 'Graduate School', 'College'],
    'Buys New Car': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
})


# Step 2: Split the data
from sklearn.model_selection import train_test_split
X = pd.get_dummies(data.drop('Buys New Car', axis=1))
y = data['Buys New Car']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Step 3: Preprocess the data
# No scaling applied here to keep the example minimal, but note that KNN is
# distance-based, so feature scaling (e.g., StandardScaler) is usually
# recommended in practice.
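Since KNN relies on distances, features on very different scales (like Age versus Income here) can dominate the distance calculation. If you do want to scale, a minimal sketch with sklearn's StandardScaler, using toy values mirroring the example:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frames mirroring the example's numeric features
X_train = pd.DataFrame({"Age": [22, 25, 35, 28], "Income": [30000, 40000, 50000, 60000]})
X_test = pd.DataFrame({"Age": [45, 33], "Income": [80000, 70000]})

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
print(X_train_scaled.mean(axis=0).round(6))     # each column now has ~zero mean
```

Fitting the scaler on the training set only, then reusing it for the test set, avoids leaking test-set statistics into training.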

# Step 4: Train the KNN algorithm
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)


# Step 5: Test the KNN algorithm
y_pred = knn.predict(X_test)
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


# Step 6: Make predictions on new data
import numpy as np
new_data = pd.DataFrame({
    'Age': [30, 40],
    'Income': [45000, 60000],
    'Education Level_College': [1, 0],
    'Education Level_Graduate School': [0, 1],
    'Education Level_High School': [0, 0],
    'Education Level_Some College': [0, 0]
})
new_data_pred = knn.predict(new_data)
print("New data predictions:", new_data_pred)
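To see what `KNeighborsClassifier` is doing under the hood, the same idea can be sketched from scratch in plain Python (the function name `knn_predict` is just illustrative):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Pair each training point with its label, sorted by distance to x_new
    by_distance = sorted(
        zip(X_train, y_train),
        key=lambda pair: math.dist(pair[0], x_new),
    )
    # Majority vote over the labels of the k nearest points
    top_k_labels = [label for _, label in by_distance[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]

# Toy usage: two well-separated clusters
X_train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y_train = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X_train, y_train, (1.5, 1.5)))  # A
print(knn_predict(X_train, y_train, (8.5, 8.5)))  # B
```

Note how there is no training step at all, which is exactly why KNN is called a lazy learner.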



        

Eager to see a step-by-step explanation? Check it out on Colab.


Finally, we did it! The sixth week of our "Cup of Coffee with an Algorithm in ML" series has come to a close. But don't worry, I'll be back next week with more exciting algorithms to explore. So grab a cup of coffee and join us for another week :)

Stay tuned for updates on our next topic. See you soon! ??

Cheers,

Kiruthika
