AI_Part_5_K-NN
ARNAB MUKHERJEE
K-NN stands for K-Nearest Neighbours.
Let us imagine a scenario where two categories are already present in our dataset: Category A (green scatter points) and Category B (yellow scatter points).
We take two feature columns from the dataset, x1 and x2, and plot the points against them. Now we add a new data point: should it fall in the green category or in the yellow category?
This is where K-NN comes in.
A step-by-step guide to K-NN:
Step 1: Choose the number K of neighbours. We need to decide whether K should be 1, 2, 3, 5, or some other number; one of the most common default values is K = 5.
Step 2: Take the K nearest neighbours of the new data point, according to the Euclidean distance.
Note: We can also use Manhattan distance in place of Euclidean distance; a short distance sketch follows these links.
Euclidean Distance: https://byjus.com/maths/euclidean-distance/
Manhattan Distance: https://www.geeksforgeeks.org/maximum-manhattan-distance-between-a-distinct-pair-from-n-coordinates/
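As a quick illustration, here is a minimal NumPy sketch of both distance measures; the two sample points are made up purely for illustration.
import numpy as np
p = np.array([3.0, 4.0])
q = np.array([0.0, 0.0])
# Euclidean distance: square root of the summed squared differences
print(np.sqrt(np.sum((p - q) ** 2)))  # 5.0
# Manhattan distance: sum of the absolute differences along each axis
print(np.sum(np.abs(p - q)))          # 7.0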
Step 3: Among the K neighbors, count the number of data points in each category.
Note: If we have more than two categories in our dataset, we need to calculate how many fall into each category.
Step 4: Assign the new data point to the category with the most neighbours among those K. This majority vote is why the algorithm is called K-Nearest Neighbours.
Step 5: The model is ready. A minimal from-scratch sketch of these steps follows.
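Before moving to scikit-learn, here is a minimal from-scratch sketch of Steps 1 to 4, assuming a tiny made-up two-feature dataset; it is meant for intuition, not as a production implementation.
import numpy as np
from collections import Counter
def knn_predict(X_train, y_train, new_point, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(np.sum((X_train - new_point) ** 2, axis=1))
    # Step 1 fixed K beforehand; take the K nearest neighbours
    nearest = np.argsort(distances)[:k]
    # Steps 3 and 4: count the categories among the neighbours and take the majority
    return Counter(y_train[nearest]).most_common(1)[0][0]
# Made-up data: category 'A' clustered low, category 'B' clustered high
X_demo = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_demo = np.array(['A', 'A', 'A', 'B', 'B', 'B'])
print(knn_predict(X_demo, y_demo, np.array([2, 2]), k=3))  # 'A'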
#K-Nearest Neighbors (K-NN)
#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
from matplotlib.colors import ListedColormap
#Importing the dataset
dataset = pd.read_csv('ENTER_THE_NAME_OF_YOUR_DATASET_HERE.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
#Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
print(X_train)
print(y_train)
print(X_test)
print(y_test)
#Feature Scaling
#K-NN is distance-based, so features on very different scales must be standardised;
#the scaler is fitted on the training set only and reused on the test set to avoid leakage.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train)
print(X_test)
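To see why scaling matters for a distance-based model like K-NN, here is a small standalone illustration with made-up numbers: before standardisation the salary axis dominates the Euclidean distance almost entirely.
import numpy as np
from sklearn.preprocessing import StandardScaler
# Made-up points (age, salary): ages differ by 20 years, salaries by 1,000
pts = np.array([[25.0, 50000.0], [45.0, 51000.0]])
print(np.linalg.norm(pts[0] - pts[1]))  # ~1000.2, almost all of it from salary
# After standardisation, both features contribute comparably
pts_scaled = StandardScaler().fit_transform(pts)
print(np.linalg.norm(pts_scaled[0] - pts_scaled[1]))  # ~2.83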
#Training the K-NN model on the Training set
#With p = 2, the Minkowski metric is equivalent to the Euclidean distance (p = 1 would give Manhattan)
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)
#Predicting a new result
#The new observation is scaled with the same scaler before predicting; the values
#[30, 87000] assume the two features are Age and Estimated Salary, as in the plots below.
print(classifier.predict(sc.transform([[30,87000]])))
#Predicting the Test set results
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))
#Making the Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))
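K = 5 is a common default rather than a tuned value. One quick way to choose K is cross-validation; the sketch below uses scikit-learn's GridSearchCV, and the grid of candidate values is an illustrative assumption, not a recommendation.
from sklearn.model_selection import GridSearchCV
# Odd values of K avoid ties in a two-class majority vote
param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 11]}
grid = GridSearchCV(KNeighborsClassifier(metric = 'minkowski', p = 2),
                    param_grid, cv = 5, scoring = 'accuracy')
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)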
#Visualising the Training set results
X_set, y_set = sc.inverse_transform(X_train), y_train
#A coarser step on the salary axis keeps the prediction grid to a manageable size
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 1),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 100))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('green', 'yellow')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('green', 'yellow'))(i), label = j)
plt.title('K-NN (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
#Visualising the Test set results
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 1),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 100))
#Keep the same green/yellow colour scheme as the training plot so the categories read consistently
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('green', 'yellow')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('green', 'yellow'))(i), label = j)
plt.title('K-NN (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
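As noted in the steps above, Manhattan distance is a drop-in alternative: the Minkowski metric with p = 1 is the Manhattan distance, so with scikit-learn it is just a parameter change. A quick sketch, reusing the scaled training data from above; whether it helps is dataset-dependent, so comparing test accuracies is a reasonable check.
classifier_manhattan = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 1)
classifier_manhattan.fit(X_train, y_train)
print(accuracy_score(y_test, classifier_manhattan.predict(X_test)))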