K-Nearest Neighbors Explained: A Guide to Classification Algorithms
K-Nearest Neighbors (KNN) is a simple yet powerful algorithm used for classification and regression tasks in machine learning. This article will explore the KNN algorithm, its implementation using the Iris dataset, and the underlying mathematics that make it effective.
What is KNN?
KNN is a non-parametric, instance-based learning algorithm that classifies data points based on their proximity to other data points. The core idea is straightforward: given a new data point, KNN looks at the ‘K’ nearest labeled data points and assigns the most common label among them.
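To make the voting idea concrete, here is a minimal sketch of that final step; the labels and the value of K below are invented purely for illustration:

from collections import Counter

# Hypothetical labels of the K = 5 nearest neighbors to some query point
nearest_labels = ["Setosa", "Versicolor", "Setosa", "Setosa", "Virginica"]
# Majority vote: the most frequent label wins
print(Counter(nearest_labels).most_common(1)[0][0])  # Setosa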
Key Concepts
1. Distance metric: KNN needs a way to measure how close two points are; this implementation uses the Euclidean distance between feature values.
2. Choosing K: The value of K determines how many neighbors influence the classification. A small K can be sensitive to noise, while a large K can smooth out the decision boundary.
Implementing KNN with the Iris Dataset
The Iris dataset is a classic dataset used in machine learning, containing measurements of iris flowers from three different species. Let’s walk through the implementation of KNN using Python.
Step 1: Import Libraries
import matplotlib.pyplot as plt
from collections import Counter
import numpy as np
Step 2: Load the Dataset
We load the Iris dataset, which is assumed to be in CSV format. Every value is read as a string (dtype=str), and the first row of the array holds the column headers.
main_df = np.loadtxt("datasets/iris.csv", delimiter=",", dtype=str)
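The rest of the code assumes the first row is a header, the first two columns are numeric features (for example sepal length and sepal width), and the last column is the species. A quick sanity check might look like the snippet below; the column names in the comments are assumptions about the file, not guaranteed:

print(main_df.shape)  # e.g. (151, 5): a header row plus 150 samples
print(main_df[0])     # header row, e.g. sepal length, sepal width, ..., variety
print(main_df[1])     # first sample: feature values as strings plus the species label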
Step 3: Define the KNN Function
The knn function computes the Euclidean distance between a query point and every other point, sorts the points by distance, and returns the K nearest neighbors. Because the query point is itself part of the dataset, the closest match (distance zero) is skipped.
def knn(df, X, Y, x, y, k):
    # Euclidean distance from the query point (x, y) to every point in the dataset
    distances = np.sqrt(((X.astype(float) - x.astype(float)) ** 2) +
                        ((Y.astype(float) - y.astype(float)) ** 2))
    # Sort by distance and skip the closest match, which is the query point itself
    dis_indexes = distances.argsort()[1:]
    # Return the rows of the k nearest neighbors
    return df[dis_indexes[:k]]
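For intuition, the quantity being sorted is the ordinary two-dimensional Euclidean distance. The feature values below are made up and only illustrate the arithmetic:

# Distance between two hypothetical flowers measured on (sepal length, sepal width)
p, q = np.array([5.1, 3.5]), np.array([4.9, 3.0])
print(np.sqrt(np.sum((p - q) ** 2)))  # sqrt(0.2**2 + 0.5**2) ≈ 0.539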
Step 4: Define the Prediction Function
The predict function determines the most common class among the K nearest neighbors. If any classes are tied on their counts, it breaks the tie by choosing the class whose neighbors have the smallest average distance to the query point.
def predict(nbrs_list, x1, y1):
    # Class labels of the nearest neighbors (last column of each row)
    varieties = nbrs_list[:, -1]
    unique, counts = np.unique(varieties, return_counts=True)
    count_vars = dict(zip(unique, counts))
    if len(unique) == 1:
        # All neighbors agree on the class
        return unique[0]
    else:
        # Check whether two or more classes share the same count (a tie)
        count_dict = Counter(count_vars.values())
        result = [key for key, value in count_vars.items()
                  if count_dict[value] > 1]
        if result:
            # Tie: pick the class whose neighbors are closest to the query point on average
            dict1 = {}
            for value in unique:
                vals = nbrs_list[nbrs_list[:, -1] == value]
                distances = 0
                for val in vals:
                    distance = np.sqrt(((val[0].astype(float) - x1.astype(float)) ** 2) +
                                       ((val[1].astype(float) - y1.astype(float)) ** 2))
                    distances += distance
                average_distance = distances / len(vals)
                dict1[value] = average_distance
            return min(dict1, key=dict1.get)
        else:
            # No tie: return the class with the highest count
            inverse = [(value, key) for key, value in count_vars.items()]
            return max(inverse)[1]
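Putting the two functions together for a single query point might look like this; row 1 is just an arbitrary sample chosen for illustration:

sample = main_df[1]
neighbours = knn(main_df[1:], main_df[1:, 0], main_df[1:, 1], sample[0], sample[1], 5)
print(predict(neighbours, sample[0], sample[1]), "vs actual:", sample[-1])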
Step 5: Evaluate Different Values of K
We evaluate the performance of KNN for values of K from 2 to 10 and plot the results. Each sample is classified using the rest of the dataset (the sample itself is excluded inside knn), and we count how many predictions match the true labels.
ks = {"k2": 0, "k3": 0, "k4": 0, "k5": 0, "k6": 0, "k7": 0, "k8": 0, "k9": 0, "k10": 0}
for k in ks:
pk = []
for i in range(1, main_df.shape[0]):
neighbours = knn(main_df[1:], main_df[1:, 0], main_df[1:, 1], main_df[i, 0], main_df[i, 1], int(k[1:]))
predicted_variety = predict(neighbours, main_df[i, 0], main_df[i, 1])
pk.append(predicted_variety)
ks[k] = sum(main_df[1:, -1] == pk)
plt.plot(["k2", "k3", "k4", "k5", "k6", "k7", "k8", "k9", "k10"], [ks[k] for k in ks])
plt.xlabel('Value of K')
plt.ylabel('Number of Correct Predictions')
plt.title('KNN Classification Accuracy for Different K Values')
plt.show()
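Once the counts are in, the best-performing K can be read straight off the dictionary. A small follow-up like the one below reports it along with its accuracy; the sample count is taken from the loaded array rather than assumed:

best_k = max(ks, key=ks.get)
n_samples = main_df.shape[0] - 1  # exclude the header row
print(best_k, ks[best_k] / n_samples)  # best K and its fraction of correct predictions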
Conclusion
K-Nearest Neighbors is a powerful and intuitive algorithm for classification tasks. By understanding its mechanics and implementing it with the Iris dataset, you can appreciate its effectiveness in real-world applications. Whether you’re a beginner in machine learning or looking to refine your skills, KNN is a great starting point.
Feel free to explore the code, modify the parameters, and see how KNN performs on different datasets! If you’d like to view the complete code, you can clone my repository from GitHub: Clone the KNN Model Demonstration Repository