How good is your model?

Metrics for classification

The performance of a k-NN classifier is often measured by its accuracy. However, accuracy is not always an informative metric. We will dive deeper into evaluating the performance of binary classifiers by computing a confusion matrix and generating a classification report.

Classification metrics

● Measuring model performance with accuracy:

● Fraction of correctly classified samples

● Not always a useful metric

Class imbalance example: Emails

● Spam classification

● 99% of emails are real; 1% of emails are spam

● Could build a classifier that predicts ALL emails as real

● 99% accurate!

● But horrible at actually classifying spam

● Fails at its original purpose

● Need more nuanced metrics
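The spam example above can be sketched numerically. This is a toy illustration (the 1000 synthetic labels are an assumption, not real email data): a classifier that predicts every email as real scores 99% accuracy but has zero recall on the spam class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy labels mimicking the email example: 99% real (0), 1% spam (1)
y_true = np.zeros(1000, dtype=int)
y_true[:10] = 1  # 10 spam emails out of 1000

# A "classifier" that predicts ALL emails as real
y_pred = np.zeros(1000, dtype=int)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
print(acc)  # 0.99 -- looks great
print(rec)  # 0.0  -- catches no spam at all
```

High accuracy here is purely an artifact of class imbalance, which is exactly why precision, recall, and the f1-score are needed.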

You may have noticed that the classification report consisted of three rows, and an additional support column. The support gives the number of samples of the true response that lie in that class - the support was the number of Republicans or Democrats in the test set on which the classification report was computed. The precision, recall, and f1-score columns, then, gave the respective metrics for that particular class.
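To make the report's columns concrete, here is a minimal sketch using hypothetical party labels (these seven labels are invented for illustration, not the actual course data). The support column for each row is simply the count of true labels in that class:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels (assumed data for illustration)
y_true = ['dem', 'dem', 'dem', 'rep', 'rep', 'rep', 'rep']
y_pred = ['dem', 'dem', 'rep', 'rep', 'rep', 'rep', 'dem']

cm = confusion_matrix(y_true, y_pred, labels=['dem', 'rep'])
print(cm)
# Rows are true classes, columns are predicted classes,
# so each row sums to that class's support (3 dems, 4 reps)
print(classification_report(y_true, y_pred))
```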

Here, you'll work with the PIMA Indians dataset obtained from the UCI Machine Learning Repository. The goal is to predict whether or not a given female patient will contract diabetes based on features such as BMI, age, and number of pregnancies. Therefore, it is a binary classification problem. A target value of 0 indicates that the patient does not have diabetes, while a value of 1 indicates that the patient does have diabetes. The dataset has been pre-processed to deal with missing values.

The dataset has been loaded into a DataFrame df and the feature and target variable arrays X and y have been created for you. In addition, sklearn.model_selection.train_test_split and sklearn.neighbors.KNeighborsClassifier have already been imported.

We will train a k-NN classifier to the data and evaluate its performance by generating a confusion matrix and classification report.

Import classification_report and confusion_matrix from sklearn.metrics.

Create training & testing sets with 40% of data used for testing. Use a random state of 42.

Instantiate a k-NN classifier with 6 neighbors, fit it to the training data, and predict the labels of the test set.

Compute and print the confusion matrix and classification report using the confusion_matrix() and classification_report() functions.
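The steps above can be sketched end to end. Since the pre-processed DataFrame df is not available here, a synthetic stand-in for X and y is used (the 768x8 random feature matrix and derived target are assumptions purely so the sketch runs); in the exercise itself, X and y come from the loaded PIMA data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Stand-in for the pre-processed PIMA features and target (assumed data);
# in the exercise, X and y are already created from df
rng = np.random.default_rng(42)
X = rng.normal(size=(768, 8))             # 8 features: BMI, age, pregnancies, ...
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary target

# 40% of the data held out for testing, with a random state of 42
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

# Instantiate a k-NN classifier with 6 neighbors, fit, and predict
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Confusion matrix and classification report on the test set
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))
```

Each row of the confusion matrix is a true class (no diabetes, diabetes) and each column a predicted class, so the report's support values match the row sums.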

By analyzing the confusion matrix and classification report together, you get a much better understanding of your classifier's performance than accuracy alone can provide.



