What evaluation approaches would you use to assess the effectiveness of a machine learning model?

Why do we need to evaluate machine learning models?

Machine learning continues to be an increasingly integral component of our lives, whether we’re applying the techniques to research or business problems. To create real value for an organization, a machine learning model must be able to make accurate predictions.

Methods for evaluating a model’s performance fall into two categories: holdout and cross-validation. Both set aside data that the model is not trained on, because evaluating on the training set is misleading: the model can simply memorize the whole training set and therefore predict the correct label for any point in it. This is known as overfitting.

Holdout

The purpose of holdout evaluation is to test a model on different data than it was trained on. This provides an unbiased estimate of learning performance.

In this method, the dataset is?randomly?divided into three subsets:

  1. Training set is a subset of the dataset used to build predictive models.
  2. Validation set is a subset of the dataset used to assess the performance of the model built in the training phase. It provides a test platform for fine-tuning a model’s parameters and selecting the best-performing model. Not all modeling algorithms need a validation set.
  3. Test set, or unseen data, is a subset of the dataset used to assess the likely future performance of a model. If a model fits the training set much better than it fits the test set, overfitting is probably the cause.

Cross-Validation

Cross-validation is a technique that involves partitioning the original observation dataset into complementary subsets: a training set used to train the model, and an independent set used to evaluate it. Types of cross-validation:

  • Leave-p-out cross-validation: …
  • Leave-one-out cross-validation: …
  • Holdout cross-validation: …
  • k-fold cross-validation: …
  • Repeated random subsampling validation: …
  • Stratified k-fold cross-validation: …
  • Time series cross-validation: …
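Of these, k-fold is the most widely used, so here is a minimal sketch of the idea using only the Python standard library. The dataset, the toy 1-nearest-neighbour "model", and the choice of k = 5 are illustrative assumptions, not anything prescribed by this article:

```python
# k-fold cross-validation sketch: split the data into k folds, then
# train on k-1 folds and test on the remaining fold, rotating the
# test fold each time, and average the k accuracy scores.

def kfold_indices(n, k):
    """Yield (train_indices, test_indices) pairs for k folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    indices = list(range(n))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical toy data: x values labeled 1 when x >= 5, else 0.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def predict_1nn(train_idx, x):
    """Predict with the label of the nearest training point."""
    nearest = min(train_idx, key=lambda i: abs(xs[i] - x))
    return ys[nearest]

accuracies = []
for train_idx, test_idx in kfold_indices(len(xs), k=5):
    correct = sum(predict_1nn(train_idx, xs[i]) == ys[i] for i in test_idx)
    accuracies.append(correct / len(test_idx))

print(sum(accuracies) / len(accuracies))  # mean accuracy across the 5 folds
```

In practice you would shuffle (or stratify) the indices before splitting and use a real estimator, but the rotation of the held-out fold is the essential mechanism.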

Model Evaluation Metrics

Model evaluation metrics are required to quantify model performance. The choice of evaluation metrics depends on a given machine learning task (such as classification, regression, ranking, clustering, topic modeling, among others).

Classification Metrics

In this section we will review some of the metrics used in classification problems, namely:

  • Classification Accuracy
  • Confusion matrix
  • Logarithmic Loss
  • Area under curve (AUC)
  • F-Measure

Classification Accuracy

Classification predictive modeling involves predicting a class label given examples in a problem domain.

Accuracy and its complement error rate are the most frequently used metrics for estimating the performance of learning systems in classification problems.


Classification accuracy involves first using a classification model to make a prediction for each example in a test dataset. The predictions are then compared to the known labels for those examples. Accuracy is then calculated as the number of correctly predicted examples divided by the total number of predictions made on the test set.

  • Accuracy = Correct Predictions / Total Predictions

Conversely, the error rate can be calculated as the total number of incorrect predictions made on the test set divided by all predictions made on the test set.

  • Error Rate = Incorrect Predictions / Total Predictions

The accuracy and error rate are complements of each other, meaning that we can always calculate one from the other. For example:

  • Accuracy = 1 − Error Rate
  • Error Rate = 1 − Accuracy
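The formulas above can be sketched in a few lines of Python; the label lists here are made-up examples:

```python
# Accuracy and error rate on a hypothetical test set (binary labels).
actual    = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)       # correct predictions / total predictions
error_rate = 1 - accuracy              # the two metrics are complements

print(accuracy, error_rate)  # 0.75 0.25
```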

Accuracy Fails for Imbalanced Classification

When the class distribution is only slightly skewed, accuracy can still be a useful metric. When the skew is severe, accuracy becomes an unreliable measure of model performance. For example, on a dataset with 99% negative examples, a classifier that always predicts the negative class achieves 99% accuracy while learning nothing.

Confusion matrix


When performing classification predictions, there are four types of outcomes that can occur.

  • True positives are when you predict an observation belongs to a class and it actually does belong to that class.
  • True negatives are when you predict an observation does not belong to a class and it actually does not belong to that class.
  • False positives occur when you predict an observation belongs to a class when in reality it does not.
  • False negatives occur when you predict an observation does not belong to a class when in fact it does.
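Counting these four outcomes gives the confusion matrix. A short sketch using hypothetical binary labels (1 = positive class, 0 = negative class):

```python
# Tally the four outcome types for a binary classification problem.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))  # true positives
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))  # true negatives
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))  # false positives
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))  # false negatives

# Confusion matrix: rows = actual class, columns = predicted class.
print([[tn, fp],
       [fn, tp]])
```

Every prediction falls into exactly one of the four cells, so the four counts always sum to the size of the test set.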

Logarithmic Loss


Logarithmic loss (logloss) measures the performance of a classification model where the prediction input is a probability value between 0 and 1. Log loss increases as the predicted probability diverges from the actual label. The goal of machine learning models is to minimize this value. As such, smaller logloss is better, with a perfect model having a log loss of 0.
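For the binary case, log loss is −(1/N) Σ [y·log(p) + (1−y)·log(1−p)]. A minimal sketch, with made-up labels and predicted probabilities:

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.8, 0.3]
print(log_loss(y_true, y_prob))
```

Confident correct predictions (p near the true label) contribute little; confident wrong ones are penalized heavily, which is why the loss grows as predictions diverge from the labels.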

Area under Curve (AUC)

Area under the ROC Curve is a performance metric for measuring the ability of a binary classifier to discriminate between positive and negative classes.



A good classifier has an AUC close to 1 and well above the 0.5 achieved by random guessing. A perfect classifier has an ROC curve that rises straight up the Y axis (true positive rate reaches 1 at a false positive rate of 0) and then runs along the top of the plot, giving an AUC of 1.
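AUC also has a probabilistic interpretation: it is the chance that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties counting half). A sketch computing it directly from that definition, with illustrative scores and labels:

```python
def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(auc(y_true, scores))  # 8 of 9 pairs ranked correctly
```

The quadratic pair loop is fine for a demo; production implementations compute the same quantity from the sorted scores or the ROC curve.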


F-Measure


F-measure (also F-score) is a measure of a test’s accuracy that considers both the precision and the recall of the test to compute the score. Precision is the number of correct positive results divided by the total predicted positive observations. Recall, on the other hand, is the number of correct positive results divided by the number of all relevant samples (total actual positives).
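The F-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A sketch using hypothetical confusion-matrix counts:

```python
# Hypothetical counts: true positives, false positives, false negatives.
tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)  # correct positives / predicted positives
recall = tp / (tp + fn)     # correct positives / actual positives
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
```

Because the harmonic mean is dominated by the smaller value, a model cannot earn a high F1 by excelling at only one of precision or recall.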

Regression Metrics

In this section we review two of the most common metrics for evaluating regression problems: Root Mean Squared Error and Mean Absolute Error.


The Mean Absolute Error (or MAE) is the average of the absolute differences between predictions and actual values. Root Mean Squared Error (RMSE), on the other hand, measures the average magnitude of the error by taking the square root of the average of the squared differences between predictions and actual observations.
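Both definitions translate directly into code; the prediction values below are made up for illustration:

```python
import math

# Hypothetical regression targets and model predictions.
actual    = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

errors = [p - a for a, p in zip(actual, predicted)]
mae = sum(abs(e) for e in errors) / len(errors)                 # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))      # root mean squared error

print(mae, rmse)
```

Because squaring weights large errors more heavily, RMSE is always at least as large as MAE, and the gap widens when a few predictions are far off.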

Conclusion

Ideally, the estimated performance of a model tells us how well it performs on unseen/new data. Making predictions on future new data is often the main problem we want to solve. It’s important to understand the context before choosing a metric because each machine learning model tries to solve a problem with a different objective using a different dataset.


“I’m Baishalini Sahu, a data scientist specializing in artificial intelligence and machine learning. This article has attempted to explain the common evaluation metrics for classification and regression machine learning problems, providing short Python snippets to show how they can be implemented and the mathematical formulas behind them.”

