What evaluation approaches would you use to assess the effectiveness of a machine learning model?
Why do we need to evaluate machine learning models?
Machine learning is an increasingly integral part of our lives, whether we apply its techniques to research or to business problems. To create real value for an organization, a machine learning model must be able to make accurate predictions.
Methods for evaluating a model's performance are divided into two categories: holdout and cross-validation. Both require testing the model on data it has not seen during training. If we evaluated on the training set itself, the model could simply memorize the whole training set and always predict the correct label for any training point. This is known as overfitting.
Holdout
The purpose of holdout evaluation is to test a model on different data than it was trained on. This provides an unbiased estimate of learning performance.
In this method, the dataset is randomly divided into three subsets:
- Training set: the subset used to fit the model.
- Validation set: the subset used to tune parameters and compare candidate models during development.
- Test set: the subset held back to estimate the final model's performance on unseen data.
A minimal split is sketched below.
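As a minimal sketch of such a split with scikit-learn, on synthetic data and with an illustrative 60/20/20 ratio (the article does not prescribe specific proportions):

```python
# Hedged sketch: synthetic data and a 60/20/20 split chosen for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)

# First hold out 20% of the data as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Then take 25% of the remainder (20% of the total) as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)
```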
Cross-Validation
Cross-validation is a technique that involves partitioning the original observation dataset into a training set, used to train the model, and an independent set used to evaluate the analysis. Common types of cross-validation include k-fold, stratified k-fold, and leave-one-out (LOOCV).
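A short sketch of k-fold cross-validation with scikit-learn; the logistic-regression model and k = 5 are placeholder choices, not mandated above:

```python
# Hedged sketch: 5-fold cross-validation on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Each of the 5 folds takes a turn as the held-out evaluation set.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```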
Model Evaluation Metrics
Model evaluation metrics are required to quantify model performance. The choice of evaluation metrics depends on a given machine learning task (such as classification, regression, ranking, clustering, topic modeling, among others).
Classification Metrics
In this section we will review some of the metrics used in classification problems, namely: classification accuracy, the confusion matrix, logarithmic loss, area under the ROC curve (AUC), and the F-measure.
Classification Accuracy
Classification predictive modeling involves predicting a class label given examples in a problem domain.
Accuracy and its complement error rate are the most frequently used metrics for estimating the performance of learning systems in classification problems.
Classification accuracy involves first using a classification model to make a prediction for each example in a test dataset. The predictions are then compared to the known labels for those examples. Accuracy is calculated as the number of correct predictions divided by the total number of predictions made on the test set.
Conversely, the error rate can be calculated as the total number of incorrect predictions made on the test set divided by all predictions made on the test set.
The accuracy and error rate are complements of each other, meaning that we can always calculate one from the other. For example: error rate = 1 − accuracy, and accuracy = 1 − error rate.
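A small illustration with scikit-learn; the labels below are made up for the example:

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1, 1]  # known labels
y_pred = [0, 1, 0, 0, 1, 1]  # model predictions

accuracy = accuracy_score(y_true, y_pred)  # 5 correct of 6 ≈ 0.833
error_rate = 1 - accuracy                  # ≈ 0.167
```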
Accuracy Fails for Imbalanced Classification
When the class distribution is only slightly skewed, accuracy can still be a useful metric. When the skew is severe, however, accuracy becomes an unreliable measure of model performance. For example, on a dataset where 99% of examples belong to one class, a model that always predicts that class scores 99% accuracy without learning anything.
Confusion Matrix
When performing classification predictions, there are four types of outcomes that can occur: true positives, true negatives, false positives, and false negatives.
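A minimal sketch of building a confusion matrix with scikit-learn, on toy labels chosen for illustration:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[2 1]
                                         #  [1 3]]
```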
Logarithmic Loss
Logarithmic loss (logloss) measures the performance of a classification model where the prediction input is a probability value between 0 and 1. Log loss increases as the predicted probability diverges from the actual label. The goal of machine learning models is to minimize this value. As such, smaller logloss is better, with a perfect model having a log loss of 0.
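For binary problems, log loss works out to −(1/N) Σ [y·log(p) + (1−y)·log(1−p)]. A minimal sketch with scikit-learn, using made-up probabilities:

```python
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0]          # known labels
y_prob = [0.1, 0.9, 0.8, 0.3]  # predicted probability of the positive class

# Lower is better; a perfect model scores 0.
print(log_loss(y_true, y_prob))
```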
Area under Curve (AUC)
Area under ROC Curve is a performance metric for measuring the ability of a binary classifier to discriminate between positive and negative classes.
The closer the AUC is to 1, and the further above 0.5, the better the classifier. A perfect classifier's ROC curve rises along the Y axis and then runs along the top of the plot parallel to the X axis; a random classifier yields an AUC of 0.5.
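A minimal sketch of computing AUC with scikit-learn; the scores below are illustrative:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]  # predicted scores or probabilities

# 1.0 = perfect ranking of positives above negatives, 0.5 = random.
print(roc_auc_score(y_true, y_scores))  # 0.75 on this toy example
```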
F-Measure
F-measure (also F-score) is a measure of a test's accuracy that considers both the precision and the recall of the test to compute the score. Precision is the number of correct positive results divided by the total number of predicted positives. Recall, on the other hand, is the number of correct positive results divided by the number of all relevant samples (total actual positives).
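The F1 score is the harmonic mean of the two: F1 = 2 · (precision · recall) / (precision + recall). A minimal sketch with scikit-learn, on made-up labels:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)                # harmonic mean = 0.75
```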
Regression Metrics
In this section we review two of the most common metrics for evaluating regression problems, namely Mean Absolute Error and Root Mean Squared Error.
The Mean Absolute Error (MAE) is the average of the absolute differences between predictions and actual values. Root Mean Squared Error (RMSE), on the other hand, measures the average magnitude of the error as the square root of the average of squared differences between predictions and actual observations.
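A minimal sketch of both metrics, using scikit-learn and NumPy on made-up values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)           # mean of |y - ŷ|
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # √(mean of (y - ŷ)²)
```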
Conclusion
Ideally, the estimated performance of a model tells us how well it will perform on unseen data, and making accurate predictions on new data is usually the main problem we want to solve. It's important to understand the context before choosing a metric, because each machine learning model tries to solve a problem with a different objective on a different dataset.
"I'm Baishalini Sahu, a data scientist specializing in artificial intelligence and machine learning. This article has attempted to explain the common evaluation metrics for classification and regression machine learning problems, providing short Python snippets to show how they can be implemented and the mathematical formulas behind them."