Accuracy of Machine Vision Systems: A Guide to Evaluating Performance


Introduction

When evaluating the performance of a machine vision classifier, it is important to consider a variety of metrics. Together, these metrics provide a more complete picture of the classifier's performance and help identify areas for improvement. In this paper, we will discuss why these metrics matter and how they can be used to evaluate and improve the performance of a classifier. We will provide examples and explain how to calculate and interpret these metrics for a machine vision classifier.

For the purposes of this paper, we will assume that the machine vision system is a simple binary classifier that labels each part on a production line as either "BAD" (defect) or "GOOD" (no defect).

Definitions

Let us define the terms we will use to measure the performance of the vision system.

Confusion Matrix

A confusion matrix is a table used to evaluate the performance of a classifier. It compares the predicted classes to the actual classes for a set of inspection data. The confusion matrix is divided into four quadrants, each representing the number of true positives, true negatives, false positives, and false negatives. The diagonal elements represent correct predictions, while the off-diagonal elements represent incorrect predictions. A confusion matrix is a useful tool for understanding the classifier's strengths and weaknesses and identifying areas for improvement.

  • True positive (TP): the number of GOOD parts the classifier correctly accepted
  • True negative (TN): the number of BAD parts the classifier correctly rejected
  • False positive (FP): the number of BAD parts the classifier incorrectly accepted
  • False negative (FN): the number of GOOD parts the classifier incorrectly rejected

We will assume for illustration that we have inspected 1,000 parts: 990 GOOD parts and 10 BAD parts. Let us plug these numbers into the confusion matrix below.

Figure: Confusion Matrix illustrating results of a Machine Vision System

                      Predicted GOOD    Predicted BAD
Actual GOOD (990)     TP = 986          FN = 4
Actual BAD   (10)     FP = 2            TN = 8

We can now see that the vision system has misclassified 6 of the 1,000 parts:

  • False Positives (BAD classified as GOOD) = 2
  • False Negatives (GOOD classified as BAD) = 4

The data in the confusion matrix can then be used to calculate important evaluation metrics such as accuracy, precision, recall, and F1 score. These are the metrics you need to measure to clearly understand the accuracy of the vision system.
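As a minimal sketch, the four counts can be tallied directly from paired actual/predicted labels; the short Python example below uses tiny hypothetical label lists as stand-ins for real inspection results:

# Minimal sketch: tally confusion-matrix counts from paired labels.
# The label lists are hypothetical stand-ins for real inspection data,
# with "GOOD" treated as the positive (accepted) class as in this article.

def confusion_counts(actual, predicted, positive="GOOD"):
    """Return (TP, TN, FP, FN) for a binary GOOD/BAD classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:        # predicted acceptance
            if a == positive:
                tp += 1          # GOOD part correctly accepted
            else:
                fp += 1          # BAD part accepted in error
        else:                    # predicted rejection
            if a == positive:
                fn += 1          # GOOD part rejected in error
            else:
                tn += 1          # BAD part correctly rejected
    return tp, tn, fp, fn

actual    = ["GOOD", "GOOD", "GOOD", "GOOD", "BAD", "BAD"]
predicted = ["GOOD", "GOOD", "GOOD", "BAD", "GOOD", "BAD"]
print(confusion_counts(actual, predicted))  # (3, 1, 1, 1)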

Accuracy

Accuracy refers to the percentage of parts that are correctly classified out of the total number of parts inspected.

Accuracy = (True Positive + True Negative) / (Total Inspections)        

In the above example, 994 of the 1,000 parts inspected are correctly classified.

So the accuracy of the system can be calculated as:

Accuracy = (986 + 8) / 1000 = 99.4%

The complement of accuracy is the error rate: the percentage of misclassified parts. In the example above, the error would be 0.6%. However, this simple calculation may not reflect the desired performance of a vision system. To get a more accurate understanding of the system's performance, it is necessary to understand a few other metrics.
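As a quick sketch in Python, using the counts from the worked example above:

tp, tn, fp, fn = 986, 8, 2, 4      # counts from the worked example
total = tp + tn + fp + fn          # 1,000 inspections

accuracy = (tp + tn) / total       # fraction correctly classified
error = 1 - accuracy               # fraction misclassified

print(f"accuracy = {accuracy:.1%}, error = {error:.1%}")
# accuracy = 99.4%, error = 0.6%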

Precision

Precision is a metric that measures the accuracy of a classifier's predictions for a given class. It is calculated as the number of correct predictions of a class (GOOD or BAD) divided by the total number of predictions of that class made by the classifier.

To calculate precision for the BAD class (defects), you would use the formula:


Precision_BAD = true negative / (true negative + false negative)        

So for the above example, it would be Precision_BAD = 8 / (8+4) = 0.667

In other words, it is the proportion of rejected parts that were actually BAD. A low Precision_BAD score indicates a higher number of false rejections, leading to higher scrap, rework, or reinspection costs.

To calculate precision for the accepted parts (GOOD), you would use the formula:


Precision_GOOD = true positive / (true positive + false positive)        

For the above example, it would be Precision_GOOD = 986 / (986+2) = 0.998

In other words, the Precision_GOOD score indicates the reliability of the vision system's acceptance decisions. A low Precision_GOOD score indicates that a higher number of bad parts are being accepted by the system, leading to poor-quality products leaving the line.
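A short sketch computing both precision scores from the example's counts:

tp, tn, fp, fn = 986, 8, 2, 4       # counts from the worked example

precision_bad = tn / (tn + fn)      # of parts rejected, share truly BAD
precision_good = tp / (tp + fp)     # of parts accepted, share truly GOOD

print(f"Precision_BAD  = {precision_bad:.3f}")   # 0.667
print(f"Precision_GOOD = {precision_good:.3f}")  # 0.998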

Precision is only one of many evaluation metrics that can be used to measure the performance of a classifier. Others include recall and the F1 score, which we describe in the following sections. Together, these metrics provide a more complete picture of the classifier's performance and help identify areas for improvement.

Recall

Recall is a metric that measures the ability of a classifier to identify all instances of a particular class. In the context of a machine vision classifier for identifying defects, recall would measure the proportion of actual defects that the classifier correctly identified.

To calculate recall for the positive class (GOOD), you would use the formula:



Recall_GOOD = true positive / (true positive + false negative)        

For the above example: Recall_GOOD = 986 / (986 + 4) = 0.996

Similarly, for the BAD class:


Recall_BAD = true negative / (true negative + false positive)        

Recall_BAD = 8 / (8+2) = 0.8

Here again, a high Recall_BAD score means that the classifier is identifying most of the defects, while a low score may indicate that the classifier is missing a significant number of defects.
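The corresponding sketch for both recall scores, again using the example's counts:

tp, tn, fp, fn = 986, 8, 2, 4      # counts from the worked example

recall_good = tp / (tp + fn)       # share of GOOD parts correctly accepted
recall_bad = tn / (tn + fp)        # share of BAD parts correctly rejected

print(f"Recall_GOOD = {recall_good:.3f}")  # 0.996
print(f"Recall_BAD  = {recall_bad:.3f}")   # 0.800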

Precision and recall are the main metrics that provide insight into the accuracy of the vision system, with specific insight into the types of errors it makes.

F1 Score

The F1 score is a metric that combines precision and recall into a single score, with a higher score indicating better performance. It is calculated as the harmonic mean of precision and recall, which balances the two, and it is often reported alongside accuracy, precision, and recall.

To calculate the F1 score for both classes in a classification problem (GOOD and BAD in the example above), you would calculate it separately for each class.

For either class, you would use the formula (substituting that class's precision and recall):



F1_class = 2 * (Precision_class * Recall_class) / (Precision_class + Recall_class)

For the above example it would be:

F1_GOOD = 2 * (0.998 * 0.996) / (0.998 + 0.996) = 0.997

To calculate the F1 score for the negative/defect class (BAD), you would use the same formula, but with the precision and recall values for that class.

F1_BAD = 2 * (0.667 * 0.8) / (0.667 + 0.8) = 0.727
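A small helper makes the harmonic-mean calculation explicit; a sketch using the precision and recall values computed above:

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall values from the earlier sections
print(f"F1_GOOD = {f1(0.998, 0.996):.3f}")  # 0.997
print(f"F1_BAD  = {f1(0.667, 0.800):.3f}")  # 0.727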

Gauge R&R

Gauge repeatability and reproducibility (Gauge R&R) is a statistical analysis used to assess the reliability and consistency of measurement systems in manufacturing and quality control, ensuring that measurements taken with a particular instrument or method are accurate and consistent. In the case of a vision system, a Gauge R&R study can be conducted by inspecting a set of parts or samples multiple times using the vision system. The results are then analyzed to determine the repeatability and reproducibility of the system.

Repeatability is a measure of the consistency of measurements taken by the vision system on the same part.

Reproducibility is a measure of the consistency of measurements taken by different operators with the vision system on the same set of parts. For example, if a machine vision system is used to inspect parts for defects, reproducibility would refer to the consistency of the system's output when it is operated by different people. If the system consistently classifies a part as "defect" or "no defect" regardless of the operator, it has high reproducibility. If the system's output varies significantly depending on the operator, it has low reproducibility.

Low reproducibility is a reflection of how the vision system is used and indicates inconsistencies in, for example, material handling or operator practice.

Reproducibility is therefore a less critical metric when it comes to measuring system efficiency.
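One simple way to quantify repeatability for a pass/fail system is the fraction of parts on which repeated inspections all agree. The sketch below illustrates this with hypothetical repeated-run data; it is a simplified stand-in for a full Gauge R&R study, not a substitute for one:

def repeatability(runs):
    """Fraction of parts whose repeated inspections all agree.

    `runs` maps a part id to the list of classifications ("GOOD"/"BAD")
    from repeated inspections of that part. This simple agreement rate
    is an illustrative simplification of a full Gauge R&R analysis.
    """
    consistent = sum(1 for labels in runs.values() if len(set(labels)) == 1)
    return consistent / len(runs)

# Hypothetical data: 3 parts, each inspected 3 times
runs = {
    "part-1": ["GOOD", "GOOD", "GOOD"],
    "part-2": ["BAD", "BAD", "BAD"],
    "part-3": ["GOOD", "BAD", "GOOD"],  # inconsistent results on one part
}
print(f"repeatability = {repeatability(runs):.2f}")  # 0.67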

Practical Considerations

In practice, machine vision systems are used to inspect multiple variants of parts as well as multiple types of defects, so the overall measurement metrics need to factor in these variations as well.

Say, for example, we are inspecting plastic bins in 2 variants (Big and Small) for 3 different defect types (e.g., Scratches, Dents, and Holes).

Figure: Example of a vision application to inspect plastic bins for defects

So here we have 8 different combinations of inspected parts (2 variants × 4 outcomes per part: Scratch, Dent, Hole, or GOOD).

One can appreciate, then, that system performance cannot be measured by a single number (i.e., accuracy alone). At the other extreme, measuring all of the above metrics (Accuracy, Precision, Recall, F1 score, and Repeatability) for each of the 8 combinations of parts leads to an extremely complex set of metrics to track and measure.

So we need to consider a balanced subset of metrics for the variant/defect mix to arrive at an optimal measurement system.
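As an illustration of the bookkeeping involved, the sketch below tallies a recall-style "defects caught" rate per variant/defect combination from hypothetical inspection records; the record format and the counting rule (any rejection counts as a catch) are assumptions made for illustration:

from collections import defaultdict

# Hypothetical inspection records: (variant, actual, predicted), where
# actual/predicted are "GOOD" or a defect type.
records = [
    ("Big", "Scratch", "Scratch"),
    ("Big", "Scratch", "GOOD"),   # missed defect
    ("Big", "GOOD", "GOOD"),
    ("Small", "Dent", "Dent"),
    ("Small", "Hole", "GOOD"),    # missed defect
    ("Small", "GOOD", "Dent"),    # false rejection
]

# Per (variant, defect) tally of defects caught vs. seen. Any rejection
# counts as a catch here -- a simplification for illustration.
caught = defaultdict(lambda: [0, 0])  # (variant, defect) -> [caught, seen]
for variant, actual, predicted in records:
    if actual != "GOOD":
        caught[(variant, actual)][1] += 1
        if predicted != "GOOD":
            caught[(variant, actual)][0] += 1

for key, (hits, seen) in sorted(caught.items()):
    print(f"{key}: defects caught = {hits}/{seen}")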

In a subsequent article, I will cover the measurement methodology we can practically follow for an optimal way of measuring system efficiency.

Summary

Accuracy is a simple but important metric that measures the overall correctness of a classifier. It is calculated as the number of correctly classified items divided by the total number of items.

Precision is a metric that measures the accuracy of a classifier's positive predictions. It is calculated as the number of true positive predictions divided by the total number of positive predictions made by the classifier.

Recall is a metric that measures the ability of a classifier to identify all instances of a particular class. It is calculated as the number of true positive predictions divided by the total number of actual instances of the class.

The F1 score is a metric that combines precision and recall into a single score. It is calculated as the harmonic mean of precision and recall, with a higher score indicating better performance.

All the above metrics give you specific insights into the performance of a machine vision system. However, we also need to consider the set of inspections to which these metrics are applied (i.e., the test sample). We need to include enough variation across the different SKUs/variants, as well as the types of defects that normally occur. By doing so, we can arrive at an optimal measurement system, leading to very high efficacy for your vision system and meeting your quality control objectives.
