To illustrate how to calculate the F1 score, let's consider a simple example. Suppose you have a machine learning model that predicts whether an email is spam or not. The model makes the following predictions on a test set of 10 emails:
Actual: [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
Predicted: [0, 1, 1, 0, 0, 0, 1, 1, 0, 0]
Here, 0 means not spam and 1 means spam. To calculate the precision and recall, we need to count the true positives (TP), false positives (FP), and false negatives (FN):
TP = 3 (the model correctly predicts spam for emails 2, 7, and 8)
FP = 1 (the model incorrectly predicts spam for email 3)
FN = 2 (the model incorrectly predicts not spam for emails 5 and 10)
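To make the counting concrete, here is a minimal Python sketch (the variable names are ours, chosen for illustration) that tallies TP, FP, and FN directly from the two lists above:

```python
actual    = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
predicted = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0]

# Compare each prediction to the corresponding actual label.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # spam predicted as spam
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # not-spam predicted as spam
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # spam predicted as not-spam

print(tp, fp, fn)  # 3 1 2
```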
Then we can plug these values into the formulas:
Precision = TP / (TP + FP) = 3 / (3 + 1) = 0.75
Recall = TP / (TP + FN) = 3 / (3 + 2) = 0.6
Finally, we can calculate the F1 score: F1 = 2 * (0.75 * 0.6) / (0.75 + 0.6) ≈ 0.667. The F1 score falls between recall (0.6) and precision (0.75); because it is a harmonic mean, it is pulled toward the lower of the two values. This reflects the trade-off between them: the model's precision is higher than its recall, so when it flags an email as spam it is usually right, but it misses some actual spam.
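As a sanity check, the same numbers can be reproduced with scikit-learn's metrics. This is a sketch assuming scikit-learn is installed; the last line also recomputes F1 by hand from the formula above:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

actual    = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
predicted = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0]

precision = precision_score(actual, predicted)  # 0.75
recall    = recall_score(actual, predicted)     # 0.6
f1        = f1_score(actual, predicted)         # 0.666...

# Recompute F1 directly from the formula to confirm it matches.
f1_manual = 2 * (precision * recall) / (precision + recall)

print(precision, recall, f1, f1_manual)
```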