The most common evaluation metrics for classification problems are accuracy, precision, recall, and F1-score. However, these metrics can be misleading or insensitive to the minority class on imbalanced data. For instance, accuracy can be high even when the model predicts the majority class for every instance, and precision and recall both depend on the chosen decision threshold. The F1-score, the harmonic mean of precision and recall, may not capture the trade-off between them adequately.

Alternative metrics for imbalanced data include the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR) across threshold values, and the AUC, which summarizes it as the area under that curve; a higher AUC indicates a model that better distinguishes between the classes. Similarly, the precision-recall curve plots precision against recall across threshold values, and average precision summarizes the area under it; a higher average precision indicates a model that predicts the minority class more accurately. Additionally, Cohen's kappa measures the agreement between the model's predictions and the actual labels, corrected for chance agreement; a higher kappa indicates a model that predicts both classes reliably.
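A minimal sketch of how these metrics can be computed with scikit-learn is shown below. The synthetic dataset, the 95/5 class ratio, and the logistic regression model are illustrative assumptions, not part of the original discussion; any classifier that produces probability scores would work the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score,
                             average_precision_score, cohen_kappa_score)

# Synthetic imbalanced dataset: roughly 95% majority class, 5% minority class.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)                # hard labels at the default 0.5 threshold
y_score = model.predict_proba(X_test)[:, 1]   # scores for threshold-free metrics

# Threshold-dependent metrics (can look deceptively good on imbalanced data).
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, zero_division=0))
print("recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))

# Threshold-free and chance-corrected alternatives.
print("ROC AUC  :", roc_auc_score(y_test, y_score))
print("avg prec :", average_precision_score(y_test, y_score))
print("kappa    :", cohen_kappa_score(y_test, y_pred))
```

Note that ROC AUC and average precision are computed from the predicted scores rather than the hard labels, which is what makes them independent of any particular threshold.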