For data scientists, ML engineers, product managers, and other practitioners.
How do you evaluate the quality of a classification model? In this guide, we break down the machine learning metrics used for binary and multi-class problems.
What you will learn in this guide:
Here is what makes this guide different:
There is no need to read the guide cover to cover: each chapter is self-contained, and you can read it on its own.
A confusion matrix is a table that summarizes a classification model’s performance. It shows the number of correct predictions (true positives and negatives) and model errors (false positives and negatives). This chapter explains how to create a confusion matrix for binary and multi-class models and which metrics you can derive from it.
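As a quick sketch, here is how a binary confusion matrix could be computed with scikit-learn; the label arrays are made up purely for illustration and are not data from the guide.

```python
# Minimal sketch: building a binary confusion matrix with scikit-learn.
# The label arrays are invented for illustration.
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0, 1, 1]  # actual labels
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]  # model predictions

# Rows correspond to actual classes, columns to predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # for binary labels ordered [0, 1]
print(cm)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```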
Accuracy reflects the overall correctness of the model. Precision shows how many of the model's positive predictions are correct. Recall shows the share of actual positives the model detects. This chapter explains how to choose an appropriate metric based on the use case and the cost of errors.
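For illustration, the same metrics could be computed like this with scikit-learn; again, the labels are invented examples.

```python
# Minimal sketch: accuracy, precision, and recall for a binary classifier.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

print("Accuracy: ", accuracy_score(y_true, y_pred))   # share of all predictions that are correct
print("Precision:", precision_score(y_true, y_pred))  # share of predicted positives that are correct
print("Recall:   ", recall_score(y_true, y_pred))     # share of actual positives the model detects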
There are different ways to calculate accuracy, precision, and recall for multi-class classification. You can calculate metrics by each class or use macro- or micro-averaging. This chapter explains the difference between the options and how they behave in important corner cases.
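A minimal sketch of the difference, using scikit-learn's averaging options on made-up multi-class labels:

```python
# Minimal sketch: per-class, macro-, and micro-averaged precision for multi-class labels.
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

print(precision_score(y_true, y_pred, average=None))     # one value per class
print(precision_score(y_true, y_pred, average="macro"))  # unweighted mean over classes
print(precision_score(y_true, y_pred, average="micro"))  # computed from pooled counts
```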
In probabilistic classification problems, the model output is not a label but a score. You must then set a decision threshold to turn that score into a class label. This chapter explains how to choose an optimal classification threshold to balance precision and recall.
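As a sketch of the idea, you could apply a custom threshold to predicted probabilities like this; the synthetic dataset, logistic regression model, and threshold value of 0.3 are assumptions for illustration only.

```python
# Minimal sketch: turning predicted probabilities into labels with a custom threshold.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data and a simple model, assumed for the example.
X, y = make_classification(n_samples=500, random_state=42)
model = LogisticRegression().fit(X, y)

scores = model.predict_proba(X)[:, 1]   # predicted probability of the positive class
threshold = 0.3                         # example threshold; the default cut-off is effectively 0.5
labels = (scores >= threshold).astype(int)
```

Lowering the threshold typically increases recall at the cost of precision, and raising it does the opposite.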
The ROC curve shows the performance of a binary classifier at varying decision thresholds. It plots the True Positive Rate against the False Positive Rate. The resulting area under the curve (the ROC AUC score) is a common metric for evaluating the classifier's quality. This chapter explains how to compute and interpret ROC AUC.
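A minimal sketch of computing the ROC curve and ROC AUC with scikit-learn, again on an assumed synthetic dataset and logistic regression model:

```python
# Minimal sketch: ROC curve points and ROC AUC from predicted probabilities.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data and a simple model, assumed for the example.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, scores)  # TPR vs. FPR at each threshold
print("ROC AUC:", roc_auc_score(y_test, scores))
```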