The precision-recall plot is a model-wide measure for evaluating binary classifiers and is closely related to the ROC plot. We’ll cover the basic concept and several important aspects of the precision-recall plot on this page.
For those who are not familiar with the basic measures derived from the confusion matrix or the basic concept of model-wide evaluation, we recommend reading the following two pages.
For those who are not familiar with the basic concept of the ROC plot, we also recommend reading the following page.
Precision-recall shows pairs of recall and precision values
The precision-recall plot is a model-wide evaluation measure that is based on two basic evaluation measures – recall and precision. Recall is a performance measure of the whole positive part of a dataset, whereas precision is a performance measure of positive predictions.
The precision-recall plot uses recall on the x-axis and precision on the y-axis. Recall is identical with sensitivity, and precision is identical with positive predictive value.
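As a minimal sketch of these two measures, the snippet below computes recall and precision from confusion-matrix counts. The function name `recall_precision` and the example counts are illustrative assumptions, not something defined on this page.

```python
def recall_precision(tp, fp, fn):
    """Return (recall, precision) from confusion-matrix counts.

    A measure is returned as None when its denominator is zero.
    """
    # Recall (sensitivity): share of all actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    # Precision (positive predictive value): share of positive predictions
    # that are correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    return recall, precision


# Hypothetical counts: 2 true positives, 1 false positive, 2 false negatives.
recall, precision = recall_precision(tp=2, fp=1, fn=2)
print(recall, round(precision, 3))
```

The pair returned here is one point in the precision-recall space, with recall as x and precision as y.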
A naïve way to calculate a precision-recall curve by connecting precision-recall points
A precision-recall point is a point with a pair of x and y values in the precision-recall space where x is recall and y is precision. A precision-recall curve is created by connecting all precision-recall points of a classifier. Two adjacent precision-recall points can be connected by a straight line.
An example of making a precision-recall curve
We’ll show a simple example to make a precision-recall curve by connecting several precision-recall points. Let us assume that we have calculated recall and precision values from multiple confusion matrices for four different threshold values.
We first add four points that match the pairs of recall and precision values and then connect the points to create a precision-recall curve.
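The procedure can be sketched in code as follows. The labels, scores, and thresholds are hypothetical values chosen for illustration, and `pr_points` is an assumed helper name; an instance is treated as a positive prediction when its score is at least the threshold.

```python
def pr_points(labels, scores, thresholds):
    """Return one (recall, precision) pair per threshold.

    Precision is None when a threshold yields no positive predictions.
    """
    n_pos = sum(labels)
    points = []
    for t in thresholds:
        # Labels of the instances predicted as positive at this threshold.
        predicted = [y for y, s in zip(labels, scores) if s >= t]
        tp = sum(predicted)        # actual positives among the predictions
        fp = len(predicted) - tp   # actual negatives among the predictions
        recall = tp / n_pos
        precision = tp / (tp + fp) if predicted else None
        points.append((recall, precision))
    return points


# Hypothetical data: 4 positives (1) and 4 negatives (0), sorted by score.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

for recall, precision in pr_points(labels, scores, [1.0, 0.7, 0.5, 0.2]):
    print(recall, precision)
```

With this data, the highest threshold produces no positive predictions, so its precision is undefined, while the remaining three thresholds give recall and precision pairs that can be connected into a curve.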
3 important aspects of making an accurate precision-recall curve
Calculating an accurate precision-recall curve is less straightforward than calculating a ROC curve because the following three aspects need to be considered.
- Estimating the first point from the second point
  - AUC cannot be calculated without the first point
- Non-linear interpolation between two points
  - curves with linear interpolation tend to be inaccurate for small datasets, imbalanced datasets, and datasets with many tied scores
- Calculating the end point
  - the end point should not be extended to the top right (1.0, 1.0) nor the bottom right (1.0, 0.0) except in the case that observed labels are either all positives or all negatives
We’ll show an example of these aspects by creating a precision-recall curve.
An example of recall and precision pairs
We use four pairs of recall and precision values that are calculated from four threshold values.
We explain the three aspects by using the three pairs of consecutive points.
- Points 1-2: estimating the first point
- Points 2-3: non-linear interpolation
- Points 3-4: calculating the end point
Points 1-2: Estimating the first point from the second point
The first point should be estimated from the second point because the precision value is undefined when the number of positive predictions is 0. This undefined result is easily explained by the equation of precision as PREC = TP / (FP + TP) where (FP + TP) is the number of positive predictions.
There are two cases of estimating the first point depending on the true positives of the second point.
- The number of true positives (TP) of the second point is 0
- The number of true positives (TP) of the second point is not 0
Case 1: TP is 0
Since the second point is (0.0, 0.0) for this case, the first point is also (0.0, 0.0). In other words, no estimation is needed for this case.
Case 2: TP is not 0
This is also the case for our example, and the second point is (0.5, 0.667). We can estimate the first point by drawing a horizontal line from the second point to the y-axis. Hence, the first point is estimated as (0.0, 0.667).
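Both cases can be captured in one small function, sketched below under the assumption that a point is a `(recall, precision)` tuple. The name `estimate_first_point` is hypothetical.

```python
def estimate_first_point(second_point):
    """Estimate the first precision-recall point from the second one.

    Case 1 (TP is 0): the second point is (0.0, 0.0), so the first point
    is (0.0, 0.0) as well.
    Case 2 (TP is not 0): draw a horizontal line from the second point to
    the y-axis, which keeps the precision and sets recall to 0.0.
    Both cases reduce to copying the precision of the second point.
    """
    _, precision = second_point
    return (0.0, precision)


print(estimate_first_point((0.5, 0.667)))  # case 2, the example on this page
print(estimate_first_point((0.0, 0.0)))    # case 1
```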
Points 2-3: Non-linear interpolation between two points
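Davis and Goadrich proposed the non-linear interpolation method of precision-recall points in their article (Davis2006). For two adjacent points A and B, the equation described in their article is

$$y = \frac{TP_A + x}{TP_A + x + FP_A + \frac{FP_B - FP_A}{TP_B - TP_A} \, x}$$

where y is precision and x can be any value between 0 and $|TP_B - TP_A|$. The corresponding recall is $(TP_A + x) / P$, where P is the number of positives. A smooth curve can be created by calculating many intermediate points between the two points A and B.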
An intermediate point 2.5 for points 2-3
Let us assume the second point has 2 TPs and 1 FP and the third point has 3 TPs and 2 FPs.
- Point 2: (0.5, 0.667)
- Point 3: (0.75, 0.6)
| | Point 2 | Point 3 |
|---|---|---|
| TP (# of true positives) | 2 | 3 |
| FP (# of false positives) | 1 | 2 |
We then define the intermediate point 2.5 as the middle point where recall is 0.625. We show that the precision value of point 2.5 can be different for linear and non-linear interpolation.
With linear interpolation, point 2.5 is the midpoint of the second and the third points, so the precision value is (0.667 + 0.6) / 2 = 0.633.
- Point 2.5: (0.625, 0.633)
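With non-linear interpolation, we calculate the precision value by

$$y = \frac{TP_{point2} + x}{TP_{point2} + x + FP_{point2} + \frac{FP_{point3} - FP_{point2}}{TP_{point3} - TP_{point2}} \, x}$$

with the following values.
- TP_point2: 2
- FP_point2: 1
- TP_point3: 3
- FP_point3: 2
- x: 0.5
The calculated precision value is (2 + 0.5) / (2 + 0.5 + 1 + 1 × 0.5) = 0.625.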
- Point 2.5: (0.625, 0.625)
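The two results for point 2.5 can be reproduced with a short sketch. The function names are hypothetical, and the number of positives (`n_pos = 4`) is an assumption inferred from point 2, where 2 TPs correspond to a recall of 0.5. The non-linear version follows the Davis and Goadrich formula, in which FP grows proportionally as TP moves from point A to point B.

```python
def interpolate_linear(point_a, point_b, recall):
    """Precision at `recall` on the straight line between two PR points."""
    (r_a, p_a), (r_b, p_b) = point_a, point_b
    weight = (recall - r_a) / (r_b - r_a)
    return recall, p_a + weight * (p_b - p_a)


def interpolate_nonlinear(tp_a, fp_a, tp_b, fp_b, x, n_pos):
    """PR point reached after adding x true positives to point A.

    Assumes tp_b != tp_a and 0 <= x <= tp_b - tp_a.
    """
    # Number of false positives gained per additional true positive.
    fp_per_tp = (fp_b - fp_a) / (tp_b - tp_a)
    tp = tp_a + x
    fp = fp_a + fp_per_tp * x
    return tp / n_pos, tp / (tp + fp)


point_2 = (0.5, 2 / 3)   # TP = 2, FP = 1
point_3 = (0.75, 0.6)    # TP = 3, FP = 2

print(interpolate_linear(point_2, point_3, recall=0.625))
print(interpolate_nonlinear(tp_a=2, fp_a=1, tp_b=3, fp_b=2, x=0.5, n_pos=4))
```

The linear version gives a precision of about 0.633 at recall 0.625, while the non-linear version gives 0.625, matching the two values above.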
Points 3-4: Calculating the end point
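The end point of the precision-recall curve is always (1.0, P / (P + N)), where P and N are the numbers of positives and negatives. When every instance is predicted as positive, recall is 1.0 and precision equals the proportion of positives. For instance, the end point is (1.0, 0.5) from (1.0, 4 / (4 + 4)) when P is 4 and N is 4. Subsequently, the end point and the previous point should be connected by non-linear interpolation.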
Interpretation of precision-recall curves
Similar to a ROC curve, it is easy to interpret a precision-recall curve. We use several examples to explain how to interpret precision-recall curves.
A precision-recall curve of a random classifier
A classifier with the random performance level shows a horizontal line at y = P / (P + N), where P and N are the numbers of positives and negatives. This line separates the precision-recall space into two areas. The area above the line is the area of good performance levels. The area below the line is the area of poor performance.
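Because the baseline depends only on the class ratio, it can be computed before any classifier is trained. The sketch below uses hypothetical helper names and hypothetical class counts.

```python
def baseline_precision(n_pos, n_neg):
    """Precision of a random classifier: the proportion of positives."""
    return n_pos / (n_pos + n_neg)


def is_above_baseline(precision, n_pos, n_neg):
    """True if a precision value lies in the area of good performance."""
    return precision > baseline_precision(n_pos, n_neg)


# Balanced data (4 positives, 4 negatives) puts the baseline at 0.5.
print(baseline_precision(4, 4))
# Hypothetical imbalanced data (10 positives, 90 negatives) lowers it to 0.1.
print(baseline_precision(10, 90))
print(is_above_baseline(0.3, 10, 90))
```

A precision of 0.3 would be poor on the balanced data but sits above the baseline on the imbalanced data, which is why the baseline should be drawn alongside the curve.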
A precision-recall curve of a perfect classifier
A classifier with the perfect performance level shows a combination of two straight lines – from the top left corner (0.0, 1.0) to the top right corner (1.0, 1.0) and further down to the end point (1.0, P / (P + N)).
Precision-recall curves for multiple models
It is easy to compare several classifiers in the precision-recall plot. Curves close to the perfect precision-recall curve have a better performance level than the ones close to the baseline. In other words, a curve above another curve has a better performance level.
Noisy curves for small recall values
A precision-recall curve can be noisy (a zigzag curve frequently going up and down) for small recall values. Therefore, precision-recall curves tend to cross each other much more frequently than ROC curves especially for small recall values. Comparisons with multiple classifiers can be difficult if the curves are too noisy.
AUC (Area Under the precision-recall Curve) score
Similar to ROC curves, the AUC (the area under the precision-recall curve) score can be used as a single performance measure for precision-recall curves. As the name indicates, it is the area under the curve calculated in the precision-recall space. An approximate but easy way to calculate the AUC score is the trapezoidal rule, which adds up the areas of all trapezoids under the curve.
Although the theoretical range of AUC score is between 0 and 1, the actual scores of meaningful classifiers are greater than P / (P + N), which is the AUC score of a random classifier.
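The trapezoidal rule can be sketched as follows. The function name `auc_trapezoid` is an assumption, and the points must already be ordered along the curve from the first point to the end point. Since each trapezoid treats the segment between two points as a straight line, the result is an approximation whenever the true curve between them is non-linear.

```python
def auc_trapezoid(points):
    """Approximate the area under a precision-recall curve.

    `points` is a list of (recall, precision) pairs ordered along the
    curve. Each pair of adjacent points contributes one trapezoid.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# The four points of the running example, with the estimated first point.
example = [(0.0, 2 / 3), (0.5, 2 / 3), (0.75, 0.6), (1.0, 0.5)]
print(round(auc_trapezoid(example), 3))

# Reference curves for P = N = 4: a perfect and a random classifier.
print(auc_trapezoid([(0.0, 1.0), (1.0, 1.0), (1.0, 0.5)]))
print(auc_trapezoid([(0.0, 0.5), (1.0, 0.5)]))
```

The perfect classifier yields an area of 1.0 and the random classifier yields P / (P + N) = 0.5, with the example curve falling between the two.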
One-to-one relationship between ROC and precision-recall points
Davis and Goadrich introduced the one-to-one relationship between ROC and precision-recall points in their article (Davis2006). In principle, one point in the ROC space always has a corresponding point in the precision-recall space, and vice versa. This relationship is also closely related to the non-linear interpolation of two precision-recall points.
A ROC curve and a precision-recall curve should indicate the same performance level for a classifier. Nevertheless, they usually appear to be different, and even interpretation can be different.
In addition, the AUC scores are different between ROC and precision-recall for the same classifier.
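The one-to-one relationship can be illustrated by converting points between the two spaces when the numbers of positives and negatives are known. This is a sketch with hypothetical function names. It assumes a ROC point is given as (false positive rate, true positive rate), and the reverse conversion assumes precision is greater than 0.

```python
def roc_to_pr(fpr, tpr, n_pos, n_neg):
    """Convert a ROC point (FPR, TPR) to a PR point (recall, precision)."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    # Precision is undefined when there are no positive predictions.
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    return tpr, precision  # recall is identical to TPR


def pr_to_roc(recall, precision, n_pos, n_neg):
    """Convert a PR point back to a ROC point; assumes precision > 0."""
    tp = recall * n_pos
    # Rearranged from precision = tp / (tp + fp).
    fp = tp * (1.0 - precision) / precision
    return fp / n_neg, recall


# Point 2 of the running example (TP = 2, FP = 1) with P = N = 4.
recall, precision = roc_to_pr(fpr=0.25, tpr=0.5, n_pos=4, n_neg=4)
print(recall, round(precision, 3))
print(pr_to_roc(recall, precision, n_pos=4, n_neg=4))
```

The same confusion matrix therefore produces one point in each space, even though the two resulting curves and their AUC scores look different.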
3 important characteristics of the precision-recall plot
Among several known characteristics of the precision-recall plot, three are important to consider for accurate precision-recall analysis.
- Interpolation between two precision-recall points is non-linear.
- The ratio of positives and negatives defines the baseline.
- A ROC point and a precision-recall point always have a one-to-one relationship.
These characteristics are also important when the plot is applied to imbalanced datasets. For more details about the precision-recall plot with imbalanced datasets, we recommend reading the following pages.