Introduction to the precision-recall plot

The precision-recall plot is a model-wide measure for evaluating binary classifiers and is closely related to the ROC plot. On this page, we cover the basic concept and several important aspects of the precision-recall plot.

For those who are not familiar with the basic measures derived from the confusion matrix or the basic concept of model-wide evaluation, we recommend reading the following two pages.

For those who are not familiar with the basic concept of the ROC plot, we also recommend reading the following page.

Precision-recall shows pairs of recall and precision values

The precision-recall plot is a model-wide evaluation measure that is based on two basic evaluation measures – recall and precision. Recall is a performance measure of the whole positive part of a dataset, whereas precision is a performance measure of positive predictions.

Four ovals respectively represent observed labels, four outcomes, recall, and precision.
A dataset has two labels (P and N), and a classifier separates the dataset into four outcomes – TP, TN, FP, FN. The precision-recall plot is based on two basic measures – recall and precision that are calculated from the four outcomes.

The precision-recall plot uses recall on the x-axis and precision on the y-axis. Recall is identical to sensitivity, and precision is identical to positive predictive value.
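
As a minimal sketch of how the two measures follow from the four outcomes, the snippet below computes recall and precision from TP, FP, and FN counts. The function name recall_precision and the example counts are illustrative assumptions, not part of the original page.

```python
def recall_precision(tp, fp, fn):
    """Return (recall, precision) computed from confusion-matrix counts."""
    # Recall = TP / (TP + FN): the share of all actual positives that were found.
    recall = tp / (tp + fn)
    # Precision = TP / (TP + FP): the share of positive predictions that are correct.
    predicted_positive = tp + fp
    # Precision is undefined when nothing is predicted as positive.
    precision = tp / predicted_positive if predicted_positive > 0 else None
    return recall, precision


# Illustrative counts: 2 true positives, 1 false positive, 2 false negatives.
print(recall_precision(tp=2, fp=1, fn=2))
```

Returning None for the undefined case is only one possible design choice; it keeps the caller from mistaking an undefined precision for a precision of zero.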

A naïve way to calculate a precision-recall curve by connecting precision-recall points

A precision-recall point is a point with a pair of x and y values in the precision-recall space where x is recall and y is precision. A precision-recall curve is created by connecting all precision-recall points of a classifier. Two adjacent precision-recall points can be connected by a straight line.
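
The following sketch shows one way such points could be obtained from classifier scores: each distinct score is used as a threshold, and a (recall, precision) pair is computed for it. The helper pr_points and the labels and scores are made up for illustration, and the sketch assumes that the dataset contains at least one positive and that a higher score means "more likely positive".

```python
def pr_points(labels, scores):
    """Return one (recall, precision) point per distinct score threshold.

    labels: 1 for positive, 0 for negative.
    scores: classifier scores; an instance is predicted positive if score >= threshold.
    """
    n_pos = sum(labels)
    points = []
    # Go from the strictest threshold to the loosest one.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        # tp + fp >= 1 here because the instance scoring exactly `threshold` is predicted positive.
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


# Hypothetical data: 4 positives and 4 negatives.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
for recall, precision in pr_points(labels, scores):
    print(f"recall={recall:.3f} precision={precision:.3f}")
```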

An example of making a precision-recall curve

We’ll show a simple example to make a precision-recall curve by connecting several precision-recall points. Let us assume that we have calculated recall and precision values from multiple confusion matrices for four different threshold values.

Threshold Recall Precision
1 0.0 0.75
2 0.25 0.25
3 0.625 0.625
4 1.0 0.5

We first added four points that match the pairs of recall and precision values and then connected the points to create a precision-recall curve.

A Precision-Recall curve and four Precision-Recall points.
The plot shows a precision-recall curve connecting four precision-recall points.

3 important aspects of making an accurate precision-recall curve

Unlike ROC curves, accurate precision-recall curves are less straightforward to calculate because the following three aspects need to be considered.

  1. Estimating the first point from the second point
    • AUC cannot be calculated without the first point
  2. Non-linear interpolation between two points
    • curves with linear interpolation tend to be inaccurate for small datasets, imbalanced datasets, and datasets with many tied scores
  3. Calculating the end point
    • the end point should not be extended to the top right (1.0, 1.0) nor the bottom right (1.0, 0.0) except in the case that observed labels are either all positives or all negatives

We’ll show an example of these aspects by creating a precision-recall curve.

An example of recall and precision pairs

We use four pairs of recall and precision values that are calculated from four threshold values.

Point Threshold Recall Precision
1 1 0 undefined
2 2 0.5 0.667
3 3 0.75 0.6
4 4 1 0.5

We explain the three aspects by using the three pairs of consecutive points.

  1. Points 1-2: estimating the first point
  2. Points 2-3: non-linear interpolation
  3. Points 3-4: calculating the end point

Points 1-2: Estimating the first point from the second point

The first point should be estimated from the second point because the precision value is undefined when the number of positive predictions is 0. This undefined result is easily explained by the equation of precision as PREC = TP / (FP + TP) where (FP + TP) is the number of positive predictions.

There are two cases of estimating the first point depending on the true positives of the second point.

  1. The number of true positives (TP) of the second point is 0
  2. The number of true positives (TP) of the second point is not 0

Case 1: TP is 0

Since the second point is (0.0, 0.0) in this case, the first point is simply (0.0, 0.0) as well. In other words, the first point does not need to be estimated in this case.

Case 2: TP is not 0

This is also the case for our example, and the second point is (0.5, 0.667). We can estimate the first point by drawing a horizontal line from the second point to the y-axis. Hence, the first point is estimated as (0.0, 0.667).

First two Precision-Recall points.
Drawing a horizontal line from the second point to the y-axis to estimate the first point.
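
Both cases can be covered by a single rule: keep the precision of the second point and set recall to 0. The sketch below illustrates this; the function name estimate_first_point is a hypothetical helper, not an established API.

```python
def estimate_first_point(second_point):
    """Estimate the (recall, precision) point at recall 0 from the second point."""
    _, precision = second_point
    # Case 1 (TP = 0): the second point is (0.0, 0.0), so the estimate is (0.0, 0.0).
    # Case 2 (TP > 0): a horizontal line to the y-axis keeps the same precision.
    return (0.0, precision)


print(estimate_first_point((0.5, 0.667)))  # (0.0, 0.667)
print(estimate_first_point((0.0, 0.0)))    # (0.0, 0.0)
```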

Points 2-3: Non-linear interpolation between two points

Davis and Goadrich proposed the non-linear interpolation method of precision-recall points in their article (Davis2006). The equation described in their article is

\mathrm{y = \displaystyle \cfrac{TP_A + x}{TP_A + x + FP_A + \cfrac{FP_B - FP_A}{TP_B - TP_A} \cdot x}}

where y is precision and x can be any value between 0 and |TP_B – TP_A|. The corresponding recall is (TP_A + x) / P. A smooth curve can be created by calculating many intermediate points between two points A and B.

Non-linear interpolation of two Precision-Recall points.
Two precision-recall points are connected non-linearly. The blue dotted line shows a straight line between points 2 and 3, whereas the red solid curve shows the correct non-linear interpolation between them.
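
A minimal sketch of this interpolation is shown below. It evaluates the equation at evenly spaced values of x and converts each result into a (recall, precision) pair. The function name interpolate_pr and the parameters n_pos (the number of positives, P) and steps are assumptions made for illustration; the counts in the usage line match the worked example that follows. The sketch also assumes that point A has at least one positive prediction.

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, n_pos, steps=10):
    """Return (recall, precision) points from A to B using non-linear interpolation."""
    if tp_b <= tp_a:
        raise ValueError("point B must have more true positives than point A")
    # Number of false positives gained per additional true positive.
    slope = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for i in range(steps + 1):
        x = (tp_b - tp_a) * i / steps  # x runs from 0 to TP_B - TP_A
        tp = tp_a + x
        fp = fp_a + slope * x
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


# Point A: 2 TPs and 1 FP; point B: 3 TPs and 2 FPs; 4 positives in total.
for recall, precision in interpolate_pr(2, 1, 3, 2, n_pos=4, steps=4):
    print(f"recall={recall:.4f} precision={precision:.4f}")
```

Increasing steps gives a smoother curve; the end points of the returned list are always points A and B themselves.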

An intermediate point 2.5 for points 2-3

Let us assume the second point has 2 TPs and 1 FP and the third point has 3 TPs and 2 FPs.

  • Point 2: (0.5, 0.667)
  • Point 3: (0.75, 0.6)

Measure                      Point 2   Point 3
TP (# of true positives)     2         3
FP (# of false positives)    1         2
Recall                       0.5       0.75
Precision                    0.667     0.6

We then define the intermediate point 2.5 as the middle point where recall is 0.625. We show that the precision value of point 2.5 can be different for linear and non-linear interpolation.

Linear interpolation

Since point 2.5 is the center point of the second and the third points, the precision value is 0.633.

  • Point 2.5: (0.625, 0.633)

Non-linear interpolation

We calculate the precision value by

\mathrm{\displaystyle \cfrac{TP_{point2} + x}{TP_{point2} + x + FP_{point2} + \cfrac{FP_{point3} - FP_{point2}}{TP_{point3} - TP_{point2}} \cdot x}}

with the following values.

  • TP_{point2}: 2
  • FP_{point2}: 1
  • TP_{point3}: 3
  • FP_{point3}: 2
  • x: 0.5

The calculated precision value is 0.625.

  • Point 2.5: (0.625, 0.625)
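
As a worked check, substituting the values into the equation gives the following. It assumes P = 4, which follows from point 2 having 2 TPs at recall 0.5, so that x = 0.5 corresponds to recall 0.625.

\mathrm{\displaystyle \cfrac{2 + 0.5}{2 + 0.5 + 1 + \cfrac{2 - 1}{3 - 2} \cdot 0.5} = \cfrac{2.5}{4.0} = 0.625}

\mathrm{\displaystyle Recall = \cfrac{TP_{point2} + x}{P} = \cfrac{2 + 0.5}{4} = 0.625}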

Points 3-4: Calculating the end point

The end point of the precision-recall curve is always (1.0, P / (P + N)). For instance, the end point is (1.0, 0.5), from (1.0, 4 / (4 + 4)), when P is 4 and N is 4. The end point and the previous point should then be connected by non-linear interpolation.

The end point of a Precision-Recall curve.
The precision at the end point of the precision-recall curve can be calculated as P / (P + N). It is 0.5 when P is 4 and N is 4.
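
The end point corresponds to the threshold at which every instance is predicted as positive, so all positives become TPs and all negatives become FPs. With recall on the x-axis and precision on the y-axis, this can be sketched as follows; the function name end_point is a hypothetical helper.

```python
def end_point(n_pos, n_neg):
    """Return the (recall, precision) point where every instance is predicted positive."""
    tp = n_pos  # all positives are correctly predicted
    fp = n_neg  # all negatives are wrongly predicted as positive
    return (tp / n_pos, tp / (tp + fp))


print(end_point(4, 4))  # (1.0, 0.5)
```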

Interpretation of precision-recall curves

Similar to a ROC curve, it is easy to interpret a precision-recall curve. We use several examples to explain how to interpret precision-recall curves.

A precision-recall curve of a random classifier

A classifier with the random performance level shows a horizontal line at y = P / (P + N). This line separates the precision-recall space into two areas. The area above the line is the area of good performance levels. The area below the line is the area of poor performance levels.

Two Precision-Recall curves of random classifiers for different positive and negative ratios.
A random classifier shows a horizontal line at y = P / (P + N). For instance, the line is y = 0.5 when the ratio of positives and negatives is 1:1, whereas it is y = 0.25 when the ratio is 1:3.

A precision-recall curve of a perfect classifier

A classifier with the perfect performance level shows a combination of two straight lines – from the top left corner (0.0, 1.0) to the top right corner (1.0, 1.0) and further down to the end point (1.0, P / (P + N)).

Two Precision-Recall curves of perfect classifiers for different positive and negative ratios.
A perfect classifier shows a combination of two straight lines. The end point depends on the ratio of positives and negatives. For instance, the end point is (1.0, 0.5) when the ratio of positives and negatives is 1:1, whereas (1.0, 0.25) when the ratio is 1:3.
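
The two reference curves depend only on the numbers of positives and negatives, which the short sketch below makes explicit. The function names random_baseline and perfect_curve are illustrative assumptions.

```python
def random_baseline(n_pos, n_neg):
    """Precision level of a random classifier: a horizontal line at P / (P + N)."""
    return n_pos / (n_pos + n_neg)


def perfect_curve(n_pos, n_neg):
    """Corner points (recall, precision) of the curve of a perfect classifier."""
    return [(0.0, 1.0), (1.0, 1.0), (1.0, random_baseline(n_pos, n_neg))]


print(random_baseline(1, 1), random_baseline(1, 3))  # 0.5 0.25
print(perfect_curve(1, 3))  # [(0.0, 1.0), (1.0, 1.0), (1.0, 0.25)]
```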

Precision-recall curves for multiple models

It is easy to compare several classifiers in the precision-recall plot. Curves close to the perfect precision-recall curve have a better performance level than the ones close to the baseline. In other words, a curve above another curve has a better performance level.

Two Precision-Recall curves for two classifiers A and B - The plot indicates classifier A outperforms classifier B.
Two precision-recall curves represent the performance levels of two classifiers A and B. Classifier A clearly outperforms classifier B in this example.

Noisy curves for small recall values

A precision-recall curve can be noisy (a zigzag curve frequently going up and down) for small recall values. Therefore, precision-recall curves tend to cross each other much more frequently than ROC curves, especially for small recall values. Comparisons among multiple classifiers can be difficult if the curves are too noisy.

AUC (Area Under the precision-recall Curve) score

Similar to ROC curves, the AUC (the area under the precision-recall curve) score can be used as a single performance measure for precision-recall curves. As the name indicates, it is an area under the curve calculated in the precision-recall space. An approximate but easy way to calculate the AUC score is to use the trapezoidal rule, which adds up the areas of all trapezoids under the curve.

The AUC score can be calculated by the trapezoidal rule.
The areas of the three trapezoids 1, 2, 3 are 0.335, 0.15875, and 0.1375. The AUC score is then 0.63125.
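
The trapezoidal rule can be sketched in a few lines. The function name auc_trapezoid is an illustrative assumption. The trapezoid areas in the caption appear to use the precision of the first two points rounded to 0.67 rather than 0.667, so the sketch uses 0.67 to reproduce them; with 0.667 the result differs slightly in the third decimal place.

```python
def auc_trapezoid(points):
    """Area under a curve given as (recall, precision) points sorted by recall."""
    area = 0.0
    # Each pair of adjacent points forms one trapezoid: width * average height.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Estimated first point, points 2 and 3, and the end point (precision rounded to 0.67).
points = [(0.0, 0.67), (0.5, 0.67), (0.75, 0.6), (1.0, 0.5)]
print(round(auc_trapezoid(points), 5))  # 0.63125
```

Note that this is the approximate linear version; a more accurate score would first add non-linearly interpolated points between the observed ones.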

Although the theoretical range of AUC score is between 0 and 1, the actual scores of meaningful classifiers are greater than P / (P + N), which is the AUC score of a random classifier.

Four Precision-Recall curves with their AUC scores.
The score is 1.0 for the classifier with the perfect performance level (P) and 0.5 for the classifier with the random performance level (R). The plot clearly shows classifier A outperforms classifier B, which is also supported by their AUC scores (0.7 and 0.52).

One-to-one relationship between ROC and precision-recall points

Davis and Goadrich introduced the one-to-one relationship between ROC and precision-recall points in their article (Davis2006). In principle, one point in the ROC space always has a corresponding point in the precision-recall space, and vice versa. This relationship is also closely related to the non-linear interpolation of two precision-recall points.

A ROC curve and a precision-recall curve should indicate the same performance level for a classifier. Nevertheless, they usually appear to be different, and even interpretation can be different.

One-to-one relationship between ROC and Precision-Recall points.
Four ROC points 1, 2, 3, and 4 correspond to precision-recall points 1, 2, 3, and 4, respectively.
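
The correspondence can be made concrete by converting between the two spaces when the numbers of positives (P) and negatives (N) are known. The sketch below is an illustration under that assumption; the function names roc_to_pr and pr_to_roc are hypothetical, and the reverse conversion assumes a non-zero precision.

```python
def roc_to_pr(fpr, tpr, n_pos, n_neg):
    """Map a ROC point (FPR, TPR) to a precision-recall point (recall, precision)."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    if tp + fp == 0:
        return (0.0, None)  # no positive predictions: precision is undefined
    return (tpr, tp / (tp + fp))


def pr_to_roc(recall, precision, n_pos, n_neg):
    """Map a precision-recall point back to a ROC point (FPR, TPR)."""
    tp = recall * n_pos
    fp = tp * (1 - precision) / precision  # rearranged from precision = TP / (TP + FP)
    return (fp / n_neg, recall)


# A point with 2 TPs and 1 FP when P = 4 and N = 4: FPR = 0.25 and TPR = 0.5.
recall, precision = roc_to_pr(0.25, 0.5, n_pos=4, n_neg=4)
print(recall, round(precision, 3))  # 0.5 0.667
print(pr_to_roc(recall, precision, n_pos=4, n_neg=4))
```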

In addition, the AUC scores are different between ROC and precision-recall for the same classifier.

Difference of the AUC scores between ROC and Precision-Recall.
ROC shows the same AUC score for A (0.61) and B (0.61), but precision-recall shows different scores for A (0.62) and B (0.53).

3 important characteristics of the precision-recall plot

Among several known characteristics of the precision-recall plot, three are important to consider for accurate precision-recall analysis.

  1. Interpolation between two precision-recall points is non-linear.
  2. The ratio of positives and negatives defines the baseline.
  3. A ROC point and a precision-recall point always have a one-to-one relationship.

These characteristics are also important when the plot is applied to imbalanced datasets. For more details about the precision-recall plot with imbalanced datasets, we recommend reading the following pages.

9 thoughts on “Introduction to the precision-recall plot”

  1. Hi!

    I don’t understand if endpoint needs to be calculated or if it is just taken from the observations. In case it has to be calculated, I don’t understand how to do that (why P and N are equal 4).

    I would appreciate if you could clarify my doubts.
    Thanks in advance.

    1. Hi!

      Yes, the end point always needs to be calculated. It represents the case where all data instances are predicted as positive (you can see the section “Confusion matrix for TH=0.0” on the Basic concept of model-wide evaluation page for an example).

      Our test set here has 8 instances with 4 positives (P=4) and 4 negatives (N=4). When all data instances are predicted as positive, which is the case for the end point, the number of true positives is 4 (TP=4) and the number of instances predicted as positive is 8. Therefore, TP is always equal to P at the end point. You can simply calculate the precision at the end point as TP / (# of predicted as positive) = P / (# of all data instances) = P / (P + N).

      I hope this will help.

      1. Hi, the end point is where recall is 1, which indicates that all positive labels are correctly predicted. But there can be more than one threshold that meets this condition. How do we determine the precision in this case?

      2. Hi Alex,

        Precision is always calculated as TP / (# of predicted as positive), and it is common to have multiple precision values for one recall value. It is drawn as a vertical line in that case. This also applies when recall is 1.

  2. Hi Takaya,

    I believe the location of the random classifier line in the Precision-Recall plot is incorrect in figures 5b and 6b (the figures associated with sections “A Precision-Recall curve of a random classifier” and “A Precision-Recall curve of a perfect classifier”).

    The ratio of positives to negatives is stated as 3:1. By the given formula of P/(P+N), this should give 0.75 instead of 0.25 as shown. A simpler fix would be to state the P:N ratio as 1:3 instead of 3:1.

    Cheers,
    Toby

    1. Hi Toby,

      Yes, the correct P:N ratio should be 1:3 instead of 3:1 as suggested. My bad! I have fixed both figures and uploaded them. Thanks a lot. I appreciate your help.

      Cheers,
      Takaya

  3. Hi Takaya,

    I have some queries about precision-recall curve,
    1. Can we use this curve to determine the THRESHOLD value for classification as we do with the ROC curve (basically balancing recall and specificity)?
    2. Are these two thresholds necessarily different?
    3. If my problem statement is such that precision is more important than recall or overall accuracy, should I use the threshold determined by the precision-recall curve?

    Thanks in advance. 🙂

    1. Hello Arpan,

      It’s a good question. There are many methods that can be used to find the optimal threshold value, but it is usually quite difficult to find one that works on all unseen data (your independent test datasets, for instance).

      For ROC:
      * The balance point where sensitivity = specificity.
      * The closest point to (0, 1).
      * Youden’s J statistic: You can calculate max(sensitivity + specificity) instead.

      Different methods usually achieve different threshold values. The main problem here is that your positive predictions tend to have way too many false positives if you use any of these threshold values on imbalanced data sets. You can use them as long as your datasets are balanced. There are alternative methods that can take error cost weights, but the optimal error costs are usually unknown so that you don’t know how to specify the weights.

      For precision-recall:
      * Break-even point: the point where precision = recall.
      * max(F-score): F-score is a harmonic mean of precision and recall.

      You can use them to find the optimal threshold value, but I strongly suggest you test the value on your validation and test datasets. It should work fine even for imbalanced datasets.

      Regards,
      Takaya
