We are happy to report that, two years after publication, our PLoS One article (link) has been cited 58 times, according to Google Scholar (link). The citing articles cover a variety of topics, including non-bioinformatics ones.
Some bioinformatics articles citing us:
The authors say: “For data sets with unbalanced classes, ROC curves should be interpreted cautiously because a small change in the false-positive rate can lead to a large change in the absolute number of erroneous predictions [Saito and Rehmsmeier 2015; MR]. A precision-recall curve can be more useful for resolving diagnostic thresholds in next-generation sequencing tests, in which the true-negative class is usually far larger than the true-positive class” (my annotations indicated by MR)
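The quoted point is easy to verify with a little arithmetic. Here is a minimal Python sketch using hypothetical counts (one million negatives, 800 detected positives; these numbers are assumptions for illustration, not taken from any cited study) showing how a tiny shift in the false-positive rate doubles the absolute number of false positives and sharply lowers precision:

```python
# Hypothetical screening scenario: the negative class vastly outnumbers
# the positive class, as in many next-generation sequencing tests.
NEGATIVES = 1_000_000   # assumed size of the true-negative class
TRUE_POSITIVES = 800    # assumed number of positives detected at this threshold

def false_positives(fpr, n_neg=NEGATIVES):
    """Absolute number of false positives implied by a false-positive rate."""
    return fpr * n_neg

def precision(tp, fp):
    """Fraction of positive calls that are correct (PPV)."""
    return tp / (tp + fp)

fp_low = false_positives(0.001)   # 1,000 false positives
fp_high = false_positives(0.002)  # 2,000 false positives

# An FPR shift of only 0.001 doubles the erroneous predictions
# and drops precision from ~0.44 to ~0.29.
print(precision(TRUE_POSITIVES, fp_low))
print(precision(TRUE_POSITIVES, fp_high))
```

On a ROC curve, the move from FPR 0.001 to 0.002 is nearly invisible; on a precision-recall curve it is a large vertical drop, which is exactly why the quoted authors prefer PR curves for setting diagnostic thresholds.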
The authors say: “Importantly, precision (and AUPR) is the most reliable performance measure when there is a highly skewed distribution of class (Ackermann et al., 2012 ; Saito and Rehmsmeier, 2015).”
Some non-bioinformatics articles:
The authors say: “Because BBD studies often involve imbalanced binary outcomes (modeling differences between a small minority and a majority), the choice of metrics should be suitable for such situations. Saito and Rehmsmeier [60] show that ROC curves are insensitive to the imbalance ratio, while precision-recall charts do change with the imbalance ratio.”
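This insensitivity can be shown directly: the TPR and FPR that define a ROC point do not depend on the class ratio, but the precision implied by that same point does. A minimal sketch (the operating point TPR = 0.8, FPR = 0.1 is an assumed example):

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by a fixed ROC operating point at a given
    fraction of positives in the data.

    Per unit of data: expected true positives = tpr * prevalence,
    expected false positives = fpr * (1 - prevalence).
    """
    tp = tpr * prevalence
    fp = fpr * (1 - prevalence)
    return tp / (tp + fp)

# The same ROC point (TPR=0.8, FPR=0.1) at two class ratios:
balanced = precision_at_prevalence(0.8, 0.1, 0.50)  # ~0.89
skewed = precision_at_prevalence(0.8, 0.1, 0.01)    # ~0.07
print(balanced, skewed)
```

The ROC curve looks identical in both cases, while the precision-recall curve collapses as positives become rare.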
This article uses the “work-up to detection ratio (W:D)”, which is the inverse of the positive predictive value (PPV).
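The relationship is simple arithmetic: W:D counts how many cases must be worked up per true detection, so it equals (TP + FP) / TP = 1 / PPV. A quick sketch with made-up counts (20 true detections among 100 work-ups is a hypothetical example, not a figure from the article):

```python
def ppv(tp, fp):
    """Positive predictive value: fraction of worked-up cases that are true."""
    return tp / (tp + fp)

def work_up_to_detection(tp, fp):
    """W:D ratio: cases worked up per true detection, i.e. the inverse of PPV."""
    return (tp + fp) / tp

# Hypothetical example: 100 work-ups yield 20 true detections.
print(ppv(20, 80))                   # PPV = 0.2
print(work_up_to_detection(20, 80))  # W:D = 5 work-ups per detection
```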
The authors say: “However, using these measures [the common ones, not PPV; MR] presents a number of problems when one tries to predict extremely rare outcomes [Saito and Rehmsmeier 2015; MR]. The most important of these is that a classifier that tries to maximize the accuracy of its classification rule when predicting a rare outcome may obtain an accuracy of 99% just by classifying all observations as non-events, and model improvements (e.g. as quantified by increases in the c statistic) are overshadowed by the large true negative rate.” (my annotations indicated by MR)
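The 99%-accuracy trap described in the quote is easy to reproduce. A minimal sketch, assuming a data set of 10,000 observations with a 1% event rate (the counts are illustrative): a classifier that labels everything a non-event scores 99% accuracy while detecting nothing.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all observations classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Fraction of true events actually detected (sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# 10,000 observations, 100 of them events (1% prevalence).
# Degenerate classifier: predict "non-event" for every observation.
tp, tn, fp, fn = 0, 9_900, 0, 100

print(accuracy(tp, tn, fp, fn))  # 0.99 -- looks excellent
print(recall(tp, fn))            # 0.0  -- detects no events at all
```

Precision and recall expose the failure immediately, which is why they are the measures of choice for rare outcomes.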
The authors say: “When positive outcomes are sparse, it becomes easier to predict negative outcomes and, consequently, true negatives are much more numerous than false positives (i.e. the false positive rate is generally weak). Therefore, if we care more about the positive outcomes, the AUROC curve can overstate the overall performance of the classifier. The area under the precision-recall (AUPR) curve can be considered as a more appropriate evaluation metric in such situations (Saito and Rehmsmeier (2015)).”
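The overstatement the quoted authors describe can be demonstrated end to end. Below is a minimal, self-contained sketch (the score distribution is an assumed toy example): on a data set with 2 positives among 100 observations, where 10 negatives outrank the second positive, the ROC AUC is a flattering ~0.95 while the average precision (a standard summary of the PR curve) is only ~0.58.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive is scored above a randomly chosen negative."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision: mean of the precision evaluated at the rank of
    each true positive, with observations sorted by descending score."""
    ranked = sorted(zip(scores, labels), reverse=True)
    tp, precisions = 0, []
    for rank, (_, lab) in enumerate(ranked, start=1):
        if lab == 1:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / len(precisions)

# Toy imbalanced data: 2 positives, 98 negatives.
# One positive ranks first; 10 negatives outrank the second positive.
scores = [0.99, 0.70]                              # the two positives
scores += [0.80 - 0.01 * i for i in range(10)]     # 10 high-scoring negatives
scores += [0.60 - 0.001 * i for i in range(88)]    # 88 low-scoring negatives
labels = [1, 1] + [0] * 98

print(roc_auc(labels, scores))            # ~0.95: looks strong
print(average_precision(labels, scores))  # ~0.58: far less impressive
```

The 10 misranked negatives barely dent the ROC AUC because they are 10 out of 98, but they dominate the precision at the second positive, so the PR summary reflects the cost of false alarms among rare positives.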