Most performance measures for evaluating binary classifiers fall into two main groups: 1) basic measures derived from the confusion matrix and 2) model-wide measures, such as ROC and precision-recall. One simple way to explain the difference is that the former require a single threshold value, whereas the latter usually consider multiple threshold values.
Dataset representation with ovals
Visualization often makes complex ideas and concepts easier to understand. We extensively use a dataset representation that depicts a dataset as ovals to explain various performance measures. This oval representation is similar to the Venn diagrams of set theory.

Contents
The introduction consists of four pages. The first page focuses on various basic performance measures derived from the confusion matrix, such as error-rate, accuracy, sensitivity, specificity, and precision. We introduce model-wide evaluation approaches in the remaining pages.
- Basic evaluation measures from the confusion matrix
- Basic concept of model-wide evaluation
- Introduction to the ROC (Receiver Operating Characteristics) plot
- Introduction to the precision-recall plot
Short descriptions for each page
We have selected several important aspects that represent the content of each page.
Basic evaluation measures from the confusion matrix
- True positives, true negatives, false positives, and false negatives are the four types of binary classification outcomes.
- A confusion matrix for binary classification is a two-by-two table that tabulates these four outcomes.
- Various evaluation measures, such as error-rate, accuracy, sensitivity, specificity, and precision, can be derived from a confusion matrix (see the sketch after this list).
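As a concrete illustration, here is a minimal Python sketch that derives these measures from the four outcome counts. The counts are hypothetical example values, not data from these pages.

```python
# Hypothetical outcome counts for a binary classifier.
tp, fn, fp, tn = 80, 20, 10, 90

total = tp + fn + fp + tn
accuracy    = (tp + tn) / total
error_rate  = (fp + fn) / total   # equivalently, 1 - accuracy
sensitivity = tp / (tp + fn)      # true positive rate, also called recall
specificity = tn / (tn + fp)      # true negative rate
precision   = tp / (tp + fp)      # positive predictive value

print(f"accuracy={accuracy:.3f}  error_rate={error_rate:.3f}")
print(f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}  precision={precision:.3f}")
```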
Basic concept of model-wide evaluation
- Many classification models can produce scores as well as predicted classes.
- A threshold value is applied to the scores to determine the predicted classes.
- Multiple confusion matrices can be created by using different threshold values (as shown in the sketch below).
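The following minimal sketch makes this concrete: it turns the same scores into predicted classes at several threshold values, and each threshold yields its own confusion matrix. The labels and scores are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical true labels (1 = positive, 0 = negative) and classifier scores.
y_true  = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
y_score = np.array([0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

for threshold in (0.3, 0.5, 0.7):
    # Scores at or above the threshold are predicted as positive.
    y_pred = (y_score >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn}")
```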
Introduction to the ROC plot
- A ROC plot shows the trade-off between sensitivity and specificity.
- A ROC curve provides a single summary measure, the AUC (area under the curve) score.
- The early retrieval area of a ROC plot is useful for evaluating high-ranking instances (see the sketch after this list).
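As a brief illustration with the same hypothetical data, and assuming scikit-learn is available, the sketch below computes the ROC points and the AUC score with roc_curve and roc_auc_score.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and classifier scores.
y_true  = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
y_score = np.array([0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

# roc_curve sweeps the threshold values and returns one ROC point per
# threshold: fpr is 1 - specificity and tpr is sensitivity.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(f"ROC AUC = {roc_auc_score(y_true, y_score):.3f}")

for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}: FPR={f:.2f} TPR={t:.2f}")
```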
Introduction to the precision-recall plot
- A precision-recall plot shows recall values and their corresponding precision values.
- A precision-recall curve also provides an AUC score.
- Interpolation between two precision-recall points is non-linear.
- The ratio of positives to negatives defines the baseline of a precision-recall plot.
- A ROC point and a precision-recall point always have a one-to-one relationship (see the sketch after this list).
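To tie these points together, the final sketch computes the precision-recall points, the baseline, and a trapezoidal approximation of the AUC score; scikit-learn is again assumed, and the labels and scores are the same hypothetical example data.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

# Hypothetical true labels and classifier scores.
y_true  = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
y_score = np.array([0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# The baseline is the fraction of positives in the dataset; a random
# classifier's precision stays at this level for any recall value.
baseline = np.mean(y_true)
print(f"baseline precision = {baseline:.2f}")

# Because interpolation between precision-recall points is non-linear,
# the trapezoidal rule only approximates the true area under the curve.
print(f"PR AUC (trapezoidal approximation) = {auc(recall, precision):.3f}")
```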