Introduction to performance evaluation measures

Most performance measures for evaluating binary classifiers can be categorized into two main groups: 1) basic measures derived from the confusion matrix and 2) model-wide measures, such as ROC and precision-recall. One simple way to explain the difference is that the former require a single threshold value, whereas the latter consider multiple threshold values.

Dataset representation with ovals

Visualization often helps in understanding complex ideas and concepts. We have extensively used our dataset representation method, which uses ovals, to explain various performance measures. This oval representation is similar to a Venn diagram from set theory.

A dataset representation with an oval. The concept of this representation is similar to that of the Venn diagram used in set theory.


The introduction consists of four pages. The first page focuses on various basic performance measures derived from the confusion matrix, such as error-rate, accuracy, sensitivity, specificity, and precision. We introduce model-wide evaluation approaches in the remaining pages.

Short descriptions for each page

We have selected several important aspects that represent the content of each page.

Basic evaluation measures from the confusion matrix

  1. True positives, true negatives, false positives, and false negatives are the four types of binary classification outcomes.
  2. A confusion matrix for binary classification is a two-by-two table.
  3. Various evaluation measures, such as error-rate, accuracy, sensitivity, specificity, and precision, can be derived from a confusion matrix.
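The derivations listed above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the article; the function and key names are our own.

```python
def basic_measures(tp, fn, fp, tn):
    """Compute common evaluation measures from the four confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "error rate": (fp + fn) / total,   # fraction of incorrect predictions
        "accuracy": (tp + tn) / total,     # fraction of correct predictions
        "sensitivity": tp / (tp + fn),     # a.k.a. recall, true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
        "precision": tp / (tp + fp),       # positive predictive value
    }

# Example confusion matrix: 8 TPs, 2 FNs, 3 FPs, 7 TNs.
print(basic_measures(tp=8, fn=2, fp=3, tn=7))
```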

Basic concept of model-wide evaluation

  1. Many classification models can produce scores as well as predicted classes.
  2. Threshold values are applied to the scores to determine the predicted classes.
  3. Multiple confusion matrices can be created for different threshold values.

Introduction to the ROC plot

  1. A ROC plot shows trade-offs between specificity and sensitivity.
  2. A ROC curve provides a single summary measure called the AUC (area under the curve) score.
  3. The early retrieval area of a ROC plot is useful for evaluating highly ranked instances.
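To make the threshold sweep and the AUC score concrete, here is a minimal pure-Python sketch. It assumes distinct scores (tied scores need extra care) and uses the trapezoidal rule; none of the names below come from the article.

```python
def roc_points(scores, labels):
    """Sweep thresholds over descending scores; return (FPR, TPR) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Each visited instance lowers the threshold by one step.
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))
```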

Introduction to the precision-recall plot

  1. A precision-recall plot shows precision values at corresponding recall values.
  2. A precision-recall curve also provides an AUC score.
  3. Interpolation between two precision-recall points is non-linear.
  4. The ratio of positives and negatives defines the baseline.
  5. A ROC point and a precision-recall point always have a one-to-one relationship.
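The same threshold sweep yields precision-recall points, which illustrates the one-to-one relationship: each confusion matrix gives both a ROC point and a precision-recall point. The sketch below also computes the baseline, the ratio of positives to all instances; as above, the names and data are our own, and connecting the points with straight lines would be incorrect because the interpolation is non-linear.

```python
def pr_points(scores, labels):
    """Sweep thresholds over descending scores; return (recall, precision) points."""
    pos = sum(labels)
    points = []
    tp = fp = 0
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / pos, tp / (tp + fp)))
    return points

scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Baseline of the precision-recall plot: P / (P + N).
baseline = sum(labels) / len(labels)
print(pr_points(scores, labels))
print(baseline)
```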
