 # Method of the simulation

This page explains the method of the simulation analysis, which aims to reveal the differences between ROC and precision-recall plots.

## 3 essential steps to prepare for the simulation analysis

The preparation for the analysis consists of three major steps.

1. Define score distributions for positive and negative separately
2. Randomly generate samples with a specific positive:negative ratio
3. Calculate evaluation measures and create plots

*Figure: Schematic diagram of the three essential steps to prepare for the simulation analysis.*

### Step 1: Define score distributions

We prepared five different performance levels – random, poor early retrieval, good early retrieval, excellent, and perfect – for our simulation analysis. For each level, we used two probability distributions – one for generating positive scores and one for generating negative scores – to model classifiers of different strengths.

| Level                | Positive   | Negative   |
|----------------------|------------|------------|
| Random               | Norm(0, 1) | Norm(0, 1) |
| Poor early retrieval | Beta(4, 1) | Beta(1, 1) |
| Good early retrieval | Beta(1, 1) | Beta(1, 4) |
| Excellent            | Norm(3, 1) | Norm(0, 1) |
| Perfect              | 1          | 0          |
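The table above can be sketched in code. The following is a minimal illustration, not the article's actual implementation; the helper names (`positive_score`, `negative_score`) and the level keys are hypothetical.

```python
import random

# Hypothetical helpers: one score generator per performance level,
# following the distribution table above.
def positive_score(level):
    return {
        "random":    lambda: random.gauss(0, 1),
        "poor_er":   lambda: random.betavariate(4, 1),
        "good_er":   lambda: random.betavariate(1, 1),
        "excellent": lambda: random.gauss(3, 1),
        "perfect":   lambda: 1.0,
    }[level]()

def negative_score(level):
    return {
        "random":    lambda: random.gauss(0, 1),
        "poor_er":   lambda: random.betavariate(1, 1),
        "good_er":   lambda: random.betavariate(1, 4),
        "excellent": lambda: random.gauss(0, 1),
        "perfect":   lambda: 0.0,
    }[level]()

random.seed(0)
pos = [positive_score("good_er") for _ in range(500)]
neg = [negative_score("good_er") for _ in range(500)]
```

Note that the beta-based levels always produce scores in [0, 1], while the normal-based levels are unbounded.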

#### Performance level: Random

Two identical normal distributions Norm(0, 1) are used for the random performance level, so the scores of the positives and the scores of the negatives follow the same distribution.

*Figure: Two normal distributions Norm(0, 1) are used for generating positive and negative scores for the random performance level.*

Combined positive and negative scores can be transformed to score ranks, with higher ranks indicating higher scores. For instance, assume the ranks range from 1 to 1000 for 500 positives and 500 negatives; with this ranking, the highest score is ranked 1000 and the lowest score is ranked 1.

*Figure: Score ranks for the random performance level. The scores are generated for 500 positives and 500 negatives, and the ranks are between 1 and 1000.*
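The rank transformation described above can be sketched as follows (a minimal illustration, not the article's code; it assumes unique scores, so ties are broken arbitrarily by sort order):

```python
import random

random.seed(1)
# 1000 combined scores, standing in for 500 positives + 500 negatives.
scores = [random.gauss(0, 1) for _ in range(1000)]

# Rank from 1 (lowest score) to 1000 (highest score): sort the indices
# by score, then assign each index its 1-based position in that order.
order = sorted(range(len(scores)), key=lambda i: scores[i])
ranks = [0] * len(scores)
for position, idx in enumerate(order):
    ranks[idx] = position + 1
```

Each sample thus receives exactly one rank, and the set of ranks is exactly {1, …, 1000}.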

#### Performance level: Poor early retrieval

The term “early retrieval” refers to the area of high specificity (low false positive rate) in the ROC plot. In our analysis, two performance levels are defined to analyse performance in this area: they have the same overall performance, but differ when the early retrieval area is considered.

The “poor early retrieval” level is created with two beta distributions. The beta distribution takes two parameters (α, β) called shape parameters, and values sampled from it always lie in the interval [0, 1] regardless of the shape parameters.

*Figure: Two beta distributions are used to create the “poor early retrieval” level. Shape parameters are (4, 1) for positives and (1, 1) for negatives.*
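Why these shape parameters give only *poor* early retrieval can be checked numerically. Beta(4, 1) has CDF x⁴ on [0, 1], so P(X > 0.9) = 1 − 0.9⁴ ≈ 0.344 for positives, versus 0.1 for the uniform Beta(1, 1) negatives: positives outnumber negatives near the top, but negatives remain common there. A minimal sketch (not the article's code) verifying this empirically:

```python
import random

random.seed(2)
n = 100_000
pos = [random.betavariate(4, 1) for _ in range(n)]  # positives ~ Beta(4, 1)
neg = [random.betavariate(1, 1) for _ in range(n)]  # negatives ~ Beta(1, 1) (uniform)

# Fraction of each class scoring above 0.9.
frac_pos_top = sum(s > 0.9 for s in pos) / n  # analytic value: 1 - 0.9**4
frac_neg_top = sum(s > 0.9 for s in neg) / n  # analytic value: 0.1
```

So even in the very top score region, negatives are not rare, which is what makes the early retrieval performance poor.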

The score ranks show more positives than negatives among the top-ranking scores.

*Figure: Score ranks for the poor early retrieval level. The scores are generated for 500 positives and 500 negatives, and the ranks are between 1 and 1000.*

#### Performance level: Good early retrieval

The “good early retrieval” level is also created with two beta distributions, and it has the same overall performance as the “poor early retrieval” level.

*Figure: Two beta distributions are used to create the “good early retrieval” level. Shape parameters are (1, 1) for positives and (1, 4) for negatives.*

The distribution of the score ranks is similar to that of the poor early retrieval level, but the good early retrieval level contains both higher-ranked and lower-ranked positives than the poor early retrieval level.

*Figure: Score ranks for the good early retrieval level. The scores are generated for 500 positives and 500 negatives, and the ranks are between 1 and 1000.*
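The contrast between the two early-retrieval levels can be made concrete by comparing the precision among the top-ranked samples. The sketch below (hypothetical helper `top_precision`, not from the article) draws one sample per level and measures the fraction of positives among the 50 highest scores:

```python
import random

random.seed(3)

def top_precision(pos_scores, neg_scores, k):
    # Fraction of positives among the k highest-scoring samples.
    labeled = [(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores]
    labeled.sort(reverse=True)
    return sum(lbl for _, lbl in labeled[:k]) / k

n = 500
# Poor early retrieval: positives ~ Beta(4, 1), negatives ~ Beta(1, 1).
poor = top_precision([random.betavariate(4, 1) for _ in range(n)],
                     [random.betavariate(1, 1) for _ in range(n)], 50)
# Good early retrieval: positives ~ Beta(1, 1), negatives ~ Beta(1, 4).
good = top_precision([random.betavariate(1, 1) for _ in range(n)],
                     [random.betavariate(1, 4) for _ in range(n)], 50)
```

With these distributions, the good early retrieval level should yield a markedly higher top-k precision, since its negatives are concentrated near 0 and rarely reach the top ranks.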

#### Performance level: Excellent

Two normal distributions, Norm(3, 1) for positives and Norm(0, 1) for negatives, are used for the excellent performance level. The difference in the means tends to give higher scores to positives.

*Figure: Two normal distributions Norm(3, 1) and Norm(0, 1) are used for generating positive and negative scores for the excellent performance level.*

The score ranks show that most positives have higher ranks than negatives.

*Figure: Score ranks for the excellent performance level. The scores are generated for 500 positives and 500 negatives, and the ranks are between 1 and 1000.*

#### Performance level: Perfect

Instead of particular probability distributions, two constant values – 1 for positives and 0 for negatives – are used to generate the perfect level. The score ranks show that all positives have higher ranks than negatives.

*Figure: Score ranks for the perfect performance level. The scores are generated for 500 positives and 500 negatives, and the ranks are between 1 and 1000.*
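One way to see what “perfect” means here is through the rank interpretation of ROC AUC: it equals the probability that a randomly chosen positive outscores a randomly chosen negative (the Mann–Whitney statistic). A minimal sketch, not the article's code, showing that the constant scores 1 and 0 give an AUC of exactly 1:

```python
import random

def auc_from_scores(pos, neg):
    # ROC AUC as the probability that a random positive outscores a
    # random negative; ties count as 1/2 (Mann-Whitney statistic).
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

pos = [1.0] * 100   # perfect level: every positive scores 1
neg = [0.0] * 100   # perfect level: every negative scores 0
perfect_auc = auc_from_scores(pos, neg)

# For comparison, the random level (identical distributions) sits near 0.5.
random.seed(4)
rand_auc = auc_from_scores([random.gauss(0, 1) for _ in range(200)],
                           [random.gauss(0, 1) for _ in range(200)])
```

The O(n²) pairwise loop is fine for a demonstration; rank-based formulas are used in practice for large samples.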

### Step 2: Random generation of scores

Once the two distributions for positives and negatives are defined, it is easy to randomly generate scores with a specific positive:negative ratio. In our analysis, a balanced dataset contains 1000 positives and 1000 negatives, whereas an imbalanced dataset contains 1000 positives and 10 000 negatives.

|            | # of positives | # of negatives |
|------------|----------------|----------------|
| Balanced   | 1000           | 1000           |
| Imbalanced | 1000           | 10 000         |
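Generating a dataset at a given ratio can be sketched as below (a minimal illustration with a hypothetical `make_dataset` helper, not the article's code; the excellent level is used as an example):

```python
import random

def make_dataset(n_pos, n_neg, pos_gen, neg_gen):
    # Labels: 1 = positive, 0 = negative; scores drawn from the generators.
    scores = [pos_gen() for _ in range(n_pos)] + [neg_gen() for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return scores, labels

random.seed(5)
# Excellent level: positives ~ Norm(3, 1), negatives ~ Norm(0, 1).
balanced = make_dataset(1000, 1000,
                        lambda: random.gauss(3, 1), lambda: random.gauss(0, 1))
imbalanced = make_dataset(1000, 10_000,
                          lambda: random.gauss(3, 1), lambda: random.gauss(0, 1))
```

Only the number of negatives changes between the two datasets; the score distributions themselves are identical.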

### Step 3: Calculation of evaluation measures

The simulation is performed by iterating both random sample generation and performance measure calculation. In our analysis, the sample generation is iterated 1000 times, and performance measures are calculated for each iteration.

The final evaluation plots, such as ROC and precision-recall, show the average of the curves generated over all iterations.

*Figure: The final ROC curve (blue) is the average of the 1000 ROC curves (gray).*
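The iterate-and-average procedure can be sketched as follows. This is a simplified illustration, not the article's implementation: it uses small sample sizes and 50 iterations instead of 1000 to stay fast, and averages TPR vertically on a fixed FPR grid (one common way to average ROC curves; the article does not specify its exact averaging scheme).

```python
import random

def roc_points(pos, neg):
    # Sweep the threshold from high to low over all scores,
    # recording an (FPR, TPR) point at each step.
    labeled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg], reverse=True)
    tp = fp = 0
    pts = [(0.0, 0.0)]
    for _, lbl in labeled:
        if lbl:
            tp += 1
        else:
            fp += 1
        pts.append((fp / len(neg), tp / len(pos)))
    return pts

def tpr_at(pts, fpr):
    # Step-curve TPR at a given FPR: last point with FPR <= fpr.
    best = 0.0
    for x, y in pts:
        if x <= fpr:
            best = y
    return best

random.seed(6)
grid = [i / 100 for i in range(101)]   # fixed FPR grid for averaging
avg_tpr = [0.0] * len(grid)
n_iter = 50  # the article iterates 1000 times; fewer here for speed
for _ in range(n_iter):
    # One iteration: regenerate scores (excellent level) and compute the curve.
    pts = roc_points([random.gauss(3, 1) for _ in range(100)],
                     [random.gauss(0, 1) for _ in range(100)])
    for i, f in enumerate(grid):
        avg_tpr[i] += tpr_at(pts, f) / n_iter
```

`(grid, avg_tpr)` is then the averaged ROC curve; the same loop structure applies to precision-recall with precision averaged over a recall grid.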

## Analysis results

The main purpose of this simulation is to clarify the difference between ROC and precision-recall under different conditions. Please see the following page for more details.