This page explains the method of our simulation analysis, which aims to reveal the differences between ROC and precision-recall plots.
3 essential steps to prepare for the simulation analysis
The preparation for the analysis consists of three major steps.
- Define score distributions for positive and negative separately
- Randomly generate samples with a specific positive:negative ratio
- Calculate evaluation measures and create plots
Step 1: Define score distributions
We prepared five different performance levels – random, poor early retrieval, good early retrieval, excellent, and perfect – for our simulation analysis. For each level, the classifier is modelled by two probability distributions: one generating positive scores and one generating negative scores.
|Performance level||Positives||Negatives|
|Random||Norm(0, 1)||Norm(0, 1)|
|Poor early retrieval||Beta(4, 1)||Beta(1, 1)|
|Good early retrieval||Beta(1, 1)||Beta(1, 4)|
|Excellent||Norm(3, 1)||Norm(0, 1)|
|Perfect||Constant 1||Constant 0|
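The five levels can be sketched as pairs of sampling functions using Python's standard `random` module. This is an illustrative reconstruction, not the analysis' actual code; the dictionary keys and overall structure are assumptions.

```python
import random

random.seed(0)

# Illustrative sketch: each performance level maps to a pair of
# samplers (positives, negatives) matching the table above.
levels = {
    "random":               (lambda: random.gauss(0, 1),       lambda: random.gauss(0, 1)),
    "poor_early_retrieval": (lambda: random.betavariate(4, 1), lambda: random.betavariate(1, 1)),
    "good_early_retrieval": (lambda: random.betavariate(1, 1), lambda: random.betavariate(1, 4)),
    "excellent":            (lambda: random.gauss(3, 1),       lambda: random.gauss(0, 1)),
    "perfect":              (lambda: 1.0,                      lambda: 0.0),
}

# Example: draw 1000 positive and 1000 negative scores for one level.
pos_sampler, neg_sampler = levels["excellent"]
pos_scores = [pos_sampler() for _ in range(1000)]
neg_scores = [neg_sampler() for _ in range(1000)]
```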
Performance level: Random
Two identical normal distributions Norm(0, 1) are used to create the random performance level. Positive and negative scores therefore follow the same distribution, so the classifier performs no better than chance.
Combined positive and negative scores can be transformed into score ranks, with higher ranks indicating higher scores. For instance, with 500 positives and 500 negatives, the ranks range from 1 to 1000: the highest score receives rank 1000, and the lowest score receives rank 1.
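A small illustration of this rank transformation, assuming the random level and the 500/500 split described above:

```python
import random

random.seed(1)

# 500 positive and 500 negative scores from the random level (both Norm(0, 1)).
pos = [random.gauss(0, 1) for _ in range(500)]
neg = [random.gauss(0, 1) for _ in range(500)]

# Combine the scores, keeping track of each sample's label.
combined = [(s, "pos") for s in pos] + [(s, "neg") for s in neg]
combined.sort()  # ascending score order

# Rank 1 = lowest score, rank 1000 = highest score.
ranked = [(rank, label) for rank, (_, label) in enumerate(combined, start=1)]
```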
Performance level: Poor early retrieval
The term “early retrieval” refers to the high-specificity region of the ROC plot, that is, the leftmost part of the curve where the false positive rate is low. In our analysis, two performance levels are defined to examine this region. They have the same overall performance, but their performance differs when only the early retrieval area is considered.
The “poor early retrieval” level is created from two beta distributions. A beta distribution takes two shape parameters (α, β), and the values sampled from it always lie in the interval [0, 1] regardless of the shape parameters.
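A quick check of this [0, 1] property, sampling the poor early retrieval level with Beta(4, 1) for positives and Beta(1, 1) for negatives (our reading of the table above):

```python
import random

random.seed(2)

# Poor early retrieval level: positives ~ Beta(4, 1), negatives ~ Beta(1, 1).
# Beta(1, 1) is simply the uniform distribution on [0, 1].
pos = [random.betavariate(4, 1) for _ in range(1000)]
neg = [random.betavariate(1, 1) for _ in range(1000)]

# Every sampled value lies in [0, 1], whatever the shape parameters are.
in_unit_interval = all(0.0 <= s <= 1.0 for s in pos + neg)
```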
The score ranks show more positives than negatives for top ranking scores.
Performance level: Good early retrieval
The “good early retrieval” level is also created by two beta distributions. It has the same overall performance level as the “poor early retrieval” level.
The distribution of the score ranks is similar to that of the poor early retrieval level. The good early retrieval level contains both higher and lower ranked positives than the poor early retrieval level.
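The claim of equal overall performance can be checked empirically: with these beta parameters, both levels have a theoretical ROC AUC of 0.8. A sketch, assuming positives and negatives are assigned as in the table above:

```python
import random

random.seed(3)

def auc(pos, neg):
    # Empirical AUC: the fraction of (positive, negative) pairs in which
    # the positive scores higher (ties count as half).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

size = 1000
poor_pos = [random.betavariate(4, 1) for _ in range(size)]
poor_neg = [random.betavariate(1, 1) for _ in range(size)]
good_pos = [random.betavariate(1, 1) for _ in range(size)]
good_neg = [random.betavariate(1, 4) for _ in range(size)]

auc_poor = auc(poor_pos, poor_neg)
auc_good = auc(good_pos, good_neg)
# Both come out near 0.8: same overall performance, despite the
# different behaviour in the early retrieval region.
```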
Performance level: Excellent
Two normal distributions, Norm(3, 1) for positives and Norm(0, 1) for negatives, are used to create the excellent performance level. The difference between the means gives positives higher scores on average.
The score ranks show that most positives have higher ranks than negatives.
Performance level: Perfect
Two constant values, 1 for positives and 0 for negatives, are used to generate the perfect level instead of particular probability distributions. The score ranks show that all positives have higher ranks than all negatives.
Step 2: Random generation of scores
Once the two distributions for positives and negatives are defined, it is easy to randomly generate scores with a specific positive:negative ratio. In our analysis, the balanced dataset contains 1000 positives and 1000 negatives, whereas the imbalanced dataset contains 1000 positives and 10,000 negatives.
|Dataset||# of positives||# of negatives|
|Balanced||1000||1000|
|Imbalanced||1000||10,000|
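This step can be sketched as a small helper; `make_dataset` is a hypothetical name, and the excellent level (Norm(3, 1) vs Norm(0, 1)) is used here only as an example:

```python
import random

random.seed(4)

def make_dataset(n_pos, n_neg):
    # Hypothetical helper: generate scores and labels for the excellent
    # level (positives ~ Norm(3, 1), negatives ~ Norm(0, 1)).
    scores = ([random.gauss(3, 1) for _ in range(n_pos)]
              + [random.gauss(0, 1) for _ in range(n_neg)])
    labels = [1] * n_pos + [0] * n_neg
    return scores, labels

balanced = make_dataset(1000, 1000)     # 1:1 positive:negative ratio
imbalanced = make_dataset(1000, 10000)  # 1:10 positive:negative ratio
```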
Step 3: Calculation of evaluation measures
The simulation is performed by iterating both random sample generation and performance measure calculation. In our analysis, the sample generation is iterated 1000 times, and performance measures are calculated for each iteration.
The final evaluation plots, such as ROC and precision-recall, show the average of the curves over all iterations.
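As a simplified sketch of this loop, the following averages a scalar summary (ROC AUC, computed from ranks) over repeated samples of the excellent level, rather than averaging full curves; only 100 iterations are run here to keep the example fast, whereas the analysis itself uses 1000.

```python
import random

random.seed(5)

def rank_auc(pos, neg):
    # ROC AUC via the Mann-Whitney rank-sum statistic (no tie handling,
    # which is fine for continuous scores).
    ranked = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg])
    pos_rank_sum = sum(r for r, (_, label) in enumerate(ranked, start=1) if label == 1)
    n_pos, n_neg = len(pos), len(neg)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

aucs = []
for _ in range(100):  # the analysis itself iterates 1000 times
    pos = [random.gauss(3, 1) for _ in range(1000)]  # positives, excellent level
    neg = [random.gauss(0, 1) for _ in range(1000)]  # negatives, excellent level
    aucs.append(rank_auc(pos, neg))

mean_auc = sum(aucs) / len(aucs)
```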
The main purpose of this simulation is to clarify the difference between ROC and precision-recall under different conditions. Please see the following page for more details.