A small introduction to the ROCR package
I've been doing some classification with logistic regression in brain imaging recently. I have been using the ROCR package, which is helpful for estimating performance measures and plotting them over a range of cutoffs.
The prediction and performance functions are the workhorses of most of the analyses in ROCR I've been doing. For those who haven't used ROCR before, the format of the prediction function is:
prediction(predictions, labels, label.ordering = NULL)
where predictions are some predicted measure (usually continuous) for the “truth”, which are the labels. In many applications, predictions are estimated probabilities (or log odds) and the labels are binary values. Both arguments can take a vector, matrix, or data.frame for prediction, but dim(predictions) must equal dim(labels).
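As a quick illustration of these input formats, here is a minimal sketch using made-up data (not a dataset from the package); the object and variable names are just for illustration:

library(ROCR)
set.seed(1)
# hypothetical continuous scores and binary truth for a single run
scores <- runif(100)
truth  <- rbinom(100, size = 1, prob = 0.5)
single.pred <- prediction(scores, truth)
# matrix form: one run per column; the dimensions of the two arguments must match
score.mat <- cbind(scores, runif(100))
truth.mat <- cbind(truth, rbinom(100, size = 1, prob = 0.5))
multi.pred <- prediction(score.mat, truth.mat)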
In this post, I'll go through creating prediction and performance objects and extracting the results.
Prediction objects
Simple example: one set of prediction and labels
Let's show a simple example from the prediction help file that uses a prediction and label vector (i.e. not a matrix). We see the data is some continuous prediction and a binary label:
library(ROCR)
data(ROCR.simple)
head(cbind(ROCR.simple$predictions, ROCR.simple$labels), 5)
          [,1] [,2]
[1,] 0.6125478    1
[2,] 0.3642710    1
[3,] 0.4321361    0
[4,] 0.1402911    0
[5,] 0.3848959    0
Now, let's make the prediction object and show its contents:
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
class(pred)
[1] "prediction"
attr(,"package")
[1] "ROCR"
slotNames(pred)
 [1] "predictions" "labels"      "cutoffs"     "fp"          "tp"
 [6] "tn"          "fn"          "n.pos"       "n.neg"       "n.pos.pred"
[11] "n.neg.pred"
We see that the returned result of prediction is an object of class prediction, which is an S4 object with a series of slots. Let's look at the length and class of each slot:
sn = slotNames(pred)
sapply(sn, function(x) length(slot(pred, x)))
predictions      labels     cutoffs          fp          tp          tn
          1           1           1           1           1           1
         fn       n.pos       n.neg  n.pos.pred  n.neg.pred
          1           1           1           1           1
sapply(sn, function(x) class(slot(pred, x)))
predictions      labels     cutoffs          fp          tp          tn
     "list"      "list"      "list"      "list"      "list"      "list"
         fn       n.pos       n.neg  n.pos.pred  n.neg.pred
     "list"      "list"      "list"      "list"      "list"
We see that each slot has length 1 and is a list.
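Because each slot is a list, you index into the first element to get at the underlying vectors. For example, to peek at a few of the stored cutoffs and counts from the object above:

# first few cutoffs and the true/false positive counts at those cutoffs
head(pred@cutoffs[[1]])
head(pred@tp[[1]])
head(pred@fp[[1]])
# total numbers of positive and negative labels
pred@n.pos[[1]]
pred@n.neg[[1]]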
Example: multiple sets of prediction and labels
Let's use the ROCR.hiv dataset to show how this works if more than one set of predictions and labels are supplied. Here we pass a list of 10 predictions and a list of labels to the prediction function:
data(ROCR.hiv)
manypred = prediction(ROCR.hiv$hiv.nn$predictions, ROCR.hiv$hiv.nn$labels)
sapply(sn, function(x) length(slot(manypred, x)))
predictions      labels     cutoffs          fp          tp          tn
         10          10          10          10          10          10
         fn       n.pos       n.neg  n.pos.pred  n.neg.pred
         10          10          10          10          10
sapply(sn, function(x) class(slot(manypred, x)))
predictions      labels     cutoffs          fp          tp          tn
     "list"      "list"      "list"      "list"      "list"      "list"
         fn       n.pos       n.neg  n.pos.pred  n.neg.pred
     "list"      "list"      "list"      "list"      "list"
We see that all the slots are still lists, but now they have length 10, corresponding to the 10 sets of predictions/labels. We would get the same result if the 2 arguments were matrices, but that would require all predictions and labels to have the same length. Using a list of predictions/labels is a bit more flexible. A minimal synthetic sketch of the matrix form is below.
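This sketch uses made-up data (not ROCR.hiv), with one run per column; the names are illustrative only:

set.seed(2)
# hypothetical example: 3 runs of 50 observations each, one run per column
pred.mat <- matrix(runif(150), ncol = 3)
lab.mat  <- matrix(rbinom(150, size = 1, prob = 0.5), ncol = 3)
mat.pred <- prediction(pred.mat, lab.mat)
length(mat.pred@predictions)  # each slot is now a list of length 3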
Performance objects
From the help file of performance, the syntax for this function is:
performance(prediction.obj, measure, x.measure="cutoff", ...)
We see that the first argument is a prediction object, and the second is a measure. If you run ?performance, you can see all the performance measures implemented.
We will do examples of some commonly estimated measures: receiver operating characteristic (ROC) curves, accuracy, area under the curve (AUC), and partial AUC (pAUC).
ROC Curve
Simple example: one set of prediction and labels
We will do an ROC curve, which plots the false positive rate (FPR) on the x-axis and the true positive rate (TPR) on the y-axis:
roc.perf = performance(pred, measure = "tpr", x.measure = "fpr")
plot(roc.perf)
abline(a = 0, b = 1)
At every cutoff, the TPR and FPR are calculated and plotted. The smoother the graph, the more cutoffs the predictions have. We also plotted a 45-degree line, which represents, on average, the performance of a Uniform(0, 1) random variable. The further the curve is from the diagonal line, the better. Overall, we see gains in sensitivity (true positive rate, > 80%), trading off a false positive rate (1 - specificity), up until about 15% FPR. After an FPR of 15%, we don't see significant gains in TPR for a tradeoff of increased FPR.
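If you want to check a statement like that numerically rather than by eye, you can read the TPR off the performance object at a given FPR; a small sketch using the object above:

# highest TPR attained while keeping the FPR at or below 15%
fpr <- roc.perf@x.values[[1]]
tpr <- roc.perf@y.values[[1]]
max(tpr[fpr <= 0.15])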
Example: multiple sets of prediction and labels
The same can be done if you have many predictions and labels:
many.roc.perf = performance(manypred, measure = "tpr", x.measure = "fpr")
plot(many.roc.perf, col = 1:10)
abline(a = 0, b = 1)
Essentially, the plot function on a performance object with multiple predictions and labels will loop over the lists and plot the ROC for each one.
Overall, we see that the performance of each prediction is similar. The pROC package, described in the conclusion, can test for differences between ROC curves.
Advanced: If you want to see how performance objects are plotted, use getMethod("plot", signature = c(x = "performance", y = "missing")) and ROCR:::.plot.performance.
Limiting to a FPR: partial ROC curve
You may only want to accept a false positive rate of a certain level, let's say 10%. The function pROC below will only keep values less than or equal to the FPR you set.
# Truncate a TPR/FPR performance object to cutoffs with FPR <= fpr.stop
pROC = function(pred, fpr.stop){
  perf <- performance(pred, "tpr", "fpr")
  for (iperf in seq_along(perf@x.values)){
    # keep only the points whose FPR (x) is at or below the stopping value
    ind = which(perf@x.values[[iperf]] <= fpr.stop)
    perf@y.values[[iperf]] = perf@y.values[[iperf]][ind]
    perf@x.values[[iperf]] = perf@x.values[[iperf]][ind]
  }
  return(perf)
}
Let's use this on the simple cases and plot the partial ROC curve:
proc.perf = pROC(pred, fpr.stop = 0.1)
plot(proc.perf)
abline(a = 0, b = 1)
Thus, if we can only accept a FPR of 10%, the model is only giving 50% sensitivity (TPR) at 10% FPR (1-specificity).
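You can read that maximum sensitivity directly off the truncated object, for example:

# largest TPR retained after stopping the curve at fpr.stop = 0.1
max(proc.perf@y.values[[1]])
# and the FPR at which it is reached
proc.perf@x.values[[1]][which.max(proc.perf@y.values[[1]])]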
Getting an “optimal” cut point
In some applications of ROC curves, you want the point closest to a TPR of 1 and an FPR of 0. This cut point is “optimal” in the sense that it weighs both sensitivity and specificity equally. To determine this cutoff, you can use the code below. The code takes in BOTH the performance object and the prediction object and gives the optimal cutoff value of your predictions:
# Find the ROC point closest to (FPR = 0, TPR = 1) and report its cutoff
opt.cut = function(perf, pred){
  cut.ind = mapply(FUN = function(x, y, p){
    d = (x - 0)^2 + (y - 1)^2
    ind = which(d == min(d))
    c(sensitivity = y[[ind]], specificity = 1 - x[[ind]],
      cutoff = p[[ind]])
  }, perf@x.values, perf@y.values, pred@cutoffs)
}
print(opt.cut(roc.perf, pred))
                 [,1]
sensitivity 0.8494624
specificity 0.8504673
cutoff      0.5014893
Now, there is a cost measure in the ROCR package that you can use to create a performance object. If you use it to find the minimum cost, then it will give you the same cutoff as opt.cut, but not give you the sensitivity and specificity.
cost.perf = performance(pred, "cost")
pred@cutoffs[[1]][which.min(cost.perf@y.values[[1]])]
[1] 0.5014893
Different costs for FP and FN
The output from opt.cut and a performance object with measure cost are NOT equivalent if false positives and false negatives are not weighted equally. The cost.fn and cost.fp arguments can be passed to performance, corresponding to the cost of a false negative and a false positive, respectively. Let's say false positives are twice as costly as false negatives, and let's get a cut point:
cost.perf = performance(pred, "cost", cost.fp = 2, cost.fn = 1)
pred@cutoffs[[1]][which.min(cost.perf@y.values[[1]])]
[1] 0.5294022
Thus, we have a different “optimal” cut point with this changed cost function. In many real-life applications of biomarkers, the costs of a false positive and a false negative are not the same. For example, missing someone with a disease based on a test may cost a hospital $1,000,000 in lawsuits, but treating someone who did not have the disease may cost $100,000 in treatments. In that case, the cost of a false negative is 10 times that of a false positive, strictly in monetary terms. No real cost analysis is this simple, and it is usually based on many factors, but most analyses do not assign equal cost to a false positive and a false negative.
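For instance, plugging the hypothetical dollar figures above into the same machinery, a false negative would be weighted 10 times as heavily as a false positive (the object name here is just illustrative):

# purely illustrative weights from the example above: a FN costs 10x a FP
cost10.perf = performance(pred, "cost", cost.fp = 1, cost.fn = 10)
pred@cutoffs[[1]][which.min(cost10.perf@y.values[[1]])]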
The code is the same for the optimal cutoff for the multiple prediction data:
print(opt.cut(many.roc.perf, manypred))
                  [,1]       [,2]       [,3]       [,4]       [,5]
sensitivity  0.8076923  0.8205128  0.7692308  0.8205128  0.7564103
specificity  0.7902622  0.7827715  0.8501873  0.8164794  0.8464419
cutoff      -0.5749773 -0.5640632 -0.4311301 -0.5336958 -0.4863360
                  [,6]       [,7]       [,8]       [,9]      [,10]
sensitivity  0.7820513  0.7948718  0.7820513  0.7435897  0.7435897
specificity  0.8089888  0.8314607  0.8089888  0.8352060  0.8501873
cutoff      -0.5364402 -0.4816705 -0.5388664 -0.4777073 -0.4714354
Accuracy
Simple example: one set of prediction and labels
Another measure that is popular is overall accuracy. This measure optimizes the number of correct classifications, but may be misleading if there are many more negatives than positives, or vice versa. Let's get the overall accuracy for the simple predictions and plot it:
acc.perf = performance(pred, measure = "acc")
plot(acc.perf)
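Because accuracy can be inflated by class imbalance, it can help to compare against the accuracy of simply always predicting the majority class; a quick sketch:

# accuracy of always guessing the more common class in ROCR.simple
max(mean(ROCR.simple$labels == 1), mean(ROCR.simple$labels == 0))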
What if we actually want to extract the maximum accuracy and the cutoff corresponding to that? In the performance object, we have the slot x.values, which corresponds to the cutoff in this case, and y.values, which corresponds to the accuracy at each cutoff. We'll grab the index for maximum accuracy and then grab the corresponding cutoff:
ind = which.max( slot(acc.perf, "y.values")[[1]] )
acc = slot(acc.perf, "y.values")[[1]][ind]
cutoff = slot(acc.perf, "x.values")[[1]][ind]
print(c(accuracy = acc, cutoff = cutoff))
 accuracy    cutoff
0.8500000 0.5014893
Hooray! Then you can go forth and threshold your model using the cutoff for (in hopes) maximum accuracy in your test data.
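For example, applying that cutoff back to the original predictions (a sketch on the same data, just to show the mechanics; pred.class is an illustrative name):

# threshold the continuous predictions at the accuracy-maximizing cutoff
pred.class = as.numeric(ROCR.simple$predictions >= cutoff)
# cross-tabulate predicted class against truth; the agreement should be
# close to the accuracy extracted above (up to how ties at the cutoff are handled)
table(predicted = pred.class, truth = ROCR.simple$labels)
mean(pred.class == ROCR.simple$labels)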
Example: multiple sets of prediction and labels
Again, we will do the same with many predictions and labels, but must iterate over the results (using a mapply statement):
many.acc.perf = performance(manypred, measure = "acc")
sapply(manypred@labels, function(x) mean(x == 1))
 [1] 0.226087 0.226087 0.226087 0.226087 0.226087 0.226087 0.226087
 [8] 0.226087 0.226087 0.226087
mapply(function(x, y){
  ind = which.max( y )
  acc = y[ind]
  cutoff = x[ind]
  return(c(accuracy = acc, cutoff = cutoff))
}, slot(many.acc.perf, "x.values"), slot(many.acc.perf, "y.values"))
               [,1]         [,2]      [,3]       [,4]      [,5]       [,6]
accuracy 0.86376812  0.881159420 0.8666667  0.8724638 0.8724638  0.8753623
cutoff   0.02461465 -0.006091327 0.2303707 -0.1758013 0.1251976 -0.2153779
               [,7]      [,8]      [,9]      [,10]
accuracy  0.8753623 0.8724638 0.8637681 0.86376812
cutoff   -0.2066697 0.1506282 0.2880392 0.06536471
We see that these cutoffs are not the same as those from opt.cut above. This is due to the fact that the proportion of positive cases is much less than 50%.
Area under the curve (AUC) and partial AUC (pAUC)
Simple example: one set of prediction and labels
The area under the curve summarizes the ROC curve by taking the area between the curve and the x-axis. Let's get the area under the curve for the simple predictions:
auc.perf = performance(pred, measure = "auc")
auc.perf@y.values
[[1]]
[1] 0.8341875
As you can see, the result is a scalar number, the area under the curve (AUC). This number ranges from 0 to 1, with 1 indicating 100% specificity and 100% sensitivity.
As before, if we only want to accept a fixed FPR, we can calculate a partial AUC using the fpr.stop argument:
pauc.perf = performance(pred, measure = "auc", fpr.stop = 0.1)
pauc.perf@y.values
[[1]]
[1] 0.02780625
Now, we see the pAUC to be much lower. It is of note that this value can range from 0 to whatever fpr.stop is. In order to standardize it to 1, you can divide it by fpr.stop to give a [0, 1] measure:
pauc.perf@y.values = lapply(pauc.perf@y.values, function(x) x / 0.1)
pauc.perf@y.values
[[1]]
[1] 0.2780625
Although this measure is more comparable to the full AUC measure, it is still low. Note, there is no “one” cutoff for AUC or pAUC, as these measure performance over all cutoffs. Also, plotting functions for scalar outcome measures (such as AUC) do not work for performance objects. The code for the multiple predictions is the same:
manypauc.perf = performance(manypred, measure = "auc", fpr.stop = 0.1)
manypauc.perf@y.values = lapply(manypauc.perf@y.values, function(x) x / 0.1)
manypauc.perf@y.values
[[1]]
[1] 0.500048

[[2]]
[1] 0.5692404

[[3]]
[1] 0.5182944

[[4]]
[1] 0.5622299

[[5]]
[1] 0.5379814

[[6]]
[1] 0.5408624

[[7]]
[1] 0.5509939

[[8]]
[1] 0.5334678

[[9]]
[1] 0.4979353

[[10]]
[1] 0.4870354
Note, use sapply instead of lapply if you want the result to be a vector.
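For instance, recomputing the standardized pAUCs with sapply on a fresh performance object (so we don't divide by 0.1 twice) returns a plain numeric vector; pauc10.perf is just an illustrative name:

pauc10.perf = performance(manypred, measure = "auc", fpr.stop = 0.1)
# sapply simplifies the list of pAUCs to a numeric vector
sapply(pauc10.perf@y.values, function(x) x / 0.1)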
Conclusion
For ROC analysis, the ROCR package has good methods and many built-in measures. Other packages, such as the pROC package, can be useful for many functions and analyses, especially testing the difference between ROC curves. In some ways, you may want to use pROC over ROCR, especially because (when I checked Dec 18, 2014) the ROCR package was orphaned. But if you are working in ROCR, I hope this gives you some examples of how to fit the objects and extract the results.
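As an aside, here is a minimal sketch of what such a comparison might look like with pROC's roc() and roc.test() functions, using two of the ROCR.hiv runs (this assumes pROC is installed; defaults and output may differ across versions):

library(pROC)
roc1 <- roc(ROCR.hiv$hiv.nn$labels[[1]], ROCR.hiv$hiv.nn$predictions[[1]])
roc2 <- roc(ROCR.hiv$hiv.nn$labels[[2]], ROCR.hiv$hiv.nn$predictions[[2]])
# DeLong's test for a difference in AUC between the two curves
roc.test(roc1, roc2, method = "delong")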