# Some R Packages for ROC Curves

This article was first published on **R Views** and kindly contributed to R-bloggers.


In a recent post, I presented some of the theory underlying ROC curves, and outlined the history leading up to their present popularity for characterizing the performance of machine learning models. In this post, I describe how to search CRAN for packages to plot ROC curves, and highlight six useful packages.

Although I began with a few ideas about packages that I wanted to talk about, like ROCR and pROC, which I have found useful in the past, I decided to use Gábor Csárdi’s relatively new package pkgsearch to search through CRAN and see what’s out there. The `pkg_search()` function takes a text string as input and uses basic text mining techniques to search all of CRAN. The algorithm searches through package text fields and produces a score for each package it finds, weighted by the number of reverse dependencies and downloads.

```r
library(tidyverse)  # for data manipulation
library(dlstats)    # for package download stats
library(pkgsearch)  # for searching packages
```

After some trial and error, I settled on the following query, which includes a number of interesting ROC-related packages.

```r
rocPkg <- pkg_search(query = "ROC", size = 200)
```

Then, I narrowed down the field to 46 packages by filtering out orphaned packages and packages with a score less than 190.

```r
rocPkgShort <- rocPkg %>%
  filter(maintainer_name != "ORPHANED", score > 190) %>%
  select(score, package, downloads_last_month) %>%
  arrange(desc(downloads_last_month))

head(rocPkgShort)
```

```
## # A tibble: 6 x 3
##   score package  downloads_last_month
##   <dbl> <chr>                   <int>
## 1  690. ROCR                    56356
## 2 7938. pROC                    39584
## 3 1328. PRROC                    9058
## 4  833. sROC                     4236
## 5  266. hmeasure                 1946
## 6 1021. plotROC                  1672
```

To complete the selection process, I did the hard work of browsing the documentation for the packages to pick out what I thought would be generally useful to most data scientists. The following plot uses Guangchuang Yu’s `dlstats` package to look at the download history for the six packages I selected to profile.

```r
library(dlstats)

shortList <- c("pROC", "precrec", "ROCit", "PRROC", "ROCR", "plotROC")
downloads <- cran_stats(shortList)

ggplot(downloads, aes(end, downloads, group = package, color = package)) +
  geom_line() +
  geom_point(aes(shape = package)) +
  scale_y_continuous(trans = 'log2')
```

### ROCR – 2005

ROCR has been around for almost 14 years, and has been a rock-solid workhorse for drawing ROC curves. I particularly like the way the `performance()` function has you set up calculation of the curve by entering the true positive rate, `tpr`, and false positive rate, `fpr`, parameters. Not only is this reassuringly transparent, it shows the flexibility to calculate nearly every performance measure for a binary classifier by entering the appropriate parameter. For example, to produce a precision-recall curve, you would enter `prec` and `rec`. Although there is no vignette, the documentation of the package is very good.

The following code sets up and plots the default `ROCR` ROC curve using a synthetic data set that comes with the package. I will use this same data set throughout this post.

```r
library(ROCR)
```

```
## Loading required package: gplots
##
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
##
##     lowess
```

```r
# plot a ROC curve for a single prediction run
# and color the curve according to cutoff.
data(ROCR.simple)
df <- data.frame(ROCR.simple)
pred <- prediction(df$predictions, df$labels)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize = TRUE)
```
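The precision-recall variant mentioned above follows the same pattern. The sketch below is an illustration of my own rather than part of the original analysis: it swaps the two measure names and also pulls out the scalar `auc` measure. The object names `perf_pr` and `auc_roc` are made up for this example.

```r
library(ROCR)

# Same synthetic data set that ships with ROCR.
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)

# `measure` is drawn on the y axis and `x.measure` on the x axis,
# so this gives precision as a function of recall.
perf_pr <- performance(pred, measure = "prec", x.measure = "rec")
plot(perf_pr, colorize = TRUE)

# Scalar summaries use the same interface; "auc" is the area under the ROC curve.
auc_roc <- performance(pred, measure = "auc")@y.values[[1]]
auc_roc
```

Because every measure goes through `performance()`, switching from one curve to another only requires changing the two strings.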

### pROC – 2010

It is clear from the downloads curve that `pROC` is also popular with data scientists. I like that it is pretty easy to get confidence intervals for the Area Under the Curve, `AUC`, on the plot.

```r
library(pROC)
```

```
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
##     cov, smooth, var
```

```r
pROC_obj <- roc(df$labels, df$predictions,
                smoothed = TRUE,
                # arguments for ci
                ci = TRUE, ci.alpha = 0.9, stratified = FALSE,
                # arguments for plot
                plot = TRUE, auc.polygon = TRUE, max.auc.polygon = TRUE, grid = TRUE,
                print.auc = TRUE, show.thres = TRUE)

sens.ci <- ci.se(pROC_obj)
plot(sens.ci, type = "shape", col = "lightblue")
```

```
## Warning in plot.ci.se(sens.ci, type = "shape", col = "lightblue"): Low
## definition shape.
```

```r
plot(sens.ci, type = "bars")
```
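If you only need the numbers rather than the plot, the AUC and its confidence interval can also be extracted directly. This is a minimal sketch, assuming the default DeLong interval is acceptable; the explicit `levels` and `direction` arguments are my choice to keep `roc()` from guessing them, and `roc_obj` and `auc_ci` are illustrative names.

```r
library(pROC)

# Load the example data without attaching ROCR.
data(ROCR.simple, package = "ROCR")

# Controls are labelled 0, cases 1, and higher scores indicate cases.
roc_obj <- roc(response = ROCR.simple$labels,
               predictor = ROCR.simple$predictions,
               levels = c(0, 1), direction = "<")

auc(roc_obj)

# Returns lower bound, point estimate, and upper bound.
auc_ci <- ci.auc(roc_obj, conf.level = 0.95)
auc_ci
```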

### PRROC – 2014

Although not nearly as popular as `ROCR` and `pROC`, `PRROC` seems to be making a bit of a comeback lately. The terminology for the inputs is a bit eclectic, but once you figure that out, the `roc.curve()` function plots a clean ROC curve with minimal fuss. `PRROC` is really set up to do precision-recall curves, as the vignette indicates.

```r
library(PRROC)

PRROC_obj <- roc.curve(scores.class0 = df$predictions,
                       weights.class0 = df$labels,
                       curve = TRUE)
plot(PRROC_obj)
```
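Since precision-recall curves are the package's main purpose, here is a hedged sketch of `pr.curve()` using the other input style `PRROC` accepts, where `scores.class0` holds the scores of the positive class and `scores.class1` those of the negative class. The simulated scores and the names `pos_scores`, `neg_scores`, and `pr_obj` are assumptions for illustration, not data from this post.

```r
library(PRROC)

# Simulated classifier scores (hypothetical data, not ROCR.simple).
set.seed(42)
pos_scores <- rnorm(100, mean = 1)  # scores of the positive class
neg_scores <- rnorm(100, mean = 0)  # scores of the negative class

# curve = TRUE keeps the curve points so the object can be plotted.
pr_obj <- pr.curve(scores.class0 = pos_scores,
                   scores.class1 = neg_scores,
                   curve = TRUE)
plot(pr_obj)

# Area under the precision-recall curve.
pr_obj$auc.integral
```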

### plotROC – 2014

`plotROC` is an excellent choice for drawing ROC curves with `ggplot()`. My guess is that it enjoys only limited popularity because the documentation uses medical terminology like “disease status” and “markers”. Nevertheless, the documentation, which includes both a vignette and a Shiny application, is very good.

The package offers a number of feature-rich `ggplot()` geoms that enable the production of elaborate plots. The following plot contains some styling, and includes Clopper and Pearson (1934) exact method confidence intervals.

```r
library(plotROC)

rocplot <- ggplot(df, aes(m = predictions, d = labels)) +
  geom_roc(n.cuts = 20, labels = FALSE)

rocplot +
  style_roc(theme = theme_grey) +
  geom_rocci(fill = "pink")
```

### precrec – 2015

`precrec` is another library for plotting ROC and precision-recall curves.

```r
library(precrec)
```

```
##
## Attaching package: 'precrec'
## The following object is masked from 'package:pROC':
##
##     auc
```

```r
precrec_obj <- evalmod(scores = df$predictions, labels = df$labels)
autoplot(precrec_obj)
```

Parameter options for the `evalmod()` function make it easy to produce basic plots of various model features.

```r
precrec_obj2 <- evalmod(scores = df$predictions, labels = df$labels, mode = "basic")
autoplot(precrec_obj2)
```

### ROCit – 2019

`ROCit` is a new package for plotting ROC curves and other binary classification visualizations that rocketed onto the scene in January, and is climbing quickly in popularity. I would never have discovered it if I had automatically filtered my original search by downloads. The default plot includes the location of Youden’s J statistic.

```r
library(ROCit)
```

```
## Warning: package 'ROCit' was built under R version 3.5.2
```

```r
ROCit_obj <- rocit(score = df$predictions, class = df$labels)
plot(ROCit_obj)
```
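For readers unfamiliar with it, Youden's J statistic is the largest value of TPR minus FPR over all cutoffs, which is the point on the ROC curve farthest above the diagonal. The base R sketch below shows the calculation on simulated scores; it is my own illustration of the definition, not a reproduction of `ROCit` internals, and all names in it are hypothetical.

```r
# Simulated scores: 100 positives centred at 1, 100 negatives centred at 0.
set.seed(1)
labels <- rep(c(1, 0), each = 100)
scores <- c(rnorm(100, mean = 1), rnorm(100, mean = 0))

# Treat every observed score as a candidate cutoff (predict positive if score >= cutoff).
cutoffs <- sort(unique(scores), decreasing = TRUE)
tpr <- sapply(cutoffs, function(k) mean(scores[labels == 1] >= k))
fpr <- sapply(cutoffs, function(k) mean(scores[labels == 0] >= k))

# Youden's J at each cutoff, and the cutoff that maximises it.
j <- tpr - fpr
best <- which.max(j)
c(cutoff = cutoffs[best], tpr = tpr[best], fpr = fpr[best], youden_j = j[best])
```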

Several other visualizations are possible. The following plot shows the cumulative densities of the positive and negative responses. The KS statistic shows the maximum distance between the two curves.

```r
ksplot(ROCit_obj)
```
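The "maximum distance between the two curves" can be checked by hand. The following base R sketch, assuming the usual two-sample Kolmogorov-Smirnov definition, compares the empirical cumulative distributions of simulated positive and negative scores and cross-checks the result against `stats::ks.test()`. The data and variable names are made up for illustration.

```r
# Simulated scores for the two classes (hypothetical data).
set.seed(1)
pos <- rnorm(100, mean = 1)
neg <- rnorm(100, mean = 0)

# Empirical cumulative distribution functions of each class.
F_pos <- ecdf(pos)
F_neg <- ecdf(neg)

# The largest gap can only occur at an observed score, so evaluate there.
grid <- sort(c(pos, neg))
ks_manual <- max(abs(F_neg(grid) - F_pos(grid)))

# Cross-check against the built-in two-sample test statistic.
ks_builtin <- unname(ks.test(pos, neg)$statistic)
all.equal(ks_manual, ks_builtin)
```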

In this attempt to dig into CRAN and uncover some of the resources R contains for plotting ROC curves and other binary classifier visualizations, I have only scratched the surface. Moreover, I have deliberately ignored the many packages available for specialized applications, such as survivalROC for computing time-dependent ROC curves from censored survival data, and cvAUC, which contains functions for evaluating cross-validated AUC measures. Nevertheless, I hope that this little exercise will help you find what you are looking for.
