corrr 0.2.1 now on CRAN

October 11, 2016
(This article was first published on blogR, and kindly contributed to R-bloggers)

@drsimonj here to discuss the latest CRAN release of corrr (0.2.1), a package for exploring correlations in a tidy R framework. This post will describe corrr features added since version 0.1.0.

You can install or update to this latest version directly from CRAN by running:

install.packages("corrr")

Let’s load corrr into our workspace and create a correlation data frame of the mtcars data set to work with:

library(corrr)
rdf <- correlate(mtcars)
rdf
#> # A tibble: 11 × 12
#>    rowname        mpg        cyl       disp         hp        drat
#>      <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#> 1      mpg         NA -0.8521620 -0.8475514 -0.7761684  0.68117191
#> 2      cyl -0.8521620         NA  0.9020329  0.8324475 -0.69993811
#> 3     disp -0.8475514  0.9020329         NA  0.7909486 -0.71021393
#> 4       hp -0.7761684  0.8324475  0.7909486         NA -0.44875912
#> 5     drat  0.6811719 -0.6999381 -0.7102139 -0.4487591          NA
#> 6       wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#> 7     qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#> 8       vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#> 9       am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 10    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 11    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
#> # ... with 6 more variables: wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>,
#> #   gear <dbl>, carb <dbl>
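
If you're curious what sits inside a correlation data frame, here is a rough base R sketch that rebuilds the same numbers. It is not how corrr does it internally; it assumes Pearson correlations on pairwise-complete observations, and base_rdf is just a name made up for this example:

```r
# Base R only: Pearson correlations using pairwise-complete observations.
# (Assumption: this mirrors the defaults used by correlate().)
m <- cor(mtcars, use = "pairwise.complete.obs", method = "pearson")

# A variable's correlation with itself is set to missing.
diag(m) <- NA

# Move the row names into a leading "rowname" column, as in the output above.
base_rdf <- data.frame(rowname = rownames(m), m,
                       row.names = NULL, stringsAsFactors = FALSE)

head(base_rdf[, 1:4], 3)
```

The result is a plain data frame rather than a tibble, but the values should line up with the printout above.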

 Plotting functions

The significant changes involve the rplot() and new network_plot() functions that support the visualisation of your correlations.

 rplot()

rplot() plots the correlations as a grid of shapes. More visible shapes correspond to stronger correlations, with blue marking positive correlations and red marking negative ones. The default plot looks like this:

rplot(rdf)

rplot1-1.png

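There are now four arguments that allow you to make adjustments to this plot:

  • shape Number selecting the shape to plot.
  • colours or colors Vector of colours that defines the colour gradient.
  • legend Whether to display a legend.
  • print_cor Whether to print the correlation values on the plot.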

Here are some examples that change these values:

rplot(rdf, legend = TRUE, shape = 1)

rplot-shape-1.png

rplot(rdf, legend = TRUE, colours = c("firebrick1", "black", "darkcyan"))

rplot-color-1.png

rplot(rdf, print_cor = TRUE)

rplot-print-1.png

And don’t forget that you can rearrange() your correlations first:

rdf %>% rearrange(absolute = FALSE) %>% rplot(shape = 0, print_cor = TRUE)

rplot-rear-1.png

 network_plot()

network_plot() produces a network that lays out and connects variables based on the strength of their correlations:

network_plot(rdf)

netplot1-1.png

For a good intro to network_plot(), see my previous blogR post. Three arguments allow you to adjust this plot:

  • min_cor Number from 0 to 1 indicating the minimum value of correlations (in absolute terms) to plot.
  • legend same as rplot()
  • colours or colors same as rplot()

Some examples:

network_plot(rdf, legend = TRUE, colours = c("slategrey", "palegreen"))

netplot-egs-1.png


network_plot(rdf, legend = TRUE, min_cor = .7)

netplot-egs-2.png
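
To get a feel for what min_cor = .7 leaves in the plot, you can list the variable pairs that clear the threshold using base R alone. This is only a sketch: strong is a made-up name, and treating the cut-off as inclusive (>=) is an assumption on my part rather than something taken from the network_plot() source:

```r
# Correlation matrix of mtcars (base R, Pearson by default).
m <- cor(mtcars)
min_cor <- 0.7

# Look at each pair once (upper triangle) and keep those whose absolute
# correlation reaches the threshold. Assumption: the cut-off is inclusive.
idx <- which(upper.tri(m) & abs(m) >= min_cor, arr.ind = TRUE)

strong <- data.frame(x = rownames(m)[idx[, "row"]],
                     y = colnames(m)[idx[, "col"]],
                     r = m[idx],
                     stringsAsFactors = FALSE)

# Strongest relationships first.
strong[order(-abs(strong$r)), ]
```

Pairs missing from this table, such as hp and am, are the ones that should drop out of the network as min_cor goes up.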

 Other features

 fashion()

fashion() will now try to work on almost any object (not just correlation data frames). It also provides arguments to adjust the number of decimals, whether to display leading_zeros, and how to print missing values (na_print):

fashion(rdf)
#>    rowname  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
#> 1      mpg      -.85 -.85 -.78  .68 -.87  .42  .66  .60  .48 -.55
#> 2      cyl -.85       .90  .83 -.70  .78 -.59 -.81 -.52 -.49  .53
#> 3     disp -.85  .90       .79 -.71  .89 -.43 -.71 -.59 -.56  .39
#> 4       hp -.78  .83  .79      -.45  .66 -.71 -.72 -.24 -.13  .75
#> 5     drat  .68 -.70 -.71 -.45      -.71  .09  .44  .71  .70 -.09
#> 6       wt -.87  .78  .89  .66 -.71      -.17 -.55 -.69 -.58  .43
#> 7     qsec  .42 -.59 -.43 -.71  .09 -.17       .74 -.23 -.21 -.66
#> 8       vs  .66 -.81 -.71 -.72  .44 -.55  .74       .17  .21 -.57
#> 9       am  .60 -.52 -.59 -.24  .71 -.69 -.23  .17       .79  .06
#> 10    gear  .48 -.49 -.56 -.13  .70 -.58 -.21  .21  .79       .27
#> 11    carb -.55  .53  .39  .75 -.09  .43 -.66 -.57  .06  .27

fashion(mtcars) %>% head()
#>     mpg  cyl   disp     hp drat   wt  qsec   vs   am gear carb
#> 1 21.00 6.00 160.00 110.00 3.90 2.62 16.46  .00 1.00 4.00 4.00
#> 2 21.00 6.00 160.00 110.00 3.90 2.88 17.02  .00 1.00 4.00 4.00
#> 3 22.80 4.00 108.00  93.00 3.85 2.32 18.61 1.00 1.00 4.00 1.00
#> 4 21.40 6.00 258.00 110.00 3.08 3.21 19.44 1.00  .00 3.00 1.00
#> 5 18.70 8.00 360.00 175.00 3.15 3.44 17.02  .00  .00 3.00 2.00
#> 6 18.10 6.00 225.00 105.00 2.76 3.46 20.22 1.00  .00 3.00 1.00

fashion(c(0.340823, NA, -10.000032), decimals = 3, na_print = "MISSING")
#> [1]    .341 MISSING -10.000

fashion(c(0.340823, NA, -10.000032), leading_zeros = TRUE)
#> [1]   0.34        -10.00
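
If you want to see roughly what these arguments do, the formatting can be approximated in a few lines of base R. This is an illustration only and not the code inside fashion(); the variable names are made up for the example:

```r
x <- c(0.340823, NA, -10.000032)

# Fix the number of decimals (compare decimals = 3).
out <- formatC(x, format = "f", digits = 3)

# Drop a leading zero before the decimal point, keeping any minus sign
# (compare the default of not showing leading zeros).
out <- sub("^(-?)0\\.", "\\1.", out)

# Replace missing values with a label of your choice (compare na_print).
out[is.na(x)] <- "MISSING"

out
```

Unlike fashion(), this sketch does not pad the values to a common width, so it returns a plain character vector.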

 focus()

A standard evaluation version of focus() is now available, focus_(), to programmatically focus on specific correlations:

vars <- c("mpg", "disp")
focus_(rdf, "hp", .dots = vars)
#> # A tibble: 8 × 4
#>   rowname         hp        mpg       disp
#>     <chr>      <dbl>      <dbl>      <dbl>
#> 1     cyl  0.8324475 -0.8521620  0.9020329
#> 2    drat -0.4487591  0.6811719 -0.7102139
#> 3      wt  0.6587479 -0.8676594  0.8879799
#> 4    qsec -0.7082234  0.4186840 -0.4336979
#> 5      vs -0.7230967  0.6640389 -0.7104159
#> 6      am -0.2432043  0.5998324 -0.5912270
#> 7    gear -0.1257043  0.4802848 -0.5555692
#> 8    carb  0.7498125 -0.5509251  0.3949769
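
The idea behind focusing is simple: keep the chosen variables as columns and drop them from the rows. Here is a hedged base R sketch of that idea on a plain correlation matrix, assuming that behaviour; it is not the focus_() implementation, and focused is a name made up for the example:

```r
# Plain correlation matrix with the diagonal set to missing.
m <- cor(mtcars)
diag(m) <- NA

# Variable names held in a character vector, as you would have them
# when working programmatically.
vars <- c("hp", "mpg", "disp")

# Keep the focused variables as columns; drop them from the rows.
focused <- m[!rownames(m) %in% vars, vars]

round(focused, 2)
```

This leaves eight rows (cyl through carb) and three columns, matching the shape of the tibble above apart from the rowname column.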

 Bugs and stuff

Other than these, there have been fixes to various bugs and minor improvements made to existing functions. Please don’t forget to open an issue on GitHub or email me if you spot an issue or would like a new feature when using corrr.

 Acknowledgements

Many thanks to the community who have already been using corrr and made suggestions along the way. Your help is invaluable for improving corrr!

 Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me at [email protected] to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.
