fashion() output with corrr

August 3, 2016

(This article was first published on blogR, and kindly contributed to R-bloggers)

Tired of trying to get your data to print just right, or of formatting it in a program like Excel? Try out fashion() from the corrr package:

d <- data.frame(
  gender = factor(c("Male", "Female", NA)),
  age    = c(NA, 28.1111111, 74.3),
  height = c(188, NA, 168.78906),
  fte    = c(NA, .78273, .9)
)
d
#>   gender      age   height     fte
#> 1   Male       NA 188.0000      NA
#> 2 Female 28.11111       NA 0.78273
#> 3   <NA> 74.30000 168.7891 0.90000

library(corrr)
fashion(d)
#>   gender   age height  fte
#> 1   Male       188.00     
#> 2 Female 28.11         .78
#> 3        74.30 168.79  .90

But how does it work and what does it do?

 The inspiration: correlations and decimals

The inspiration for fashion() came from my unending frustration at getting a correlation matrix to print out exactly how I wanted. For example, printing correlations typically looks something like this:

mtcars %>% correlate()
#> # A tibble: 11 x 12
#>    rowname        mpg        cyl       disp         hp        drat
#>      <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#> 1      mpg         NA -0.8521620 -0.8475514 -0.7761684  0.68117191
#> 2      cyl -0.8521620         NA  0.9020329  0.8324475 -0.69993811
#> 3     disp -0.8475514  0.9020329         NA  0.7909486 -0.71021393
#> 4       hp -0.7761684  0.8324475  0.7909486         NA -0.44875912
#> 5     drat  0.6811719 -0.6999381 -0.7102139 -0.4487591          NA
#> 6       wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#> 7     qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#> 8       vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#> 9       am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 10    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 11    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
#> # ... with 6 more variables: wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>,
#> #   gear <dbl>, carb <dbl>

But this is just plain ugly. Personally, I wanted:

  • All numbers rounded to the same number of decimal places (usually 2).
  • Leading zeros removed, with the decimal points still aligned whether or not a number has a minus sign (-).
  • Missing values (NA) shown as empty strings ("").

This is exactly what fashion() does:

mtcars %>% correlate() %>% fashion()
#>    rowname  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
#> 1      mpg      -.85 -.85 -.78  .68 -.87  .42  .66  .60  .48 -.55
#> 2      cyl -.85       .90  .83 -.70  .78 -.59 -.81 -.52 -.49  .53
#> 3     disp -.85  .90       .79 -.71  .89 -.43 -.71 -.59 -.56  .39
#> 4       hp -.78  .83  .79      -.45  .66 -.71 -.72 -.24 -.13  .75
#> 5     drat  .68 -.70 -.71 -.45      -.71  .09  .44  .71  .70 -.09
#> 6       wt -.87  .78  .89  .66 -.71      -.17 -.55 -.69 -.58  .43
#> 7     qsec  .42 -.59 -.43 -.71  .09 -.17       .74 -.23 -.21 -.66
#> 8       vs  .66 -.81 -.71 -.72  .44 -.55  .74       .17  .21 -.57
#> 9       am  .60 -.52 -.59 -.24  .71 -.69 -.23  .17       .79  .06
#> 10    gear  .48 -.49 -.56 -.13  .70 -.58 -.21  .21  .79       .27
#> 11    carb -.55  .53  .39  .75 -.09  .43 -.66 -.57  .06  .27

And if I want to change the number of decimal places (decimals) and use a different placeholder for NA values (na_print):

mtcars %>% correlate() %>% fashion(decimals = 1, na_print = "x")
#>    rowname mpg cyl disp  hp drat  wt qsec  vs  am gear carb
#> 1      mpg   x -.9  -.8 -.8   .7 -.9   .4  .7  .6   .5  -.6
#> 2      cyl -.9   x   .9  .8  -.7  .8  -.6 -.8 -.5  -.5   .5
#> 3     disp -.8  .9    x  .8  -.7  .9  -.4 -.7 -.6  -.6   .4
#> 4       hp -.8  .8   .8   x  -.4  .7  -.7 -.7 -.2  -.1   .7
#> 5     drat  .7 -.7  -.7 -.4    x -.7   .1  .4  .7   .7  -.1
#> 6       wt -.9  .8   .9  .7  -.7   x  -.2 -.6 -.7  -.6   .4
#> 7     qsec  .4 -.6  -.4 -.7   .1 -.2    x  .7 -.2  -.2  -.7
#> 8       vs  .7 -.8  -.7 -.7   .4 -.6   .7   x  .2   .2  -.6
#> 9       am  .6 -.5  -.6 -.2   .7 -.7  -.2  .2   x   .8   .1
#> 10    gear  .5 -.5  -.6 -.1   .7 -.6  -.2  .2  .8    x   .3
#> 11    carb -.6  .5   .4  .7  -.1  .4  -.7 -.6  .1   .3    x
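The formatting rules listed earlier can be approximated in a few lines of base R. This is a minimal sketch that assumes a plain numeric vector; fashion_sketch() is a hypothetical name, and this is not how corrr itself implements fashion():

```r
# Hypothetical helper: a base-R approximation of the rules above, not corrr's code.
fashion_sketch <- function(x, decimals = 2, na_print = "") {
  # Round every value to the same number of decimal places
  out <- formatC(x, format = "f", digits = decimals)
  # Drop the leading zero but keep any minus sign: "0.68" -> ".68", "-0.85" -> "-.85"
  out <- sub("^(-?)0\\.", "\\1.", out)
  # Show missing values as the placeholder
  out[is.na(x)] <- na_print
  # Right-justify to a common width; with equal decimals this lines up the points
  noquote(format(out, justify = "right"))
}

fashion_sketch(c(-0.8521620, 0.68117191, NA))
# values: "-.9" " .7" "  x"
fashion_sketch(c(-0.8521620, 0.68117191, NA), decimals = 1, na_print = "x")
```

Right-justifying the strings is what keeps the decimal points aligned: once every value has the same number of decimals, the point sits at the same distance from the right edge regardless of a leading minus sign.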

 Look but don’t touch

There’s a little bit of magic going on here, but the key point is that fashion() returns a noquote version of the original structure:

mtcars %>% correlate() %>% fashion() %>% class()
#> [1] "data.frame" "noquote"

That means that numbers are no longer numbers.

mtcars %>% correlate() %>% sapply(is.numeric)
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs 
#>   FALSE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE 
#>      am    gear    carb 
#>    TRUE    TRUE    TRUE

mtcars %>% correlate() %>% fashion() %>% sapply(is.numeric)
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs 
#>   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE 
#>      am    gear    carb 
#>   FALSE   FALSE   FALSE
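The noquote class itself is plain base R: it only changes how a character object is printed. A small illustration on a named vector (base R only, not corrr's internals; the names r and shown are illustrative) shows why is.numeric() returns FALSE once values have been formatted:

```r
# Base R only: formatted values are character strings wrapped in the noquote class
r <- c(mpg = -0.8521620, cyl = 0.9020329)
shown <- noquote(formatC(r, format = "f", digits = 2))

class(shown)         # "noquote"
is.numeric(r)        # TRUE
is.numeric(shown)    # FALSE: the digits are now text
is.character(shown)  # TRUE
```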

Similarly, missing values are no longer missing values.

mtcars %>% correlate() %>% sapply(function(i) sum(is.na(i)))
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs 
#>       0       1       1       1       1       1       1       1       1 
#>      am    gear    carb 
#>       1       1       1

mtcars %>% correlate() %>% fashion() %>% sapply(function(i) sum(is.na(i)))
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs 
#>       0       0       0       0       0       0       0       0       0 
#>      am    gear    carb 
#>       0       0       0

So fashion() is for looking at output, not for continuing to work with it.
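A minimal sketch of that workflow in base R, assuming you start from an ordinary correlation matrix rather than corrr's output: keep the numeric object for any further work, and build a separate formatted copy only for display. The names r and shown are illustrative.

```r
# Numeric correlations, kept for analysis
r <- cor(mtcars[, c("mpg", "cyl", "disp")])
diag(r) <- NA  # blank the diagonal, as in the correlate() output above

# Display-only copy: rounded strings, NA shown as ""
shown <- formatC(r, format = "f", digits = 2)  # keeps the matrix dims and dimnames
shown[is.na(r)] <- ""
shown <- noquote(shown)
shown

sum(is.na(r))      # 3: the numeric copy still knows which values are missing
sum(is.na(shown))  # 0: the display copy only holds empty strings
mean(abs(r), na.rm = TRUE)  # further analysis uses r, never shown
```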

 What to use it on

fashion() can be used on most standard R structures, such as scalars, vectors, matrices, and data frames:

fashion(10.277)
#> [1] 10.28
fashion(c(10.3785, NA, 87))
#> [1] 10.38       87.00
fashion(matrix(1:4, nrow = 2))
#>     V1   V2
#> 1 1.00 3.00
#> 2 2.00 4.00
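The V1 and V2 headers in the matrix output are the default column names base R gives an unnamed matrix when it is converted to a data frame. That suggests, though this is an assumption rather than something stated in this post, that fashion() turns a matrix into a data frame before formatting it:

```r
# Base R: an unnamed matrix gets default column names V1, V2, ... as a data frame
m <- matrix(1:4, nrow = 2)
df <- as.data.frame(m)
names(df)  # "V1" "V2"

# Formatting each column to two decimals gives the same values as the output above
shown_m <- noquote(sapply(df, function(col) formatC(as.numeric(col), format = "f", digits = 2)))
shown_m
```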

You can also use it on non-numeric data. In this case, fashion() simply converts the data to character and replaces missing values with na_print:

fashion("Hello")
#> [1] Hello
fashion(c("Hello", NA), na_print = "World")
#> [1] Hello World
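For character input, the same two steps can be sketched in base R. This approximates the behaviour described above and is not corrr's code; na_print here is just a local variable mirroring the argument name:

```r
# Convert to character, then replace missing values with a placeholder
x <- c("Hello", NA)
na_print <- "World"  # local stand-in for fashion()'s na_print argument
out <- as.character(x)
out[is.na(out)] <- na_print
noquote(out)  # prints: [1] Hello World
```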

Now is a good time to look back at the opening example, which shows that it also works on a data frame containing a factor column.
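To see how a mix of numeric and factor columns might be handled, here is a column-by-column sketch in base R using the data frame from the opening example. fashion_df_sketch() is a hypothetical helper, not part of corrr: numeric columns are rounded and lose their leading zero, other columns (such as factors) become character, and every NA becomes na_print.

```r
# Data frame from the opening example
d <- data.frame(
  gender = factor(c("Male", "Female", NA)),
  age    = c(NA, 28.1111111, 74.3),
  height = c(188, NA, 168.78906),
  fte    = c(NA, .78273, .9)
)

# Hypothetical helper: formats each column independently, then marks the result noquote
fashion_df_sketch <- function(df, decimals = 2, na_print = "") {
  out <- lapply(df, function(col) {
    txt <- if (is.numeric(col)) {
      # Round, then strip the leading zero ("0.78" -> ".78")
      sub("^(-?)0\\.", "\\1.", formatC(col, format = "f", digits = decimals))
    } else {
      # Factors and other columns: use their labels as text
      as.character(col)
    }
    txt[is.na(col)] <- na_print
    txt
  })
  noquote(as.data.frame(out, stringsAsFactors = FALSE))
}

res <- fashion_df_sketch(d)
res
```

Because noquote() appends its class to the existing one, the result carries the classes "data.frame" and "noquote", matching the class() output shown earlier for fashion().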

 Exporting

Don’t forget that it’s easy to export your fashioned output with something like:

my_data %>% fashion() %>% write.csv("fashioned_file.csv")
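Here my_data stands for whatever data you have fashioned. A base-R sketch of the round trip, using a temporary file and a small hand-made character data frame in place of real fashion() output, shows two details worth knowing: row.names = FALSE stops write.csv() from adding a column of row numbers, and reading the file back with colClasses = "character" keeps strings such as "-.85" and empty cells intact.

```r
# Hand-made stand-in for fashioned output (all columns are character)
shown <- data.frame(
  rowname = c("mpg", "cyl"),
  mpg     = c("", "-.85"),
  cyl     = c("-.85", ""),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write.csv(shown, path, row.names = FALSE)  # no extra column of row numbers

# Read every column back as text so ".85"-style values are not re-parsed as numbers
back <- read.csv(path, colClasses = "character")
back
```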

So what are you waiting for? Go forth and fashion()!

 Sign off

Thanks for reading and I hope this was useful for you.

For updates of recent blog posts, follow @drsimonj on Twitter, or email me to get in touch.

If you’d like the code that produced this blog, check out the blogR GitHub repository.
