Get basic summary statistics for all the variables in a data frame

[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers.]

I have added a new function to my {brotools} package, called describe(), which takes a data frame as an argument and returns another data frame with descriptive statistics. It is very much inspired by the {skimr} package, and also by assist::describe(), but I wanted to write my own for two reasons: first, as an exercise, and second, because the only function I really needed from {skimr} was skim_to_wide(). So instead of installing a whole package for a single function, I decided to write my own (since I use {brotools} daily).

Below you can see it in action:

library(dplyr)
data(starwars)
brotools::describe(starwars)
## # A tibble: 13 x 12
##    variable   type     mean    sd mode        min   max   q25 median   q75
##    <chr>      <chr>   <dbl> <dbl> <chr>     <dbl> <dbl> <dbl>  <dbl> <dbl>
##  1 birth_year Numeric  87.6 155.  <NA>         8.  896.  35.0    52.  72.0
##  2 height     Numeric 174.   34.8 <NA>        66.  264. 167.    180. 191. 
##  3 mass       Numeric  97.3 169.  <NA>        15. 1358.  55.6    79.  84.5
##  4 eye_color  Charac…  NA    NA   blue        NA    NA   NA      NA   NA  
##  5 gender     Charac…  NA    NA   male        NA    NA   NA      NA   NA  
##  6 hair_color Charac…  NA    NA   blond       NA    NA   NA      NA   NA  
##  7 homeworld  Charac…  NA    NA   Tatooine    NA    NA   NA      NA   NA  
##  8 name       Charac…  NA    NA   Luke Sky…   NA    NA   NA      NA   NA  
##  9 skin_color Charac…  NA    NA   fair        NA    NA   NA      NA   NA  
## 10 species    Charac…  NA    NA   Human       NA    NA   NA      NA   NA  
## 11 films      List     NA    NA   <NA>        NA    NA   NA      NA   NA  
## 12 starships  List     NA    NA   <NA>        NA    NA   NA      NA   NA  
## 13 vehicles   List     NA    NA   <NA>        NA    NA   NA      NA   NA  
## # ... with 2 more variables: n_missing <int>, n_unique <int>

As you can see, the object that is returned by describe() is a tibble.
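To give a rough idea of what a function like this has to compute, here is a minimal base R sketch that covers the numeric columns only. It is not the {brotools} source: the name describe_numeric() and the toy data frame are made up for illustration, it returns a plain data frame rather than a tibble, and counting NA as one distinct value in n_unique is an assumption.

```r
# Hypothetical helper (not the {brotools} implementation): one row of
# summary statistics per numeric column of a data frame.
describe_numeric <- function(df) {
  # Names of the numeric columns
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

  rows <- lapply(num_cols, function(col) {
    x <- df[[col]]
    # Quartiles, ignoring missing values
    q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
    data.frame(
      variable  = col,
      mean      = mean(x, na.rm = TRUE),
      sd        = sd(x, na.rm = TRUE),
      min       = min(x, na.rm = TRUE),
      max       = max(x, na.rm = TRUE),
      q25       = q[1],
      median    = q[2],
      q75       = q[3],
      n_missing = sum(is.na(x)),
      # Assumption: NA counts as one distinct value
      n_unique  = length(unique(x)),
      stringsAsFactors = FALSE
    )
  })

  # Stack the per-column rows (NULL if there are no numeric columns)
  do.call(rbind, rows)
}

# Made-up data: one numeric column with a missing value, one character column
toy <- data.frame(a = c(1, 2, 3, NA), b = c("x", "y", "y", "z"))
describe_numeric(toy)
```

Only column a shows up in the result, since b is not numeric.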

For now, this function does not handle dates, but it’s in the pipeline.
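In the meantime, date columns can be summarised by hand. The sketch below is only one possible approach and says nothing about how describe() will eventually handle dates: describe_dates() is a made-up name, it only looks at columns of class Date, and the choice of statistics (earliest date, latest date, number of missing values) is an assumption.

```r
# Hypothetical helper (not part of {brotools}): summarise Date columns.
describe_dates <- function(df) {
  # Names of the columns that inherit from class "Date"
  date_cols <- names(df)[vapply(df, inherits, logical(1), what = "Date")]

  rows <- lapply(date_cols, function(col) {
    x <- df[[col]]
    data.frame(
      variable  = col,
      type      = "Date",
      min       = min(x, na.rm = TRUE),  # earliest date
      max       = max(x, na.rm = TRUE),  # latest date
      n_missing = sum(is.na(x)),
      stringsAsFactors = FALSE
    )
  })

  # Stack the per-column rows (NULL if there are no Date columns)
  do.call(rbind, rows)
}

# Made-up data with one Date column containing a missing value
toy_dates <- data.frame(
  id   = 1:3,
  seen = as.Date(c("2018-01-05", NA, "2018-03-20"))
)
describe_dates(toy_dates)
```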

You can also describe only certain columns:

brotools::describe(starwars, height, mass, name)
## # A tibble: 3 x 12
##   variable type      mean    sd mode          min   max   q25 median   q75
##   <chr>    <chr>    <dbl> <dbl> <chr>       <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 height   Numeric  174.   34.8 <NA>          66.  264. 167.    180. 191. 
## 2 mass     Numeric   97.3 169.  <NA>          15. 1358.  55.6    79.  84.5
## 3 name     Charact…  NA    NA   Luke Skywa…   NA    NA   NA      NA   NA  
## # ... with 2 more variables: n_missing <int>, n_unique <int>
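Passing unquoted column names like this is the same style as dplyr::select(). As a minimal sketch of the general technique (not the actual {brotools} source), a function can forward its dots to select() and fall back to all columns when none are given. The name select_columns() is made up for illustration.

```r
library(dplyr)

# Hypothetical wrapper: keep only the columns named in `...`,
# or every column when no names are supplied.
select_columns <- function(df, ...) {
  if (...length() == 0) {
    df
  } else {
    select(df, ...)
  }
}

names(select_columns(starwars, height, mass, name))
## [1] "height" "mass"   "name"
```

A describe-like function could then compute its statistics on the result of this selection step.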

If you want to try it out, you can install {brotools} from GitHub:

devtools::install_github("b-rodrigues/brotools")
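If you would rather not install all of {devtools} just for this, the lighter {remotes} package provides the same install_github() function. The snippet below is a sketch that assumes {remotes} is already installed and that you have network access; it only installs {brotools} if it is not already available.

```r
# Install {brotools} from GitHub only if it is not already installed.
# Assumes the {remotes} package is available.
if (!requireNamespace("brotools", quietly = TRUE)) {
  remotes::install_github("b-rodrigues/brotools")
}
```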

If you found this blog post useful, you might want to follow me on Twitter for blog post updates.
