Get basic summary statistics for all the variables in a data frame

[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers.]

I have added a new function to my {brotools} package, called describe(), which takes a data frame as an argument and returns another data frame with descriptive statistics. It is very much inspired by the {skimr} package, and also by assist::describe(). I wanted to write my own for two reasons: first, as an exercise, and second, because the only function I really needed from {skimr} was skim_to_wide(). So instead of installing a whole package for a single function, I decided to write my own (since I use {brotools} daily).

Below you can see it in action:

library(dplyr)
data(starwars)
brotools::describe(starwars)
## # A tibble: 13 x 12
##    variable   type     mean    sd mode        min   max   q25 median   q75
##    <chr>      <chr>   <dbl> <dbl> <chr>     <dbl> <dbl> <dbl>  <dbl> <dbl>
##  1 birth_year Numeric  87.6 155.  <NA>         8.  896.  35.0    52.  72.0
##  2 height     Numeric 174.   34.8 <NA>        66.  264. 167.    180. 191.
##  3 mass       Numeric  97.3 169.  <NA>        15. 1358.  55.6    79.  84.5
##  4 eye_color  Charac…  NA    NA   blue        NA    NA   NA      NA   NA
##  5 gender     Charac…  NA    NA   male        NA    NA   NA      NA   NA
##  6 hair_color Charac…  NA    NA   blond       NA    NA   NA      NA   NA
##  7 homeworld  Charac…  NA    NA   Tatooine    NA    NA   NA      NA   NA
##  8 name       Charac…  NA    NA   Luke Sky…   NA    NA   NA      NA   NA
##  9 skin_color Charac…  NA    NA   fair        NA    NA   NA      NA   NA
## 10 species    Charac…  NA    NA   Human       NA    NA   NA      NA   NA
## 11 films      List     NA    NA   <NA>        NA    NA   NA      NA   NA
## 12 starships  List     NA    NA   <NA>        NA    NA   NA      NA   NA
## 13 vehicles   List     NA    NA   <NA>        NA    NA   NA      NA   NA
## # ... with 2 more variables: n_missing, n_unique

As you can see, the object that is returned by describe() is a tibble.
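To give a rough idea of what goes into a table like this, here is a minimal sketch in base R. It is not the {brotools} implementation: `describe_sketch()`, `mode_sketch()` and the `toy` data frame are made-up names for illustration, only numeric and character columns are handled, and the way ties for the mode are broken and missing values are counted are assumptions that may differ from what describe() does.

```r
# Hypothetical helper: most frequent non-missing value, as a character string.
# table() drops NA by default; ties go to whichever value table() lists first.
mode_sketch <- function(x) {
  counts <- table(x)
  if (length(counts) == 0) return(NA_character_)
  names(counts)[which.max(counts)]
}

# Hypothetical helper: one row of summary statistics per column.
describe_sketch <- function(df) {
  rows <- lapply(names(df), function(col) {
    x <- df[[col]]
    # Assumption: only numeric and character columns are supported here
    stopifnot(is.numeric(x) || is.character(x))
    is_num <- is.numeric(x)
    # Apply a numeric summary only to numeric columns, NA otherwise
    num <- function(f) if (is_num) f(x, na.rm = TRUE) else NA_real_
    quant <- function(p) {
      if (is_num) unname(quantile(x, p, na.rm = TRUE)) else NA_real_
    }
    data.frame(
      variable  = col,
      type      = if (is_num) "Numeric" else "Character",
      mean      = num(mean),
      sd        = num(sd),
      mode      = if (is_num) NA_character_ else mode_sketch(x),
      min       = num(min),
      max       = num(max),
      q25       = quant(0.25),
      median    = num(median),
      q75       = quant(0.75),
      n_missing = sum(is.na(x)),
      n_unique  = length(unique(x[!is.na(x)])),  # NA not counted (assumption)
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up data, only to exercise the sketch
toy <- data.frame(
  score = c(1, 2, 3, NA),
  label = c("a", "b", "b", NA),
  stringsAsFactors = FALSE
)

describe_sketch(toy)
```

The sketch returns a plain data frame with one row per variable; wrapping the result in tibble::as_tibble() would give the same kind of object as shown above.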

For now, this function does not handle dates, but it’s in the pipeline.

You can also describe only certain columns:

brotools::describe(starwars, height, mass, name)
## # A tibble: 3 x 12
##   variable type      mean    sd mode          min   max   q25 median   q75
##   <chr>    <chr>    <dbl> <dbl> <chr>       <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 height   Numeric  174.   34.8 <NA>          66.  264. 167.    180. 191.
## 2 mass     Numeric   97.3 169.  <NA>          15. 1358.  55.6    79.  84.5
## 3 name     Charact…  NA    NA   Luke Skywa…   NA    NA   NA      NA   NA
## # ... with 2 more variables: n_missing, n_unique
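Passing bare column names like this is a common pattern in functions built on {dplyr}. The sketch below shows one way it could work, by forwarding `...` to dplyr::select(); `pick_columns_sketch()` is a hypothetical name, and I am not claiming this is how describe() does it internally.

```r
library(dplyr)  # provides select() and the starwars data set

# Hypothetical helper: keep only the columns named in `...`,
# or return the whole data frame when no columns are given.
pick_columns_sketch <- function(df, ...) {
  if (...length() == 0) {
    return(df)
  }
  # Bare column names in `...` are forwarded to select() unevaluated
  dplyr::select(df, ...)
}

# Column names of the subset that would then be summarised
names(pick_columns_sketch(starwars, height, mass, name))
```

A describe-like function could call a helper of this kind first and then compute its statistics on the result, which is why the output above contains only the three requested variables.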

If you want to try it out, you can install {brotools} from GitHub:

devtools::install_github("b-rodrigues/brotools")

If you found this blog post useful, you might want to follow me on Twitter for blog post updates.
