Easy R: Summary statistics grouping by a categorical variable
[This article was first published on R code – data technik, and kindly contributed to R-bloggers.]
Finding the purrr package was a game changer for me: it lets you get much more out of base R's summary() function.
With it you can produce summary statistics for every variable, grouped by a categorical variable. The result is a list with one element per group, so you can assign it to a name and reuse it later.
library(purrr)
credit %>%
  split(credit$Date) %>%
  map(summary)
Simply split the data frame on the categorical column (written as datatable$column, here credit$Date), then use the map() function to run summary() on each piece. And that’s it! All set to produce results like these:
$Aug
   Homeowner       Credit.Score   Years.of.Credit.History
 Min.   :0.0000   Min.   :485.0   Min.   : 2.00
 1st Qu.:0.0000   1st Qu.:545.5   1st Qu.: 5.50
 Median :0.0000   Median :591.0   Median : 9.00
 Mean   :0.3704   Mean   :601.6   Mean   :10.33
 3rd Qu.:1.0000   3rd Qu.:630.0   3rd Qu.:14.50
 Max.   :1.0000   Max.   :811.0   Max.   :22.00
 Revolving.Balance Revolving.Utilization    Approval       Loan.Amount
 $2,000  : 2       100%   : 3            Min.   :0.0000   $11,855 : 1
 $27,000 : 2       65%    : 2            1st Qu.:0.0000   $12,150 : 1
 $29,100 : 2       70%    : 2            Median :0.0000   $13,054 : 1
 $1,000  : 1       78%    : 2            Mean   :0.1481   $15,451 : 1
 $10,500 : 1       79%    : 2            3rd Qu.:0.0000   $16,218 : 1
 $12,050 : 1       85%    : 2            Max.   :1.0000   $17,189 : 1
 (Other) :18       (Other):14                             (Other) :21
   Date    Default
 Aug :27   0:14
 July: 0   1:13

$July
   Homeowner       Credit.Score   Years.of.Credit.History
 Min.   :0.0000   Min.   :620.0   Min.   : 2.0
 1st Qu.:0.5000   1st Qu.:682.5   1st Qu.: 8.0
 Median :1.0000   Median :701.0   Median :12.0
 Mean   :0.7391   Mean   :711.8   Mean   :12.3
 3rd Qu.:1.0000   3rd Qu.:746.5   3rd Qu.:16.5
 Max.   :1.0000   Max.   :802.0   Max.   :24.0
 Revolving.Balance Revolving.Utilization    Approval       Loan.Amount
 $11,200 : 2       11%    : 2            Min.   :0.0000   $3,614  : 2
 $11,700 : 2       15%    : 2            1st Qu.:1.0000   $12,303 : 1
 $6,100  : 2       20%    : 2            Median :1.0000   $12,338 : 1
 $10,000 : 1       5%     : 2            Mean   :0.8261   $12,712 : 1
 $10,500 : 1       7%     : 2            3rd Qu.:1.0000   $13,020 : 1
 $11,320 : 1       70%    : 2            Max.   :1.0000   $17,697 : 1
 (Other) :14       (Other):11                             (Other) :16
   Date    Default
 Aug : 0   0:10
 July:23   1:13
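The credit data set is not included with the post, so here is a minimal sketch of the same split-then-map pattern on R's built-in iris data, which stands in for it as an assumption. The name by_species is made up for illustration; the assignment shows how the grouped summaries can be kept as a list.

```r
library(purrr)  # also provides the %>% pipe

# iris stands in for the credit data; Species plays the role of Date
by_species <- iris %>%
  split(iris$Species) %>%  # one data frame per level of Species
  map(summary)             # run summary() on each piece

names(by_species)   # one list element per group
by_species$setosa   # summary table for a single group
```

Because the result is an ordinary named list, any one group can be pulled out with $ or [[ ]] without rerunning the summary.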
The printed output needs some formatting before you share it, or you can export it to Excel. So fast and easy with this one.
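One possible way to handle the formatting step, sketched under assumptions: instead of the printed summary() tables, compute a one-row data frame of statistics per group, stack the rows, and write a CSV file that Excel can open. The helper summarise_group, the choice of the Sepal.Length column, and the output file name are all hypothetical, and iris again stands in for the credit data.

```r
library(purrr)

# Hypothetical helper: one row of statistics for a single group
summarise_group <- function(df) {
  data.frame(
    n      = nrow(df),
    mean   = mean(df$Sepal.Length),
    median = median(df$Sepal.Length),
    min    = min(df$Sepal.Length),
    max    = max(df$Sepal.Length)
  )
}

stats <- iris %>%
  split(iris$Species) %>%
  map(summarise_group)

# Stack the per-group rows and keep the group label as a column
stats_table <- data.frame(
  Species = names(stats),
  do.call(rbind, stats),
  row.names = NULL
)

# Write a CSV to a temporary folder; Excel opens CSV files directly
out_file <- file.path(tempdir(), "summary_by_species.csv")
write.csv(stats_table, out_file, row.names = FALSE)
```

This gives one tidy row per group rather than the wide printed tables, which is usually easier to format in a spreadsheet.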