# R package to convert statistical analysis objects to tidy data frames

[This article was first published on **Getting Genetics Done**, and kindly contributed to R-bloggers].

I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley’s paper on the subject.

R expects inputs to data analysis procedures to be in a tidy format, but the model output objects that you get back aren’t always tidy. The reshape2, tidyr, and dplyr packages are meant to take data frames, munge them around, and return a data frame. David Robinson’s broom package bridges this gap by taking un-tidy output from model objects, which are not data frames, and returning it in a tidy data frame format.
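To make the "model objects are not data frames" point concrete, here is a minimal base R sketch. The variable name `fit` is only illustrative and is not taken from the broom documentation.

```r
# Fit a simple linear model on the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

class(fit)          # "lm"
is.data.frame(fit)  # FALSE: dplyr/tidyr-style tools cannot take it directly
is.list(fit)        # TRUE: it is a list of components with an "lm" class

# The pieces you usually want are spread across named components
names(fit)
```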

(From the documentation): if you fit a linear model on the built-in `mtcars` dataset and view the object directly, this is what you’d see:

    lmfit = lm(mpg ~ wt, mtcars)
    lmfit

    Call:
    lm(formula = mpg ~ wt, data = mtcars)

    Coefficients:
    (Intercept)           wt
         37.285       -5.344

    summary(lmfit)

    Call:
    lm(formula = mpg ~ wt, data = mtcars)

    Residuals:
       Min     1Q Median     3Q    Max
    -4.543 -2.365 -0.125  1.410  6.873

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   37.285      1.878   19.86  < 2e-16 ***
    wt            -5.344      0.559   -9.56  1.3e-10 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 3.05 on 30 degrees of freedom
    Multiple R-squared:  0.753, Adjusted R-squared:  0.745
    F-statistic: 91.4 on 1 and 30 DF,  p-value: 1.29e-10

If you’re just trying to read it, this is good enough, but if you’re doing follow-up analysis or visualization, you end up hacking around with `str()` and pulling out coefficients using indices, and everything gets ugly quickly.

But the `tidy` function in the broom package, run on the fit object, probably gives you what you were looking for in a tidy data frame:

    tidy(lmfit)
             term estimate stderror statistic   p.value
    1 (Intercept)   37.285   1.8776    19.858 8.242e-19
    2          wt   -5.344   0.5591    -9.559 1.294e-10
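For comparison, the manual route described above might look roughly like the following in base R. This is a hand-rolled sketch, not broom's implementation; `manual_tidy` is a hypothetical name, and the column names simply mirror the output shown in this post (more recent broom releases call the standard error column `std.error`).

```r
# Fit the same model as in the post
lmfit <- lm(mpg ~ wt, data = mtcars)

# coef(summary(...)) returns a numeric matrix; the term names are its row names
coef_mat <- coef(summary(lmfit))

# Rebuild a tidy-style data frame by hand (illustrative only)
manual_tidy <- data.frame(
  term      = rownames(coef_mat),
  estimate  = coef_mat[, "Estimate"],
  stderror  = coef_mat[, "Std. Error"],
  statistic = coef_mat[, "t value"],
  p.value   = coef_mat[, "Pr(>|t|)"],
  row.names = NULL,
  stringsAsFactors = FALSE
)

manual_tidy
```

Every model class stores these pieces a little differently, so this kind of extraction code has to be rewritten for each one. That repetition is what `tidy()` is meant to spare you.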

The `tidy()` function also works on other types of model objects, like those produced by `glm()` and `nls()`, as well as popular built-in hypothesis testing tools like `t.test()`, `cor.test()`, or `wilcox.test()`.

View the README on the GitHub page, or install the package and run the vignette to see more examples and conventions.
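As a rough illustration of that broader coverage, the sketch below calls `tidy()` on a `glm()` fit and on two hypothesis test results. It assumes broom is installed (for example via `install.packages("broom")`); the particular models and variable names are made up for this example and are not from the package documentation.

```r
library(broom)  # assumes the broom package is installed

# Logistic regression: transmission type (am) as a function of weight
glmfit <- glm(am ~ wt, data = mtcars, family = binomial)
tidy(glmfit)   # one row per model term

# Welch two-sample t-test of mpg by transmission type
tt <- t.test(mpg ~ am, data = mtcars)
tidy(tt)       # a single row describing the test

# Pearson correlation test between mpg and weight
ct <- cor.test(mtcars$mpg, mtcars$wt)
tidy(ct)       # a single row describing the test
```

Because each call returns a data frame, the results can be passed straight to dplyr or a plotting package without any index-based extraction.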
