(This article was first published on **Getting Genetics Done**, and kindly contributed to R-bloggers)

I talked a little bit about tidy data in my recent post about dplyr, but you should really go check out Hadley's paper on the subject.

R expects inputs to data analysis procedures to be in a tidy format, but the model output objects you get back aren't always tidy. The reshape2, tidyr, and dplyr packages are meant to take data frames, munge them around, and return a data frame. David Robinson's broom package bridges this gap: it takes the un-tidy output from model objects, which are not data frames, and returns it as a tidy data frame.

(From the documentation): if you fit a linear model on the built-in `mtcars` dataset and view the object directly, this is what you'd see:

```r
lmfit = lm(mpg ~ wt, mtcars)
lmfit
```

```
Call:
lm(formula = mpg ~ wt, data = mtcars)

Coefficients:
(Intercept)           wt  
     37.285       -5.344  
```

```r
summary(lmfit)
```

```
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-4.543 -2.365 -0.125  1.410  6.873 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   37.285      1.878   19.86  < 2e-16 ***
wt            -5.344      0.559   -9.56  1.3e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.05 on 30 degrees of freedom
Multiple R-squared:  0.753,	Adjusted R-squared:  0.745 
F-statistic: 91.4 on 1 and 30 DF,  p-value: 1.29e-10
```
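For comparison, here is roughly what pulling the coefficient table out by hand looks like in base R. This is a minimal sketch, not how broom is implemented: `coef(summary(fit))` returns a numeric matrix with the term names stored as row names, so you have to move those into a column and rename everything yourself. The lower-case column names are chosen here to mirror what `tidy()` produces; the variable names `coef_mat` and `manual` are just for illustration.

```r
# Fit the same model on the built-in mtcars data
lmfit <- lm(mpg ~ wt, data = mtcars)

# coef() on the summary object gives a matrix: one row per term,
# columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coef_mat <- coef(summary(lmfit))

# Convert by hand: rename the columns, move the row names into a
# proper 'term' column, and reset the row names to 1, 2, ...
manual <- as.data.frame(coef_mat)
names(manual) <- c("estimate", "std.error", "statistic", "p.value")
manual$term <- rownames(coef_mat)
rownames(manual) <- NULL
manual <- manual[, c("term", "estimate", "std.error", "statistic", "p.value")]
manual
```

It works, but it is exactly the kind of bookkeeping you would have to repeat (with different column names) for every model type.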

If you're just trying to read it, this is good enough. But if you're doing follow-up analysis or visualization, you end up hacking around with `str()` and pulling out coefficients by index, and everything gets ugly quickly. The `tidy()` function in the broom package, run on the fit object, probably gives you what you were looking for in a tidy data frame:

```r
library(broom)
tidy(lmfit)
```

```
         term estimate std.error statistic   p.value
1 (Intercept)   37.285    1.8776    19.858 8.242e-19
2          wt   -5.344    0.5591    -9.559 1.294e-10
```
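Because the result is a data frame, ordinary data frame operations work on it directly. A small sketch of that kind of follow-up: keep the terms with p < 0.05, then add 95% confidence intervals from the estimate and standard error using the t distribution with the model's residual degrees of freedom (the same formula `confint()` uses for `lm` fits). The column names `conf.low`/`conf.high` are my choice here; note too that recent versions of broom return a tibble, which is still a data frame, so the same code applies.

```r
library(broom)

lmfit <- lm(mpg ~ wt, data = mtcars)
coefs <- tidy(lmfit)

# Filter rows like any other data frame: terms significant at 0.05
subset(coefs, p.value < 0.05)

# 95% confidence intervals: estimate +/- t critical value * std. error,
# with degrees of freedom taken from the fitted model
t_crit <- qt(0.975, df = df.residual(lmfit))
coefs$conf.low  <- coefs$estimate - t_crit * coefs$std.error
coefs$conf.high <- coefs$estimate + t_crit * coefs$std.error
coefs
```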

The `tidy()` function also works on other types of model objects, like those produced by `glm()` and `nls()`, as well as popular built-in hypothesis testing tools like `t.test()`, `cor.test()`, or `wilcox.test()`. View the README on the GitHub page, or install the package and run the vignette to see more examples and conventions.
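As a quick sketch of that, here are a base R t-test and a logistic regression, both on `mtcars`, run through the same `tidy()` call. The particular models (comparing `mpg` across transmission type `am`, and predicting `am` from `wt`) are just illustrative choices. A hypothesis test comes back as a one-row data frame, while the `glm()` fit comes back with the same term/estimate/std.error layout as the `lm()` example.

```r
library(broom)

# Welch two-sample t-test: mpg for automatic (am = 0) vs manual (am = 1)
tt <- t.test(mpg ~ am, data = mtcars)
tidy(tt)   # one row, with columns such as estimate, statistic, p.value

# Correlation test between mpg and weight
tidy(cor.test(mtcars$mpg, mtcars$wt))

# Logistic regression: probability of a manual transmission given weight
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
tidy(logit_fit)  # one row per term, like tidy(lmfit)
```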
