Blog Archives

Split a Data Frame into Testing and Training Sets in R

February 24, 2011

I recently analyzed some data trying to find a model that would explain body fat distribution as predicted by several blood biomarkers. I had more predictors than samples (p>n), and I didn't have a clue which variables, interactions, or quadratic terms made biological sense to put into a model. I then turned to a few data mining procedures that I...

Read more »
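
As a rough illustration of the idea in the title, here is a minimal sketch of one common way to split a data frame at random into training and testing sets. The helper name `split_df`, its `train_frac` and `seed` arguments, and the use of the built-in `iris` data are assumptions made for this example; they are not taken from the original post.

```r
# Hypothetical helper (not from the original post): randomly assign rows
# of a data frame to a training set and a testing set.
split_df <- function(df, train_frac = 0.7, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)          # optional, for reproducibility
  n <- nrow(df)
  n_train <- round(train_frac * n)            # number of training rows
  train_idx <- sample(seq_len(n), size = n_train)
  test_idx <- setdiff(seq_len(n), train_idx)  # every row not used for training
  list(train = df[train_idx, , drop = FALSE],
       test  = df[test_idx, , drop = FALSE])
}

# Usage on a built-in data set (illustrative only)
parts <- split_df(iris, train_frac = 0.7, seed = 42)
dim(parts$train)
dim(parts$test)
```

Using `setdiff()` rather than negative indexing keeps the split correct even when the training set is empty.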

Get all your Questions Answered

February 22, 2011

When I have a question I usually ask the internet before bugging my neighbor. Yet it seems like Google's search results have become increasingly irrelevant over the last few years, and this is especially true for searching anything related to R (and pr...

Read more »

R: Given column name in a Data Frame, Get the Index

February 17, 2011

Had a mental block today trying to figure out how to get the indices of columns in a data frame given their names. Simple task but difficult to search Google for an answer. Thanks to jashapiro, Matt, and Vince for giving me a heads up on the which() fu...

Read more »
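
A short sketch of the `which()` approach mentioned in the excerpt, with `match()` shown as an alternative when the order of the requested names matters. The data frame `df` and its column names are made up for this example.

```r
# Toy data frame (hypothetical column names)
df <- data.frame(id = 1:3, age = c(30, 41, 25), bmi = c(22.1, 27.4, 24.9))

# which() returns the positions, in data-frame order, of the matching columns
idx_which <- which(names(df) %in% c("bmi", "age"))

# match() returns positions in the order the names were requested
idx_match <- match(c("bmi", "age"), names(df))

idx_which  # 2 3
idx_match  # 3 2
```

Note that `match()` returns `NA` for a name that is not present, while `which()` silently skips it.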

Summarize Missing Data for all Variables in a Data Frame in R

February 16, 2011

Something like this probably already exists in an R package somewhere out there, but I needed a function to summarize how much missing data I have in each variable of a data frame in R. Pass a data frame to this function and for each variable it'll give you the number of missing values, the total N, and the...

Read more »
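
A minimal sketch of what such a function could look like. This is not the author's function: the name `summarize_missing` is hypothetical, and since the excerpt is cut off before naming the third quantity, the `prop_missing` column is an assumption added for this example.

```r
# Hypothetical sketch (not the original function): per-variable missing-data
# summary for a data frame.
summarize_missing <- function(df) {
  n_missing <- vapply(df, function(x) sum(is.na(x)), integer(1))
  data.frame(
    variable     = names(df),
    n_missing    = unname(n_missing),             # count of NA values
    n_total      = nrow(df),                      # total N
    prop_missing = unname(n_missing) / nrow(df),  # assumed third column
    row.names    = NULL
  )
}

# Usage on a small made-up data frame
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "x"), c = 1:3)
res <- summarize_missing(toy)
res
```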

R function for extracting F-test P-value from linear model object

January 10, 2011

I thought it would be trivial to extract the p-value on the F-test of a linear regression model (testing the null hypothesis R²=0). If I fit the linear model: fit<-lm(y~x1+x2), I can't seem to find it in names(fit) or summary(fit). But summary(fit)$fstatistic does give you the F statistic, and both degrees of freedom, so I wrote this function to...

Read more »
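
One way to do this, sketched under the assumption that the function takes a fitted `lm` object: pass the F statistic and its two degrees of freedom from `summary(fit)$fstatistic` to `pf()`. The function name `lm_f_pvalue` and the `mtcars` example are illustrative and may differ from the original post.

```r
# Hypothetical helper: overall F-test p-value of a fitted linear model.
lm_f_pvalue <- function(fit) {
  f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
  # Upper-tail probability of the F distribution
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# Usage on a built-in data set (illustrative only)
fit <- lm(mpg ~ wt + hp, data = mtcars)
p_overall <- lm_f_pvalue(fit)
p_overall
```

The result should agree with comparing the model to an intercept-only model via `anova()`.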

Webinar on Revolution R Enterprise

December 7, 2010

R evangelist David Smith, marketing VP at Revolution R, will be giving a webinar showing off some of the finer features of Revolution R Enterprise - an integrated development environment (IDE) for R that has an enhanced script editor with syntax highli...

Read more »

Using the "Divide by 4 Rule" to Interpret Logistic Regression Coefficients

December 6, 2010

I was recently reading a bit about logistic regression in a book on hierarchical/multilevel modeling when I first learned about the "divide by 4 rule" for quickly interpreting coefficients in a logistic regression model in terms of the predicted probabilities of the outcome. The idea is pretty simple. The logistic curve (predicted probabilities) is steepest at the center where...

Read more »
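
A small numeric sketch of the rule: the slope of the logistic curve with respect to the predictor is the coefficient times p(1 - p), which peaks at p = 0.5, so coefficient / 4 is an upper bound on the change in predicted probability per unit change in the predictor. The coefficient value `beta` below is made up for illustration.

```r
beta <- 0.8  # hypothetical logistic regression coefficient

# "Divide by 4" upper bound on the change in probability per unit of x
upper_bound <- beta / 4

# Slope of the logistic curve at its center: beta * p * (1 - p) with p = 0.5.
# dlogis(0) is the logistic density at 0, which equals 0.25.
slope_at_center <- beta * dlogis(0)

# Actual change in predicted probability for a one-unit step centered at p = 0.5
actual_change <- plogis(beta / 2) - plogis(-beta / 2)

c(upper_bound = upper_bound,
  slope_at_center = slope_at_center,
  actual_change = actual_change)
```

The actual change is slightly smaller than the bound because the curve flattens on either side of its center.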