I have been meaning to write a long post explaining each of the major R projects I am currently working on, but I can’t seem to find the time. Instead, here is a short summary of a few R projects, experiments, and ideas I am kicking around. Comments, feedback, and suggestions are welcome. Consider this a window into my current goals for working with R and the challenges I am using it to meet in my professional life.
datasynthR – This is an R package that lets users easily create relatively complex simulated data. Key features include the ability to specify the distributions of numeric variables in a dataframe and the correlations among them, the ability to build categorical variables that are correlated with one another or with numeric variables, the introduction of random and not-at-random missingness, and utility functions to check for such missingness. The goal is to allow users to quickly build more robust tests of the performance of statistical modeling techniques. See the GitHub repo for more information. I hope to release this to CRAN in early 2014.
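To make the idea concrete, here is a minimal base-R sketch of the kind of data described above: two correlated numeric variables, a categorical variable tied to one of them, and both random and not-at-random missingness. It does not use datasynthR and does not reflect its API; every name and parameter value below (`rho`, `p_miss`, the 10% missing rate) is an assumption chosen for illustration.

```r
# Base-R sketch only; none of these names come from datasynthR.
set.seed(42)
n   <- 1000
rho <- 0.6   # assumed target correlation between x1 and x2

# Two standard-normal variables with correlation rho, built by mixing
# independent draws (a 2x2 Cholesky factor written out by hand).
z1 <- rnorm(n)
z2 <- rnorm(n)
x1 <- z1
x2 <- rho * z1 + sqrt(1 - rho^2) * z2

# A categorical variable correlated with x1: cut a noisy copy of x1 into thirds.
noisy <- x1 + rnorm(n, sd = 0.5)
grp <- cut(noisy,
           breaks = quantile(noisy, probs = c(0, 1/3, 2/3, 1)),
           labels = c("low", "mid", "high"),
           include.lowest = TRUE)

dat <- data.frame(x1 = x1, x2 = x2, grp = grp)

# Missing completely at random: drop roughly 10% of x1 regardless of any value.
dat$x1[runif(n) < 0.10] <- NA

# Missing not at random: x2 is more likely to be missing when x2 itself is large.
p_miss <- plogis(-3 + 1.5 * x2)
dat$x2[runif(n) < p_miss] <- NA

# Simple checks: achieved correlation, share missing per column, and whether
# the dropped x2 values differ on average from the retained ones.
cor(x1, x2)
colMeans(is.na(dat))
tapply(x2, is.na(dat$x2), mean)
```

The last check only works because the complete `x2` vector is still available in a simulation; that is exactly what makes synthetic data useful for testing how a model copes with missingness.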
maintainR – Between work, home, and various configurations of servers and virtual machines, I find that I have R installed in an awful lot of places on an awful lot of architectures. I can’t always keep straight which packages are installed where, where the site library is on each machine, and what version of R I am running (have I gone through the tedious Windows upgrade process on this machine?). maintainR is my solution to this, offering some ability to standardize installed libraries and create a common Rprofile.site across installations. It needs a lot of work; see the GitHub repo.
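As a rough illustration of the bookkeeping problem, the base-R sketch below records the R version, library paths, and installed packages on one machine so they can be compared with another. The helper names `snapshot_r_install` and `missing_packages` are hypothetical and are not taken from maintainR.

```r
# Hypothetical helpers for illustration; not part of maintainR.

# Record what this R installation looks like.
snapshot_r_install <- function() {
  pkgs <- installed.packages()[, c("Package", "Version", "LibPath"), drop = FALSE]
  list(
    r_version    = R.version.string,
    lib_paths    = .libPaths(),
    site_profile = file.path(R.home("etc"), "Rprofile.site"),
    packages     = data.frame(pkgs, row.names = NULL, stringsAsFactors = FALSE)
  )
}

# Packages present in the reference snapshot but absent from the current one.
missing_packages <- function(reference, current) {
  setdiff(reference$packages$Package, current$packages$Package)
}

here <- snapshot_r_install()
# saveRDS(here, "r-snapshot.rds")  # copy the file elsewhere, then readRDS() it

# Simulate a reference machine that has one extra (made-up) package.
reference <- here
reference$packages <- rbind(
  reference$packages,
  data.frame(Package = "notARealPackage", Version = "0.0.1",
             LibPath = .libPaths()[1], stringsAsFactors = FALSE)
)
missing_packages(reference, here)
```

On a second machine, the vector returned by `missing_packages()` could be passed to `install.packages()` to bring the two libraries closer together, though version differences would still need separate handling.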
eeptools 0.3 – After some inattention, I am hard at work improving my eeptools package and releasing version 0.3 to CRAN. New features in the release include better compatibility with data.table and the age_calc and moves_calc functions contributed by Jason Becker. The release also introduces some package development best practices, including unit tests, git tagging, and milestones. See the GitHub repo for details.
EWStools – This is a fledgling R package I am working on that applies what I have learned in developing a Dropout Early Warning System (DEWS) for the state of Wisconsin to create a more flexible and generalized predictive modeling framework for educational outcomes. Mostly wrappers for various caret functions, it codifies some practices I have used to help me evaluate model fit and hopefully makes the concept of predictive modeling more approachable.
R modeling tutorials – I have begun compiling notes from my courses in structural equation modeling and mixed-effects modeling into discrete tutorials for R. This started out because I was afraid I might forget some of the material before I needed to use it again, but it has also proven helpful to others. The initial posts have been well received, and I hope to wrap these tutorials up into some GitHub repositories and eventually fold them into the RBootcamp materials.
Bayesian modeling in R tutorials – As I work through the newest edition of Bayesian Data Analysis (BDA), I want to implement many of the examples from the book in R. I thought, why not do this in the open so others can benefit from it? These tutorials will follow the format laid out in the modeling tutorials. I hope to work on these throughout 2014.
Applied modeling for social scientists talk – Not much to say about this yet, except it is an extension and a deepening of the most recent presentations I have given over on the presentations page.