What follows is, hopefully, an enjoyable summary of my summer. Instead of going out regularly, as in previous years, I spent most of my time on R package development. There were, of course, earlier signs that life was no longer what it used to be. I don’t want to get too emotional, but working with R really did change me a little. This funny story probably shows the difference: about two weeks ago, in a conversation, I could only come up with an expression that belongs in programming (&&) and I couldn’t say it in the normal way (for me the normal way is Hungarian; I know most people have a very different idea of „normality” than the Hungarian language). It’s never too late to become a nerd.
So the story of my summer started with a recommendation from the mentor of my former internship. It was about an opportunity for students to code and develop open source software, like R. The program is called Google Summer of Code 2013 (or just GSoC 2013). I already wrote a short post about the first part of the project, describing my first impressions and my expectations for the next stage. This time, I will write about why those expectations were not met. Besides that, there will be a short summary of the whole project and some reflections on the feelings and experiences I gained.
The project for which I won a slot was called „Improving rapport and pander packages”. In the end it was reduced to improving the rapport package. I didn’t use the word „only”, because in the second stage I got to know the rapport package more deeply than I had expected or planned. On the other hand, I am obviously a little disappointed that I didn’t work on the pander package. Rapport is a pretty useful package that lets you create reproducible statistical report templates. These templates can be exported to different formats, like HTML, LaTeX, PDF, ODT or DOCX. You don’t need special skills to write a rapport template; you just write it in Pandoc’s markdown syntax and follow a few rapport conventions.
Improving the package meant writing some predefined templates, refreshing the existing ones and using my imagination to explore other fields. The result was 16 new templates, besides maintaining and improving the 22 existing ones. Documentation was also part of the project, so I updated and tweaked the documentation of both packages.
Working on this project was a big step for me, not just because I learned the structure of R better; I also got familiar with other nifty tools, like RStudio, GitHub and Travis CI. They belong together within the whole working process, and learning to use them makes a workflow more fluent, but I can also recommend using them separately if someone is interested. I have to admit that I am far from knowing all the benefits of these tools or from using them without doubts, but compared to the beginning, my knowledge has improved a lot. In short:
- RStudio is an awesome interface for using R that improved my working efficiency a lot (there was, and still is, room for improvement).
- GitHub is a hosting service for the Git version control system, which saved me a lot of time (after getting the hang of it), because all the changes I made were accessible online and easy to follow.
- Travis CI is an automated online system where all the committed changes in the templates are tested immediately.
Now let’s see the “benefits” of my work. The original idea of my mentors was to produce as many templates for statistical tests as possible. They listed some of them (the links point to the template sources on GitHub):
- Hierarchical Cluster analysis
- Factor analysis
- Multidimensional scaling
- Graphing templates with numerous input fields, built on lattice
While working on the project, other templates were also added beyond the original ideas.
And to wrap it all up, I’ve also built two “wizard” templates that take any kind (class) of variables as inputs and automatically run the appropriate statistical tests from the previously implemented templates. A quick demo follows below.
Almost a year ago, I wrote a post about using the rapport package, so I will not go into the details now, but it could be informative to see how a template works. The first step in order to do that is to install the rapport package. The changes are not on CRAN yet, so it’s advisable to install the package directly from GitHub, e.g. with the awesome devtools package:
library('devtools')
install_github('Rapporter/rapport')
After installing and loading the package, let us visualize some variables from the bundled Internet Usage Survey (2008). Running the following command:
rapport.html('graphs/GraphingWizard', data=ius2008, variables='age')
would result in a densityplot (as age is a quantitative variable), while the following command would end up with a barchart:
rapport.html('graphs/GraphingWizard', data=ius2008, variables='gender')
Now, please guess the result of the following R expression (hint: edu is a numeric variable):
rapport.html('graphs/GraphingWizard', data=ius2008, variables=c('age', 'edu'))
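For the curious, the single-variable behaviour shown in the first two commands can be imitated in a few lines of plain R. This is only a rough sketch of the idea, not the actual GraphingWizard template source: the function names choose_plot_type and plot_one_variable and the toy data frame are made up for illustration, and the only assumption about the wizard is what was described above (a quantitative variable gets a densityplot, anything else a barchart).

```r
# A hypothetical sketch, NOT the actual GraphingWizard source.
library(lattice)

# Decide which lattice plot suits a single variable, based on its class.
choose_plot_type <- function(x) {
  if (is.numeric(x)) "densityplot" else "barchart"
}

# Build (but do not print) the matching lattice plot for one column.
plot_one_variable <- function(data, variable) {
  values <- data.frame(value = data[[variable]])
  if (choose_plot_type(values$value) == "densityplot") {
    # quantitative variable: kernel density estimate
    densityplot(~ value, data = values, xlab = variable)
  } else {
    # anything else is treated as categorical: bar chart of counts
    counts <- as.data.frame(table(value = values$value))
    barchart(value ~ Freq, data = counts, xlab = "count", ylab = variable)
  }
}

# Toy data standing in for a survey; the column names are made up.
toy <- data.frame(
  age    = c(23, 31, 27, 45, 38, 29),
  gender = factor(c("male", "female", "female", "male", "female", "male"))
)

print(plot_one_variable(toy, "age"))     # density plot
print(plot_one_variable(toy, "gender"))  # bar chart of counts
```

The real template does much more (input validation, several variables, report text around the figure), so treat this only as an illustration of choosing a plot by variable class.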
As a final word, I can say that working on the rapport package and participating in GSoC 2013 was a great opportunity for me to discover some new fields of work, and I am sure that meeting new tools and approaches developed my skills. Besides that, I hope the package also gained some useful improvements.