In this post, we briefly summarize and discuss the results of our survey on “R and education”. Before diving into the figures, we would like to express our sincere gratitude to the 286 R enthusiasts who invested their valuable time to fill out this survey. You can also download the complete dataset of the survey or browse an overview of all questions (see the bottom of the post for more information), so feel free to do your own analysis and share it. Note that the right panel of this page provides the answers to some open-ended questions in the survey.
Interestingly, respondents came from diverse backgrounds, both geographically and in terms of occupation. The left panel of Figure 1 shows that respondents are mainly active as academics (50.5%), followed by professionals (30%) and students (19.5%). Academics from about 80 different universities, mainly located in the US and Europe, participated. About 24 respondents were R package authors.
The online survey was distributed through the R mailing lists and our personal contacts. Figure 1 shows the geographical origin of the respondents. Individuals from all 4 continents participated, with the majority based in the US. Although distributing an online survey in this way introduces selection bias, we believe the diversity of respondents is interesting and adds some flavor to the results.
We first discuss the main takeaways regarding the respondents’ views on R in general. A more focused section on R and education follows. To end, we discuss the next steps we want to take based on this survey’s results.
Why you love R and expect its market share to go up
Respondents (from the group “professionals that use R”) are very optimistic when asked about the future spread of R in the world, as illustrated in Figure 2. An impressive 79.7% of respondents expect its future usage to go up in comparison to other statistical packages such as SAS and SPSS, only 11.9% expect it to remain stable, and just 3.4% of the respondents take a pessimistic view, expecting it to go down.
Figure 3 shows that respondents (from the group “professionals that use R”) mainly love R because of its functionality (86.2%) and the community (65.5%). Other reasons to love R cited under “other” include “many packages”, “cross platform” and “wonderful for graphics”. All that glitters is not gold though. When asked about their biggest frustration when using R, only 19% answer “Nothing, R is perfect”. The biggest frustrations reported by respondents are “the lack of documentation” (29.3%) and “the lack of consistency” (22.4%). A large number of respondents (34.5%) provided an open-ended response to this question as well. The right panel of this page lists these open-ended responses, along with the open-ended responses on what respondents consider the main disadvantages of R.
Major interest in online learning and teaching R
“R best matches the concept of ‘computational thinking’, a core idea that my students need”
Whether you are completely new to R or a veteran with multiple years of experience, there is always room to learn and improve. As illustrated in Figure 4, one of the main ways to develop new R skills is through online resources such as websites and online communities. This is true for both academics (92.4%) and professionals (94.9%). The second most cited educational source is the built-in R help feature, mentioned by 77.2% of the academics and 83.1% of the professionals. Textbooks, which can be seen as a more traditional way to learn and teach, are placed third.
Today, numerous online courses on statistics already use the R language to explain data analytics concepts. Some of the most noteworthy and successful examples are the Coursera courses from Roger D. Peng (Computing for Data Analysis) and Eric Zivot (Introduction to Computational Finance and Financial Econometrics). This proven need for online educational sources for statistics and R raises the question of whether it is possible to identify different and even more engaging ways to learn R online. The ‘R in Education’ survey indicates that over 75% of students are interested in taking online courses with an interactive component. Of the academic respondents, 68.6% show interest in online interactive courses and 13% would be willing to pay for these courses (see Figure 5). Our survey results are thus in line with the observation that online interactive courses as offered by codecademy.com, codeschool.com, etc. have gained enormous popularity recently.
Naturally, in open-source communities most things are developed and offered for free. As noted in the previous paragraph, interactive online courses would be a valuable addition to the current spectrum of R’s educational sources. Since our results indicate that demand for free courses would be high, the question arises: who will develop these free courses? A reasonable place to look is among people who already develop free software, such as the R package authors. Indeed, 70% of R package authors in the survey indicated that, provided an easy-to-use development platform exists, they would be willing to create such interactive learning tools for their packages for free (note that the sample is small though). Therefore, it might be interesting to develop and eventually provide such a platform as a way to spread data analytics knowledge in general, and the R statistical programming language in particular.
New educational tools to teach R and statistics?
This survey largely confirmed our belief that there is a need for more online educational tools to teach R. These tools should take into account the added value of an interactive approach, as well as the characteristics and benefits of an open-source community. Therefore, we started working on an open interactive exercise platform for statistics and R.
To receive updates on our future progress, or if you are willing to provide us with feedback while building this learning platform, please leave your e-mail address below.
- Download the full dataset of the survey here. The dataset is structured as follows:
qla is a list in which each list-item contains the information of exactly one question in the survey. Each list-item in qla is itself a list with the following items:
- First list-item: The question asked
- Second list-item: The answer possibilities
- Third list-item: The data with the answers. Rows for respondents, columns for answers.
NOTE: For privacy reasons, we removed all information from the dataset that could result in identification of the respondents (e.g. emails, university affiliation, ...). Please contact us in case we overlooked something.
- Have a look at a summary of the results of all questions in the survey.
- The graphics in this post were generated with the R package ggplot2, see code.
- Errata list:
We would like to offer our apologies for the following errors that ended up in the survey:
- When selecting that R is more complex to learn than other statistical languages, one of the following questions stated that you indicated that R was less complex to learn.
- In order to better target the questions and to avoid making the survey even longer, we opted to mostly ask different questions to each type of respondent (Students/Academics/Professionals). Therefore, it is often not possible to compare the different types of respondents, which is a pity in hindsight.
- …. (?)
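For readers who want to run their own analysis, the three-item structure of qla described in the dataset note above can be explored roughly as follows. This is only a sketch under stated assumptions: the toy qla object, its question text, its answer options and its 0/1 answer matrix are all made up for illustration, and the helper summarise_question is hypothetical. The real object comes from the downloaded dataset, and its actual answer encoding may differ.

```r
# Toy stand-in with the same shape as the survey object described above.
# Everything inside it is invented for illustration; it is NOT survey data.
qla <- list(
  list(
    "What is your occupation?",                  # item 1: the question asked (hypothetical)
    c("Academic", "Professional", "Student"),    # item 2: the answer possibilities (hypothetical)
    matrix(c(1, 0, 0,
             0, 1, 0,
             1, 0, 0,
             0, 0, 1),
           ncol = 3, byrow = TRUE)               # item 3: rows = respondents, columns = answers
  )
)

# Hypothetical helper: percentage of respondents selecting each answer.
# ASSUMPTION: the third item is a 0/1 indicator matrix; adapt if the real data differ.
summarise_question <- function(q) {
  answers <- as.matrix(q[[3]])
  shares <- colMeans(answers, na.rm = TRUE) * 100
  names(shares) <- q[[2]]
  list(question = q[[1]], percent = round(shares, 1))
}

result <- summarise_question(qla[[1]])
print(result$question)
print(result$percent)
```

Applying the same helper over the whole list, for example with lapply(qla, summarise_question), would give a per-question overview, again assuming every question uses the same encoding.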