(This article was first published on **R – Research group Business Informatics**, and kindly contributed to R-bloggers)

Teaching R to students with little to no experience in programming or data analysis is a challenging task. Our talk at useR!2017 showed how different ingredients of our course Exploratory and Descriptive Data Analysis at UHasselt are used to facilitate the learning of R.

Firstly, the educational **environment** at UHasselt, based on *guided self-study* and the use of small-group working sessions, allows each student to learn at an individual pace and gives them frequent feedback.

Secondly, it is important that the **content** of the course is accessible and allows students to obtain quick results. For that reason, we decided to keep the amount of base R to a minimum and work with the different packages from the tidyverse. The consistent syntax and piping mechanism significantly lower the learning curve. Furthermore, starting with visualizing tidy and clean data gives *quick gains* to students, as they immediately observe the fruits of their labor and things become less abstract. Afterwards, content can gradually be made more complex, also including different preprocessing tasks.
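
To make the point about consistent syntax and piping concrete, here is a minimal sketch of the kind of first exercise this approach allows. It is not taken from the course material: the `mpg` dataset (shipped with ggplot2) and the variable names `class_summary` and `p` are assumptions chosen purely for illustration.

```r
library(dplyr)
library(ggplot2)

# mpg ships with ggplot2 and is already tidy: one row per car model.
# Each step of the pipe is a single verb, read from top to bottom.
class_summary <- mpg %>%
  group_by(class) %>%
  summarise(mean_hwy = mean(hwy), n_cars = n()) %>%
  arrange(desc(mean_hwy))

# A first plot needs no preprocessing at all, so results are immediate.
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  labs(x = "Engine displacement (l)", y = "Highway miles per gallon")

print(class_summary)
print(p)
```

The same verbs (`group_by()`, `summarise()`, `arrange()`) carry over unchanged once messier data and preprocessing are introduced later on.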

Thirdly, students need to be **motivated**. This is especially important in programs where students, as in our situation, do not have an intrinsic motivation to learn data analysis, because it is not a major component of the curriculum. To increase motivation, we not only try to make the course material interesting and attractive for students, but also ask them to submit six assignments at regular intervals. In the end, a lot of practice is needed to become proficient in analyzing data.

As a consequence of using assignments, the data gathered can be used to get more detailed information on the progress of students and to identify students who are likely to fail the exam if no action is taken.

The figure below shows, on the left, different clusters based on the score patterns for the assignments. On the right, the exam success rate is shown for each cluster. It can be seen that some clusters have a higher success rate than the overall rate (52%), while others are much lower. However, the differences are relatively small, and a student can only be assigned to one of the clusters once information on all the assignments has been gathered, which means that any intervention will come too late.
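
As a rough illustration of this kind of analysis, the sketch below clusters score patterns and computes a success rate per cluster. Everything in it is an assumption: the post does not say which clustering method or how many clusters were used, so k-means with four centers stands in here, and the scores and exam outcomes are simulated rather than real student data.

```r
set.seed(1)
n <- 120

# Simulated scores (0-10) on six assignments; NOT real student data.
scores <- matrix(runif(n * 6, 0, 10), ncol = 6,
                 dimnames = list(NULL, paste0("a", 1:6)))

# Simulated exam outcome, loosely tied to the average assignment score.
passed <- rowMeans(scores) + rnorm(n) > 5

# Assumption: k-means with 4 clusters stands in for the unnamed method.
clusters <- kmeans(scores, centers = 4, nstart = 20)

# Exam success rate per cluster, next to the overall rate.
success_rate <- tapply(passed, clusters$cluster, mean)
overall_rate <- mean(passed)

print(round(success_rate, 2))
print(round(overall_rate, 2))
```

Note that the clustering uses all six columns at once, which is exactly why this view only becomes available after the last assignment.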

In order to have a better idea about *good* and *bad* students, predictor models are learned at different intervals, i.e. each time information about an assignment is obtained, the predictions are updated. Both the rpart predictor (package rpart) and the svm predictor (package e1071) were applied. The graph below shows how the accuracy of the predictors increases with additional information. It can be seen that the accuracy of the svm predictor is already higher than the baseline accuracy (majority classifier) after assignment 3.
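
The interval-based setup can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind the graph: the data are simulated, the 140/60 train/test split and the default model settings are arbitrary choices, and the helper `accuracy_after()` is a hypothetical name.

```r
library(rpart)
library(e1071)

set.seed(42)
n <- 200

# Simulated scores (0-10) on six assignments; NOT real student data.
scores <- as.data.frame(matrix(runif(n * 6, 0, 10), ncol = 6))
names(scores) <- paste0("a", 1:6)
scores$exam <- factor(
  ifelse(rowMeans(scores[, 1:6]) + rnorm(n) > 5, "pass", "fail"),
  levels = c("fail", "pass")
)

# Arbitrary split into training and test students.
train_idx <- sample(n, 140)
train <- scores[train_idx, ]
test  <- scores[-train_idx, ]

# Fit both predictors using only the first k assignments.
accuracy_after <- function(k) {
  f <- reformulate(paste0("a", 1:k), response = "exam")
  tree_fit <- rpart(f, data = train, method = "class")
  svm_fit  <- svm(f, data = train)
  c(assignments = k,
    rpart = mean(predict(tree_fit, test, type = "class") == test$exam),
    svm   = mean(predict(svm_fit, test) == test$exam))
}

results <- t(sapply(1:6, accuracy_after))

# Baseline: always predict the majority class seen in the training data.
majority_class <- names(which.max(table(train$exam)))
baseline <- mean(test$exam == majority_class)

print(results)
print(baseline)
```

Each row of `results` corresponds to one point in time: the models for row `k` only see what would be known after assignment `k`.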

However, what is even more important in this case is the number of failing students we miss. Therefore, we should look at the false positive rate (FPR), where positive = pass: the share of students who actually fail but are predicted to pass.

It can be seen that the FPR of the svm is already at 40% after assignment 3 and at 30% at assignment 6. The *combi* predictor in this figure is a conservative combination of both predictors: if at least one of them predicts a fail, the *combi* predicts fail. Of course, a low FPR comes at the cost of a high false negative rate (FNR), i.e. a lot of students without problems are labeled as *fail*. However, this is certainly a more *ethical* classification.
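
The conservative combination and the two error rates can be written down in a few lines of base R. The sketch below uses eight made-up students and made-up predictions purely to show the mechanics; the function names `combine_conservative()`, `fpr()` and `fnr()` are hypothetical.

```r
# Made-up outcomes and predictions for eight students (illustration only).
actual    <- c("fail", "fail", "fail", "fail", "pass", "pass", "pass", "pass")
pred_tree <- c("fail", "pass", "pass", "fail", "pass", "pass", "fail", "pass")
pred_svm  <- c("pass", "fail", "pass", "fail", "pass", "pass", "pass", "fail")

# Conservative combination: predict fail as soon as one predictor says fail.
combine_conservative <- function(a, b) {
  ifelse(a == "fail" | b == "fail", "fail", "pass")
}

# Positive = pass.
# FPR: share of actually failing students who are predicted to pass.
fpr <- function(actual, predicted) mean(predicted[actual == "fail"] == "pass")
# FNR: share of actually passing students who are predicted to fail.
fnr <- function(actual, predicted) mean(predicted[actual == "pass"] == "fail")

pred_combi <- combine_conservative(pred_tree, pred_svm)

rates <- rbind(
  tree  = c(FPR = fpr(actual, pred_tree),  FNR = fnr(actual, pred_tree)),
  svm   = c(FPR = fpr(actual, pred_svm),   FNR = fnr(actual, pred_svm)),
  combi = c(FPR = fpr(actual, pred_combi), FNR = fnr(actual, pred_combi))
)
print(rates)
```

Because the combination predicts pass only when both predictors do, its FPR can never exceed that of either predictor, and its FNR can never be lower.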

Note that the goal is not to explicitly label the students with *pass* or *fail*. Rather, the goal is to identify struggling students in order to be able to help them accordingly.

Overall, a good environment that facilitates frequent feedback, well-constructed course materials, and assignments that increase students’ motivation and effort seem to be excellent ingredients to make learning R less challenging for students without much prior knowledge. Furthermore, frequently gathering data on their progress and using Learning Analytics can help to provide individual guidance to students in order to maximize the success rate.
