AP Statistics and R Programming
Should AP Statistics students learn to program in R? Definitely yes. R is a powerful programming language and a valuable skill to have entering college. In my class, we use the TI-Nspire CX graphing calculator for most of our coursework during the year, but there are times when I feel R would be the better tool.
We usually have some time to explore data, and I think learning some of the basics of R is worth the time and effort. In my school, the computer lab is just down the hall and we also have Wi-Fi throughout the school. I have loaded both R and RStudio on our school computers, so we are ready to go from day one.
I think it’s important for students to think deeply about their data, and using R with RStudio helps a great deal. Plus, there is a ton of good data on the internet that becomes available for analysis with tools like R.
Using data from our textbook
We use The Practice of Statistics, 3rd Edition (Starnes, Yates, and Moore) during the year, and I also have a copy of Stats: Modeling the World, 3rd Edition (Bock, Velleman, and De Veaux).
In Bock’s book, at the beginning of chapter three, the authors give three rules for data analysis:
- Make a picture
- Make a picture
- Make a picture
The reason given for this is:
“These days, technology makes drawing pictures of data easy, so there is no reason not to follow the three rules.” (Bock, p. 21)
The beginning of the semester is a good time to introduce the students to the R and RStudio environment. This is where they can play with R’s base graphics and, for the more advanced students, work with ggplot2. It takes time to introduce R and some of the basics, but all of my students have a laptop that they can use at home or at school. I encourage them to install both R and RStudio on their laptops.
Once they have done that, I show them how to install packages. We work mostly with ggplot2 and dplyr, as dplyr makes it easier to subset data.
To begin, I show my students how to store data in vectors and how to make a simple bar graph. I will try to recreate the graphs on pages 22 and 23 of Stats: Modeling the World.
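Before installing anything, it can help to see what is already on a given computer. Here is a minimal sketch of that check, using only base R. The variable names `wanted` and `have` are my own, and the install line is left commented out because it needs an internet connection.

```r
# Packages used later in the course
wanted <- c("ggplot2", "dplyr")

# installed.packages() returns a matrix whose row names are package names
have <- wanted %in% rownames(installed.packages())

# One row per package: TRUE means it is already installed
print(data.frame(package = wanted, installed = have))

# Install only what is missing (uncomment to run; needs internet)
# install.packages(wanted[!have])
```

Once a package is installed, `library(ggplot2)` or `library(dplyr)` loads it at the start of each R session.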
I usually put the code on the front SmartBoard for the students to copy. I encourage them to read and to look at code examples on the internet and to bring in any reference material they find in addition to the material I hand out.
Learning to code in R for a bar plot
First I make sure they are in the correct working directory, or at least know what directory they are saving their work to. I then try to recreate some of the bar plots in the text so the students can get interested right away. Also, my students feel that they are doing “real” statistics now by actively creating plots instead of just looking at them in the book.
I also tell them at this point that they have to take more ownership of their code when they are stuck. I can help with some of the error messages, but they need to Google any error message to try to figure out the problem. I have found this to be a very rewarding part of my teaching: not knowing all the answers and working out the solution with the “team”.
So here would be a first attempt at plotting a bar graph.
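# Show the current working directory
getwd()
# Class labels and the percent of people on board in each class
a <- c("First", "Second", "Third", "Crew")
b <- c(14.77, 12.95, 32.08, 40.21)
# A bare-bones bar plot: no title, axis labels, or bar names yet
barplot(b)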
And this is the plot.
barplot(b,
        main = "People on the Titanic by Ticket Class",
        xlab = "Class", ylab = "Percent",
        names.arg = c("First", "Second", "Third", "Crew"),
        col = "darkred", horiz = FALSE)
Here is the new plot.
Another example from the book gives the count for each class.
w <- c(325, 285, 706, 885)
barplot(w,
        main = "People on the Titanic by Ticket Class",
        xlab = "Class", ylab = "Count",
        names.arg = c("First", "Second", "Third", "Crew"),
        col = "darkred", horiz = FALSE)
The plot looks like this.
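The percent plot and the count plot describe the same table, so students can compute one from the other instead of typing both. Here is a small sketch of that arithmetic. The variable name `pct` is my own, and naming the elements of `w` is a choice I made so the bars get labeled automatically.

```r
# Counts by ticket class, as used in the count bar plot above
w <- c(First = 325, Second = 285, Third = 706, Crew = 885)

# Each count as a percent of all people on board, rounded to 2 places
pct <- round(100 * w / sum(w), 2)
print(pct)

# A named vector lets barplot() label the bars without names.arg
barplot(pct,
        main = "People on the Titanic by Ticket Class",
        xlab = "Class", ylab = "Percent", col = "darkred")
```

The printed values are 14.77, 12.95, 32.08, and 40.21, the same numbers typed by hand for the percent plot.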
Extension: Working with ggplot2
Some of my students will be able to grasp the base graphics in a short time but may need additional help to continue. I usually give them a PDF document for the base graphics and also one for R commands. Other students will run with this and ask for additional information on how to graph. This is when I introduce ggplot2 for those students who are more self-directed and motivated. Last year, all my students wanted to learn more about ggplot2 once they saw the plots a few other students had made with it.
I will usually spend the last ten minutes of class giving the basics of ggplot2 and how to make a simple plot. I try to avoid the qplot() shortcut because, by the end of the school year, my students will have had enough time to internalize the basics of the grammar of graphics. Some of them also used their knowledge of ggplot2 in their AP Economics class when exploring data.
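library(ggplot2)
# Titanic is a built-in table; convert it to a data frame with a Freq column
titanic <- as.data.frame(Titanic)
# weight = Freq makes each bar add up the frequencies instead of counting rows
ggplot(titanic, aes(x = Survived, weight = Freq)) +
  geom_bar(color = "blue")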
Some would explore with the color = and fill = arguments to come up with different plots.
Here is one beginning plot.
With a little different code, we get something better.
ggplot(titanic, aes(x = Class, y = Freq, fill = Survived)) + geom_bar(stat = "identity")
Which gives this nice plot.
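With stat = "identity", the stacked bar for each class adds up the Freq values of its rows. One way students can check the bar heights is to total the same column in base R. This is only a sketch, and the names `by_class` and `totals` are my own.

```r
# Titanic ships with base R as a 4-way table; flatten it to one row per group
titanic <- as.data.frame(Titanic)

# Add up Freq for each Class/Survived pair: these are the stacked segments
by_class <- xtabs(Freq ~ Class + Survived, data = titanic)
print(by_class)

# Total bar height for each class
totals <- rowSums(by_class)
print(totals)
```

The class totals come out to 325, 285, 706, and 885, the counts from the textbook example.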
At this point, it takes a while for all this information to sink in, and many of my students tell me they will play with the graphs at home to try to impress the class. We will also have experience using our TI-Nspire CX to make bar plots, as there are clear instructions in the back of the Starnes book.
Other lessons using R in the beginning of the year include converting vector data to a data.frame and using the dplyr package. I’ll share more about these topics in future posts.