Last week I had my class practice making a box plot using the data on page 66 of The Practice of Statistics 4th Edition (TPS 4ed) textbook.
I’m still going over the details of making a box plot from a single vector (one variable) of data, since many of the problems in our textbook so far give data in this form. To use ggplot, the data must be in a data frame, so for this exercise I’ll make a small adjustment and put the data into one.
My class is already familiar with matrices and matrix multiplication from their math class, but now they needed to learn about a different data format, the data frame. A data frame is a list of vectors of equal length, and each vector (column) can hold a different type of data.
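Here is a minimal sketch of that idea. The names and scores are made up for illustration and are not from the textbook. It builds a data frame with one text column and one numeric column, then wraps a single vector into a one-column data frame.

```r
# Hypothetical example data, not from TPS 4ed
scores <- c(88, 92, 75)

students <- data.frame(
  name  = c("Ann", "Ben", "Cal"),  # text column
  score = scores,                  # numeric column
  stringsAsFactors = FALSE         # keep names as character in any R version
)

str(students)       # lists each column with its own type
is.list(students)   # a data frame is a list of equal-length columns

# A single vector becomes a one-column data frame
one_col <- data.frame(value = scores)
nrow(one_col)       # one row per element of the vector
```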
Our goal in the computer lab was to create a box plot from the textbook data using ggplot. The students quickly found out that ggplot would not produce a plot from a bare vector: it expects a data frame, and the box plot we were building needed both an x and a y variable.
The class had to search for a way to turn a single vector into a data frame so we could use ggplot. It took only a few minutes to find one on Stack Overflow.
That Stack Overflow answer got them going. Before using ggplot, I had them use R’s base graphics so we could see the difference. R’s base graphics will also plot a single vector directly.
Here is the data from page 66 and the box plot in base graphics. You can see it’s pretty basic.
male = c(127,44,28,83,0,6,78,6,5,213,73,20,214,28,11)
boxplot(male)
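To see the numbers behind the base plot, here is a short sketch using `fivenum()`, which returns the five-number summary that a box plot is built on. The variable name `male_values` is only for illustration.

```r
# Same data as above, under an illustrative name
male_values <- c(127,44,28,83,0,6,78,6,5,213,73,20,214,28,11)

# minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(male_values)
five   # 0.0 8.5 28.0 80.5 214.0
```

The box spans the two hinges (8.5 to 80.5) and the line inside the box sits at the median (28).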
Now we plot the same data in ggplot. To use ggplot, the data must first be in a data frame. I load ggplot2 and dplyr using the library() function. I may use dplyr later, so I’ll load it now.
Code for male data
library(dplyr)
library(ggplot2)
male = data.frame(male = c(127,44,28,83,0,6,78,6,5,213,73,20,214,28,11)) # data from page 66, stored in a named column
ggplot(data = male, aes(x = "", y = male)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 150)) # I zoom the y axis so the box is easier to read; the two values above 150 fall outside the view.
Here we can take a quick look at the summary statistics.
summary(male)
##       male
##  Min.   :  0.0
##  1st Qu.:  8.5
##  Median : 28.0
##  Mean   : 62.4
##  3rd Qu.: 80.5
##  Max.   :214.0
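The summary also shows what the zoomed plot leaves out. As a sketch, assuming the usual 1.5 × IQR rule that R's box plots use by default, the two largest values are flagged as outliers, and both sit above the 150 limit I set on the y axis. The variable names here are only for illustration.

```r
# Same data as above, under an illustrative name
male_values <- c(127,44,28,83,0,6,78,6,5,213,73,20,214,28,11)

q   <- quantile(male_values, c(0.25, 0.75))  # 8.5 and 80.5, matching summary()
iqr <- IQR(male_values)                      # 80.5 - 8.5 = 72

lower_fence <- q[[1]] - 1.5 * iqr            # -99.5
upper_fence <- q[[2]] + 1.5 * iqr            # 188.5

# Values beyond the fences are drawn as individual points
outliers <- male_values[male_values < lower_fence | male_values > upper_fence]
outliers   # 213 214
```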
I now put the female data into a data frame and combine the male and female data into one data frame so I can plot both using ggplot. I found a neat method on Stack Overflow showing how to do this.
# Here is the code to plot male and female data using ggplot
a = data.frame(group = "male", value = c(127,44,28,83,0,6,78,6,5,213,73,20,214,28,11))
b = data.frame(group = "female", value = c(112,203,102,54,379,305,179,24,127,65,41,27,298,6,130,0))
plot.data = rbind(a, b) # this function will bind or join the rows. See data at bottom.
ggplot(plot.data, aes(x=group, y=value, fill=group)) + # This is the plot function
  geom_boxplot() # This is the geom for box plot in ggplot.
The final result
Above, you can see the male and female box plots side by side in different colors. ggplot does most of the work, so only a few lines of code are needed. My students enjoy plotting the data from the textbook and learning how to manipulate the code to produce cool plots.
They are also learning to troubleshoot the code themselves, as I can only help with the basics. We are finding that Stack Overflow is a great resource.
The data in a data frame format
I have my students show their data, especially now that it’s in a data frame with two groups. Here is what the data looks like in the data frame. Notice how both male and female are in the column “group” and the values are in the column “value”.
plot.data
##     group value
## 1    male   127
## 2    male    44
## 3    male    28
## 4    male    83
## 5    male     0
## 6    male     6
## 7    male    78
## 8    male     6
## 9    male     5
## 10   male   213
## 11   male    73
## 12   male    20
## 13   male   214
## 14   male    28
## 15   male    11
## 16 female   112
## 17 female   203
## 18 female   102
## 19 female    54
## 20 female   379
## 21 female   305
## 22 female   179
## 23 female    24
## 24 female   127
## 25 female    65
## 26 female    41
## 27 female    27
## 28 female   298
## 29 female     6
## 30 female   130
## 31 female     0
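With the data in this long format, one grouping column plus one value column, per-group summaries take only a line or two. This is a small sketch in base R rather than something we did in class. It rebuilds `plot.data` so it runs on its own, and the name `medians` is only for illustration.

```r
# Rebuild the combined data frame from the two groups
a <- data.frame(group = "male",
                value = c(127,44,28,83,0,6,78,6,5,213,73,20,214,28,11))
b <- data.frame(group = "female",
                value = c(112,203,102,54,379,305,179,24,127,65,41,27,298,6,130,0))
plot.data <- rbind(a, b)

# Number of rows in each group: 15 male, 16 female
table(plot.data$group)

# Median of "value" within each level of "group"
medians <- tapply(plot.data$value, plot.data$group, median)
medians[["male"]]     # 28
medians[["female"]]   # 107
```

These medians are the center lines of the two boxes in the ggplot figure.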
Our next unit is on probability. I haven’t decided on an R lesson yet using probability. Maybe we’ll just continue practicing with more plots with ggplot.
