Q is for qplot

Posted on April 19, 2018 in R bloggers

[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers.]


You may have noticed that I frequently use the ggplot2 package and its ggplot function to produce graphics for my posts. ggplot2, which is part of the so-called tidyverse, takes the "gg" in its name from the “grammar of graphics”. That is, it uses a standard set of functions and arguments to produce any number of graphics, and you change the appearance of those graphics by applying different settings. The nice thing about this type of syntax is that once you learn it for one type of graphic (say, a histogram), it’s very easy to expand to other types of graphics (like scatterplots) without having to learn brand-new functions. ggplot is a great way to create high-quality, publication-ready graphics.
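
To make the "learn it once" point concrete, here is a minimal sketch that uses R's built-in mtcars data as a stand-in (it is not the data used in this post). The ggplot() call and the aesthetic mapping follow the same pattern in both plots; only the geometry layer changes between a histogram and a scatterplot.

```r
library(ggplot2)

# Same grammar for both plots: data + aesthetic mapping + a geometry layer.
# mtcars ships with base R and is only a stand-in for real data here.
hist_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

scatter_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```

Typing hist_plot or scatter_plot at the console draws the plot; everything else about the syntax carries over from one graphic type to the other.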

But sometimes you don’t need high-quality, publication-ready graphics. Sometimes you just need a quick look at the data, and you don’t care whether you have axis labels or centered titles; you just need to make certain there isn’t anything wonky about your data as you clean and/or analyze it. Fortunately, ggplot2 has a great function for that: qplot (short for “quick plot”).

As with ggplot, qplot has a standard function and set of arguments, so once you learn to use it for one type of graphic, you can easily expand to others. qplot also has smart rules built in that default to two of the most frequently used charts, particularly for quick looks at the data: histograms and scatterplots. Why are these so common, especially in data cleaning and the early stages of analysis? A histogram lets you see whether your variable is approximately normal. This matters because many statistical tests (including most of the ones you would have learned in an introductory statistics course) are built on the assumption that data are normally distributed. A scatterplot lets you see whether your variables are related to each other, and whether that relationship is linear; once again, many statistical tests are built on assumptions about linear relationships between variables. So if you’re taking a quick look, you’ll probably be using one of these two graphics.

The default graphics are very easy to produce: if you give only an x variable, you’ll get a histogram, and if you give both x and y, you’ll get a scatterplot. I’ll use the Facebook data once again to demonstrate. I also went ahead and scored the RRS and SBI (described below) in the code that follows; you can find code for scoring all measures here.
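
Before turning to the real data, here is a small sketch of those two defaults on simulated stand-in data (the demo data frame and its x and y columns are hypothetical, not from the Facebook file). Passing only x gives a histogram; passing x and y gives a scatterplot. Newer ggplot2 releases may print a deprecation warning for qplot(), but the call still runs.

```r
library(ggplot2)

# Hypothetical stand-in data: two linearly related, roughly normal variables
set.seed(2018)
demo <- data.frame(x = rnorm(200, mean = 50, sd = 10))
demo$y <- 0.5 * demo$x + rnorm(200, sd = 5)

# Only x supplied: qplot defaults to a histogram
h <- qplot(x, data = demo)

# Both x and y supplied: qplot defaults to a scatterplot
s <- qplot(x, y, data = demo)
```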

# Read the data and score the RRS by summing columns 3 through 24
Facebook<-read.delim(file="small_facebook_set.txt", header=TRUE)
Facebook$RRS<-rowSums(Facebook[,3:24])
# Reverse-score a response: (max + min) - x maps max to min, min to max, and so on
reverse<-function(max,min,x) {
  y<-(max+min)-x
  return(y)
}
# Reverse-score the even-numbered SBI items on the 1-7 scale
Facebook$Sav2R<-reverse(7,1,Facebook$Sav2)
Facebook$Sav4R<-reverse(7,1,Facebook$Sav4)
Facebook$Sav6R<-reverse(7,1,Facebook$Sav6)
Facebook$Sav8R<-reverse(7,1,Facebook$Sav8)
Facebook$Sav10R<-reverse(7,1,Facebook$Sav10)
Facebook$Sav12R<-reverse(7,1,Facebook$Sav12)
Facebook$Sav14R<-reverse(7,1,Facebook$Sav14)
Facebook$Sav16R<-reverse(7,1,Facebook$Sav16)
Facebook$Sav18R<-reverse(7,1,Facebook$Sav18)
Facebook$Sav20R<-reverse(7,1,Facebook$Sav20)
Facebook$Sav22R<-reverse(7,1,Facebook$Sav22)
Facebook$Sav24R<-reverse(7,1,Facebook$Sav24)
# SBI total: reverse-scored even items plus the odd items as-is
Facebook$SBI<-Facebook$Sav2R+Facebook$Sav4R+Facebook$Sav6R+
  Facebook$Sav8R+Facebook$Sav10R+Facebook$Sav12R+Facebook$Sav14R+
  Facebook$Sav16R+Facebook$Sav18R+Facebook$Sav20R+Facebook$Sav22R+
  Facebook$Sav24R+Facebook$Sav1+Facebook$Sav3+Facebook$Sav5+
  Facebook$Sav7+Facebook$Sav9+Facebook$Sav11+Facebook$Sav13+Facebook$Sav15+
  Facebook$Sav17+Facebook$Sav19+Facebook$Sav21+Facebook$Sav23
# Load ggplot2, which provides qplot()
library(ggplot2)
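
The scoring logic above can be checked on a toy data frame. This is a sketch under stated assumptions: the toy data are randomly generated (not the study's responses), and it swaps the 12 separate reverse() assignments for a vectorized version that applies the same rule (odd items as-is, even items reversed on a 1-7 scale) to all items at once.

```r
# Hypothetical toy data: 3 respondents x 24 SBI items (Sav1..Sav24), each scored 1-7.
# Values are random and only illustrate the scoring arithmetic.
set.seed(42)
toy <- as.data.frame(matrix(sample(1:7, 3 * 24, replace = TRUE), nrow = 3,
                            dimnames = list(NULL, paste0("Sav", 1:24))))

# Same rule as the post's reverse(): (max + min) - x flips 1 <-> 7, 2 <-> 6, ...
reverse <- function(max, min, x) (max + min) - x

# Following the post's scoring: odd items count as-is, even items are reverse-scored
odd_items  <- paste0("Sav", seq(1, 23, by = 2))
even_items <- paste0("Sav", seq(2, 24, by = 2))

# Vectorized total: reverse() works on a whole data frame of items at once
toy$SBI <- rowSums(toy[, odd_items]) + rowSums(reverse(7, 1, toy[, even_items]))
toy$SBI
```

The longhand version in the post is easier to read item by item; the vectorized version is less typing and harder to mistype, but both compute the same total.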

I'll use a scale I haven't really used in this series: the Savoring Beliefs Inventory (SBI). This measure was created by Fred Bryant, who was my faculty sponsor for this research (I was still a grad student at the time). Fred also taught me structural equation modeling. The measure assesses a concept Fred calls savoring: fixating on positive events and feelings in order to retain the joy and pleasure they bring. I chose to include this measure because, as I mentioned to Fred, I felt savoring was the opposite of rumination. (While he thought I'd made a good point, he told me he thought of savoring as the opposite of coping, which makes sense.)

Using the qplot function, we can quickly generate a histogram of the total SBI score.

qplot(SBI, data=Facebook)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
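
For comparison, here is a sketch of the same histogram written both ways: qplot(SBI, data = ...) is shorthand for ggplot(data, aes(x = SBI)) + geom_histogram(). Because qplot() was deprecated in ggplot2 3.4.0, the longhand form may be worth knowing. The scores data frame below is simulated and hypothetical, standing in for the real Facebook$SBI values.

```r
library(ggplot2)

# Hypothetical stand-in for the SBI totals (simulated, not the real scores)
set.seed(1)
scores <- data.frame(SBI = round(rnorm(250, mean = 120, sd = 20)))

# Quick version, as in the post
quick <- qplot(SBI, data = scores)

# Longhand version: the same histogram spelled out in the ggplot() grammar
longhand <- ggplot(scores, aes(x = SBI)) +
  geom_histogram()
```

When drawn, both versions should show the same stat_bin() message about 30 bins, since neither sets bins or binwidth.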

This variable shows a negative skew: there is a long, thin tail at the low end (scores stretch farther below the center than a normal distribution would suggest), the highest part of the distribution is to the right of center, and there is much less of a tail at the high end, where cases pile up near the top of the scale. We're also getting a message about bins. By default, the histogram slices the range between the minimum and the maximum into 30 bars. We can reduce this number to smooth out the distribution.
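
Here is a minimal sketch of both ideas in this paragraph, under stated assumptions. First, qplot() passes non-aesthetic arguments such as bins or binwidth on to the histogram layer, so either one can replace the default of 30 bars. Second, a simple moment-based skewness statistic (a hand-rolled helper named skewness here, not a function from ggplot2) can back up the visual impression; negative values indicate a longer low-end tail. The data are simulated, and the ceiling of 168 (24 items × 7) is an assumption used only to build a capped, left-skewed stand-in for SBI.

```r
library(ggplot2)

# Hypothetical, negatively skewed stand-in for the SBI totals: a capped scale
# (assumed maximum of 168) with a long tail toward low scores. Simulated data.
set.seed(42)
scores <- data.frame(SBI = 168 - rexp(300, rate = 1 / 15))

# Fewer bars smooth the histogram; bins and binwidth are alternative controls
fewer_bins <- qplot(SBI, data = scores, bins = 15)
wider_bins <- qplot(SBI, data = scores, binwidth = 5)

# Moment-based sample skewness: third central moment over SD cubed.
# Negative values mean the low-end tail is longer than the high-end tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
skewness(scores$SBI)
```

Choosing between bins and binwidth is a matter of taste: bins fixes the number of bars regardless of the scale, while binwidth ties each bar to a fixed number of score points, which is often easier to interpret for summed scales.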