Plotting cool graphs in R

[This article was first published on R – Opiniomics, and kindly contributed to R-bloggers.]

I have to admit to being a bit of a snob when it comes to graphs and charts in scientific papers and presentations. It’s not that I think I am particularly good at it (I’m OK); it’s just that I know what’s bad. I’ve seen folk screenshot multiple Excel graphs so they can paste them into a PowerPoint table to create multi-panel plots… and it kind of makes me want to scream. I’m sorry, I really am, but when I see Excel plots in papers I judge the authors, and I don’t mean in a good way. I can’t help it. Plotting good graphs is an art and, sticking with the metaphor, Excel is paint-by-numbers while R is a blank canvas waiting for something beautiful to be created; Excel is limiting, whereas R sets you free.

Readers of this blog will know that I like to take fabulous plots that I find and recreate them. Well, let’s do that again.

I saw this tweet by Trevor Branch and found it intriguing:

It shows two plots of the same data.  The Excel plot:

[Image: the Excel plot]

And the multi plot:

[Image: the multi plot]

You’re clearly supposed to think the latter is better, and I do; however, perhaps disappointingly, the top graph would be easy to plot in Excel, but I’m guessing most people would find it impossible to create the bottom one (in Excel or otherwise).

Well, I’m going to show you how to create both, in R. All the code is now on GitHub!

The Excel Graph

Now, I’ve shown you how to create Excel-like graphs in R before, and we’ll use some of the same tricks again.

First we set up the data:


# set up the data
df <- data.frame(Circulatory=c(32,26,19,16,14,13,11,11),
		 Mental=c(11,11,18,24,23,24,26,23),
		 Musculoskeletal=c(17,18,13,16,12,18,20,26),
		 Cancer=c(10,15,15,14,16,16,14,14))

rownames(df) <- seq(1975,2010,by=5)

df
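
One thing worth noticing before plotting: the years live in the row names, which R stores as character strings, so the code that follows plots against the positions 1 to nrow(df) and only uses the years as axis labels. Here is a small sketch of that mapping; the names years and x are mine, purely for illustration, and are not used elsewhere in the post.

```r
# same data as above
df <- data.frame(Circulatory=c(32,26,19,16,14,13,11,11),
                 Mental=c(11,11,18,24,23,24,26,23),
                 Musculoskeletal=c(17,18,13,16,12,18,20,26),
                 Cancer=c(10,15,15,14,16,16,14,14))
rownames(df) <- seq(1975,2010,by=5)

# row names come back as character strings, not numbers
years <- as.integer(rownames(df))

# the x positions used for plotting: one per row
x <- seq_len(nrow(df))

# pair each plotting position with the year it will be labelled as
data.frame(x=x, year=years)
```

If you wanted a true numeric time axis instead, you could plot against years directly; the post sticks with positions because it makes leaving room for the legend simpler.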

Now let's plot the graph:


# set up colours and points
cols <- c("darkolivegreen3","darkcyan","mediumpurple2","coral3")
pch <- c(17,18,8,15)

# we have one point on X axis for each row of df (nrow(df))
# we then add 2.5 to make room for the legend
xmax <- nrow(df) + 2.5

# make the borders smaller
par(mar=c(3,3,0,0))

# plot an empty graph
plot(1:nrow(df), 1:nrow(df), pch="", 
		xlab=NA, ylab=NA, xaxt="n", yaxt="n", 
		ylim=c(0,35), bty="n", xlim=c(1,xmax))

# add horizontal lines
for (i in seq(0,35,by=5)) {
	lines(1:nrow(df), rep(i,nrow(df)), col="grey")
}

# add points and lines 
# for each dataset
for (i in 1:ncol(df)) {

	points(1:nrow(df), df[,i], pch=pch[i], 
		col=cols[i], cex=1.5)

	lines(1:nrow(df), df[,i], col=cols[i], 
		lwd=4)


}

# add bottom axes
axis(side=1, at=1:nrow(df), tick=FALSE, 
		labels=rownames(df))

axis(side=1, at=seq(-0.5,8.5,by=1), 
		tick=TRUE, labels=NA)

# add left axis (las=1 keeps the labels horizontal)
axis(side=2, at=seq(0,35,by=5), tick=TRUE, 
		las=1, labels=paste(seq(0,35,by=5),"%",sep=""))

# add legend
legend(8.5,25,legend=colnames(df), pch=pch, 
		col=cols, cex=1.5, bty="n",  lwd=3, lty=1)

And here is the result:

[Image: the Excel-style plot recreated in R]

Not bad, eh? Actually, yes, very bad; but also very Excel!
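
As an aside, the loop that adds points and lines column by column could probably be collapsed into a single call to matplot() from base graphics, which draws one series per matrix column. This is only a rough sketch of that alternative, not the code behind the figure above: it leaves out the grey grid lines, the percentage axis and the legend, and the name m is just something I made up for the matrix version of the data.

```r
# same data and styling as above
df <- data.frame(Circulatory=c(32,26,19,16,14,13,11,11),
                 Mental=c(11,11,18,24,23,24,26,23),
                 Musculoskeletal=c(17,18,13,16,12,18,20,26),
                 Cancer=c(10,15,15,14,16,16,14,14))
rownames(df) <- seq(1975,2010,by=5)

cols <- c("darkolivegreen3","darkcyan","mediumpurple2","coral3")
pch <- c(17,18,8,15)

# matplot() works on a matrix: one column per series
m <- as.matrix(df)

# type="o" overplots points on lines, one series per column
matplot(seq_len(nrow(m)), m, type="o", lty=1, lwd=4,
        pch=pch, col=cols, cex=1.5,
        ylim=c(0,35), xaxt="n", bty="n",
        xlab="", ylab="")

# label the x positions with the years held in the row names
axis(side=1, at=seq_len(nrow(m)), labels=rownames(m), tick=FALSE)
```

The explicit loop is longer, but it makes it obvious where each layer goes, which is handy when you want fine control over every element.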

The multi-plot

Plotting multi-panel figures in R is sooooooo easy! Here we go with the alternative multi-plot, using the same data.


# split into 2 rows and 2 cols
split.screen(c(2,2))

# keep track of which screen we are
# plotting to
scr <- 1

# iterate over columns
for (i in 1:ncol(df)) {

	# select screen
	screen(scr)

	# reduce margins
	par(mar=c(3,2,1,1))

	# empty plot
	plot(1:nrow(df), 1:nrow(df), pch="", xlab=NA, 
		ylab=NA, xaxt="n", yaxt="n", ylim=c(0,35), 
		bty="n")

	# plot all data in grey
	for (j in 1:ncol(df)) {
		lines(1:nrow(df), df[,j], 
		col="grey", lwd=3)

	}	

	# plot selected in blue
	lines(1:nrow(df), df[,i], col="blue4", lwd=4)

	# add blobs
	points(c(1,nrow(df)), c(df[1,i], df[nrow(df),i]), 
		pch=16, cex=2, col="blue4")

	# add numbers
	mtext(df[1,i], side=2, at=df[1,i], las=2)
	mtext(df[nrow(df),i], side=4, at=df[nrow(df),i], 
		las=2)	

	# add title
	title(colnames(df)[i])

	# add axes if we are one of
	# the bottom two plots
	if (scr >= 3) {
		axis(side=1, at=1:nrow(df), tick=FALSE, 
			labels=rownames(df))
	}

	# next screen
	scr <- scr + 1
}

# close multi-panel image
close.screen(all=TRUE)

And here is the result:

[Image: the multi-panel plot recreated in R]
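
For what it's worth, split.screen() is not the only way to get a grid of panels in base R. A common alternative is par(mfrow=...), which fills the grid row by row as each new plot is started, so there is no screen counter to keep track of. The following is a cut-down sketch under that assumption: it draws only the grey and blue lines and the titles, and skips the blobs, numbers and bottom axes from the full version.

```r
# same data as above
df <- data.frame(Circulatory=c(32,26,19,16,14,13,11,11),
                 Mental=c(11,11,18,24,23,24,26,23),
                 Musculoskeletal=c(17,18,13,16,12,18,20,26),
                 Cancer=c(10,15,15,14,16,16,14,14))
rownames(df) <- seq(1975,2010,by=5)

# 2 rows by 2 cols, small margins; keep the old settings to restore later
op <- par(mfrow=c(2,2), mar=c(3,2,1,1))

x <- seq_len(nrow(df))

for (i in seq_along(df)) {

  # empty plot: each call to plot() moves on to the next panel
  plot(x, df[[i]], type="n", ylim=c(0,35),
       xlab="", ylab="", xaxt="n", yaxt="n", bty="n")

  # all data in grey
  for (j in seq_along(df)) {
    lines(x, df[[j]], col="grey", lwd=3)
  }

  # selected series in blue, drawn last so it sits on top
  lines(x, df[[i]], col="blue4", lwd=4)

  title(colnames(df)[i])
}

# restore the previous graphics settings
par(op)
```

Which of the two you use is mostly a matter of taste; split.screen() lets you jump back to an earlier panel, whereas par(mfrow=...) is a little less to type.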

 


And there we have it.

So which do you prefer?
