By: Sean M. Gonzalez
Basic plots in R using standard packages like lattice work for most situations where you want to see trends in small data sets, such as your simulation variables, which makes sense considering that lattice has its roots in the Trellis graphics of Bell Labs’ S language. However, when we need to summarize and communicate our work to those primarily interested in the “forest” perspective, we use tools like ggplot2. In other words, the difference between lattice and ggplot2 is the difference between understanding data and drawing pictures.
You can learn all about ggplot2 by downloading the R package and reading, but even Hadley Wickham, author of ggplot2, thinks going through the R help documentation will “drive you crazy!” To alleviate stress, we’ve compiled references, examples, documentation, blogs, books, groups, and commentary from practitioners who use ggplot2 regularly. Enjoy.
ggplot2 is an actively maintained open-source chart-drawing library for R based upon the principles of the “Grammar of Graphics”, thus the “gg”. The Grammar of Graphics was written for statisticians, computer scientists, geographers, research and applied scientists, and others interested in visualizing data. A ggplot2 graphic can be generalized as layers composed of: a data set, mappings and aesthetics (position, shape, size, color), statistical transforms, and scales. To better wrap our minds around how this applies to ggplot2, we can take Hadley’s tour, or attend one of his events. The overall goal is to automate graphical processes and put more resources at our fingertips; below are some great works from practitioners.
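To make the layering idea concrete, here is a minimal sketch of how those components map onto ggplot2 calls. It assumes R's built-in mtcars data set, and the choice of variables, the linear smoother, and the color palette are purely illustrative; none of them come from the works shown in this post.

```r
library(ggplot2)

# Data set and aesthetic mappings: weight on x, fuel economy on y,
# color by number of cylinders (mtcars is an illustrative assumption).
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  # Geometric layer: one point per car.
  geom_point(size = 2) +
  # Statistical transform: a per-group linear fit drawn as a line.
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Scale: controls how cylinder counts are turned into colors.
  scale_colour_brewer(palette = "Dark2", name = "Cylinders") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```

Each `+` adds one component to the plot object, so a layer can be swapped or removed without touching the rest of the specification.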
The London bike routes image is built with three layers: building polygons, waterways and lakes, and bike routes. The route data records both the position of the bikes and a count of them; the count is shown as line thickness and intensity of yellow, which is a nice contrast to the black and grey of the city map. I enjoy this dataviz because you can imagine yourself trying to get around on a bicycle in London.
The background of this work is the classification of tumour tissues using their Raman spectra. A detailed discussion can be found in C. Beleites et al. Gliomas are the most frequent brain tumours, and astrocytomas are their largest subgroup. These tumours are treated by surgery. However, the exact borders of the tumour are hardly visible, hence the need for new tools that help the surgeon find the tumour border. A grading scheme is given by the World Health Organization (WHO).
Curious about your influence on Twitter? Want to see how your messages resonate within and outside your network? Here is a great website that goes through many examples of using the twitteR package in R, with the following ggplot2 code that creates the chart on the right:
This final example, Sentencing Data for Local Courts, breaks the data down by the demographics of those committing different classes of crimes. As above, the R code is very simple and follows the layering paradigm:
ggplot(iw, aes(AGE,fill=sex))+geom_bar() +