The ggplot2 package in R provides a reliable system for describing and building graphs, and it is capable of creating elegant and aesthetically pleasing graphics. The framework of ggplot2 is quite different from that of the base graphics package: it is based on the grammar of graphics, introduced by Leland Wilkinson. At first, you may not find it intuitive, but don’t worry, we are here to help. Together, we will master it to the core.
Basic plotting framework for ggplot
ggplot(data = <DATA>) + <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
Mapping the aesthetics (using aes)
An aesthetic is a visual property of the objects in your graph. In other words, aesthetics are the different ways in which you can draw your data points: their position on the x and y axes, and their size, shape, or color. By mapping the variables of your dataset to aesthetics with aes(), you can convey the information hidden in your data.
For example, you can map color to the cylinder variable (cyl) to see how the relationship between mileage and weight changes with the number of cylinders. So let us take our framework and add aesthetics to it. Here we have three variables, which means we have to pass three arguments to the aes() function.
# Loading the library
library(ggplot2)
# Loading data and converting the cyl variable to a factor
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
# Adding aesthetics: mileage on x, weight on y, and cylinders as color
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt, color = cyl))
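Color is not the only aesthetic you can map. The sketch below maps the number of cylinders to point shape and horsepower (hp) to point size instead. It assumes ggplot2 is installed, and the names `cars` and `p` are only illustrative.

```r
library(ggplot2)

# Work on a copy of mtcars so the built-in dataset stays untouched
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

# Map the number of cylinders to shape and horsepower to size
p <- ggplot(data = cars) +
  geom_point(mapping = aes(x = mpg, y = wt, shape = cyl, size = hp))
print(p)
```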
Mapping geometric shapes (using geom)
The geometric shapes in ggplot2 are the visual objects you use to represent your data. For example, you can plot a histogram or a boxplot to describe the distribution of a variable.
The two plots below provide the same information through different visual objects. In ggplot2, these objects are defined with geoms, so the geom determines the type of plot. For example, a histogram uses the histogram geom (geom_histogram()), a bar plot uses the bar geom (geom_bar()), a line plot uses the line geom (geom_line()), and so on. There is one exception to this naming pattern: we use the point geom (geom_point()) to draw scatter plots.
Let’s see how to draw these two charts with geoms, using the total sleep hours of animals from the msleep dataset that comes with ggplot2.
Every geom function takes a mapping argument. However, not every aesthetic works with every geom. For example, you can set the shape of a point, but you cannot set the shape of a line.
# Building a histogram; the fixed outline color goes outside aes()
ggplot(data = msleep) +
  geom_histogram(mapping = aes(x = sleep_total), colour = "orange")
# Building a boxplot
ggplot(data = msleep) +
  geom_boxplot(mapping = aes(y = sleep_total))
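By default, geom_histogram() splits the data into 30 bins and prints a message suggesting that you pick a better value. Here is a minimal sketch of choosing the bin width yourself. The binwidth = 1 (one hour of sleep per bar) and the fill color are arbitrary illustrative choices.

```r
library(ggplot2)

# One bar per hour of total sleep; fixed colors are set outside aes()
p <- ggplot(data = msleep) +
  geom_histogram(mapping = aes(x = sleep_total), binwidth = 1,
                 colour = "orange", fill = "grey40")
print(p)
```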
Using Facets in ggplot2
Facets are a way to add extra categorical variables to your plot. Faceting divides the data into two or more groups and draws a separate panel (subplot) for each group.
Now there are two ways in which you can use facets:
A. If you want to split the data by only one variable, use the facet_wrap() function. Its first argument is a formula: a tilde (~) followed by the name of the variable by which you want to split the data.
Let’s check the distribution of total sleep by the animals’ diet type (vore).
# Working example of facet_wrap
ggplot(data = msleep) +
  geom_histogram(mapping = aes(x = sleep_total)) +
  facet_wrap(~ vore)
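facet_wrap() also lets you control the layout of the panels. In the sketch below, ncol = 2 arranges the panels in two columns. The scales = "free_y" setting lets each panel choose its own y-axis range, which helps when the groups differ a lot in size. Both are facet_wrap() arguments, but the specific values are only illustrative.

```r
library(ggplot2)

p <- ggplot(data = msleep) +
  geom_histogram(mapping = aes(x = sleep_total), binwidth = 2) +
  # Two columns of panels, each with its own y-axis range
  facet_wrap(~ vore, ncol = 2, scales = "free_y")
print(p)
```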
B. If you want to split the data by a combination of two variables, use facet_grid(). Here the two variables are separated by a tilde (~). The variable before it defines the rows of the grid, and the variable after it defines the columns.
Let’s build a scatter plot of mpg against disp, split by number of cylinders (cyl) and transmission type (am).
# Loading data
data(mtcars)
# Converting cylinders (cyl) and transmission (am: 0 = automatic, 1 = manual) to factors
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)
# Working example of facet_grid
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = disp)) +
  facet_grid(cyl ~ am)
Mapping colors to variables in ggplot2
Colors can be a game changer in any data visualization, so it is important to learn how to use them. By default, ggplot2 sticks to greyscale (for example, black points and dark grey bars on a grey background), but you can change the colors.
In ggplot2, there are two ways in which you can use color.
A. You can assign a fixed color to objects, lines, and points. To color filled objects such as bars, use the fill argument. To color lines and points, use the color argument. Below is a quick example of both cases.
# Making the points of the scatter plot blue
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt), color = "blue")
# Making the bars of the histogram blue
ggplot(data = iris) +
  geom_histogram(mapping = aes(x = Sepal.Width), fill = "blue")
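A common pitfall is to put a fixed color inside aes(). The sketch below contrasts the two approaches. Outside aes(), every point is set to blue. Inside aes(), the string "blue" is treated as a data value, so ggplot2 colors it with its default scale (not blue) and adds a legend for it.

```r
library(ggplot2)

# Setting: every point is drawn in blue
p_set <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt), color = "blue")

# Mapping: "blue" becomes a one-level variable colored by the default scale
p_mapped <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = mpg, y = wt, color = "blue"))

print(p_set)
print(p_mapped)
```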
B. You can use color to represent the values of a third variable, as we did in the very first example under mapping aesthetics.
By default, ggplot2 uses scale_fill_hue() and scale_colour_hue() to choose colors for discrete variables. You can change the luminance of these colors. Other color scales are also available, such as the palettes from the RColorBrewer package, which you can use through scale_fill_brewer().
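The l argument of scale_fill_hue() sets the luminance (0 to 100) of the default hue palette, which is 65 unless you change it. Here is a minimal sketch comparing the default with a darker version. The value 40 is an arbitrary choice, and the code converts cyl to a factor itself.

```r
library(ggplot2)

cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

base <- ggplot(data = cars) +
  geom_boxplot(mapping = aes(x = cyl, y = mpg, fill = cyl))

# Default hue palette (luminance 65) versus a darker version (luminance 40)
p_default <- base + scale_fill_hue()
p_dark <- base + scale_fill_hue(l = 40)
print(p_dark)
```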
Example 1 – Showcasing the default RColorBrewer setup (the Blues palette)
ggplot(data = mtcars) + geom_boxplot(mapping = aes(x = cyl, y = mpg, fill=cyl)) + scale_fill_brewer()
Example 2 – Showcasing Set1 palette colors
ggplot(data = mtcars) + geom_boxplot(mapping = aes(x = cyl, y = mpg, fill=cyl)) + scale_fill_brewer(palette="Set1")
Example 3 – Showcasing Spectral palette colors
ggplot(data = mtcars) + geom_boxplot(mapping = aes(x = cyl, y = mpg, fill=cyl)) + scale_fill_brewer(palette="Spectral")
For your reference, here is the RColorBrewer palette chart.
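You can also draw this chart yourself with the RColorBrewer package, which is installed alongside ggplot2 as a dependency of the scales package. In the sketch below, display.brewer.all() draws every palette and brewer.pal() returns the hex codes of the colors in a palette.

```r
library(RColorBrewer)

# Draw all RColorBrewer palettes (sequential, qualitative, diverging)
display.brewer.all()

# Hex codes of the first three colors of the Set1 palette
set1 <- brewer.pal(3, "Set1")
print(set1)
```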
Understanding the Coordinate System of ggplot2
The coordinate system of ggplot2 is a little complicated. But don’t worry, we will not dig too deep. For now, we will walk through some examples of coordinate systems. If you pay attention to these, most of the job is done, and you are on your way to creating awesome charts with ggplot2. With time, I am sure you will be able to take deeper plunges into the ggplot2 coordinate systems. To start with, I have shortlisted the five functions given below:
coord_cartesian() – This is the default coordinate system in ggplot2. According to this system the X and Y positions of each point act independently to determine its location on the graph.
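One practical use of coord_cartesian() is zooming. In the sketch below, setting xlim inside coord_cartesian() only zooms the view, so all 32 cars still feed the fitted line. Setting limits on the scale with xlim() instead turns the cars outside the range into missing values before the statistics are computed, so the line changes and ggplot2 warns about removed rows. The 15 to 25 mpg range is only illustrative.

```r
library(ggplot2)

base <- ggplot(data = mtcars, mapping = aes(x = mpg, y = wt)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Zoom: the view is cropped, but every car is still used for the fit
p_zoom <- base + coord_cartesian(xlim = c(15, 25))

# Scale limits: cars outside 15-25 mpg become missing values
p_limits <- base + xlim(15, 25)
print(p_zoom)
```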
coord_flip() – This is helpful when you want to build horizontal graphs. The function switches the X and Y axes. For example, you can use coord_flip() to draw horizontal boxplots.
ggplot(data = mtcars) + geom_boxplot(mapping = aes(x = cyl, y = mpg, fill=cyl)) + scale_fill_brewer(palette="Set1") + coord_flip()
coord_polar() – This uses polar coordinates, which reveal a neat connection between bar charts and coxcomb or pie charts.
# Loading the gridExtra library
library(gridExtra)
# Generating a bar plot
bar <- ggplot(data = mtcars) +
  geom_bar(mapping = aes(x = cyl, fill = cyl), width = 1)
# Saving the two plots
plot1 <- bar + coord_flip()
plot2 <- bar + coord_polar()
# Plotting the graphs side by side in two columns
grid.arrange(plot1, plot2, ncol = 2)
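The same idea gives you a pie chart. First, stack all the counts into a single bar. Then let coord_polar() turn the bar's height (y) into the angle by setting theta = "y". Here is a minimal sketch; the empty string used for x is just a placeholder so that all the counts share one bar.

```r
library(ggplot2)

# One stacked bar holding the counts of each cylinder group
pie <- ggplot(data = mtcars, mapping = aes(x = "", fill = factor(cyl))) +
  geom_bar(width = 1) +
  # Turning the bar's height (y) into the angle gives a pie chart
  coord_polar(theta = "y")
print(pie)
```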
In the above code, we have used the gridExtra package. I love this package: it makes plotting multiple charts on the same canvas very easy.
coord_map() – This function projects a portion of the earth, which is approximately spherical, onto a flat 2D plane using a map projection. We use geom_polygon() to draw the map and coord_map() to keep its aspect ratio correct. If you do not understand what this means, just run the code once without the coord_map() part.
# An example showcasing the map of Italy
# (map_data() needs the maps package and coord_map() needs the mapproj package)
italy <- map_data("italy")
ggplot(italy, aes(long, lat, group = group)) +
  geom_polygon(fill = "lightblue", colour = "black") +
  coord_map()
coord_fixed() – This coordinate system forces a fixed aspect ratio between the axes. The ratio argument is the number of units on the y-axis that take up the same length as one unit on the x-axis. Check out the examples below:
# Loading the libraries
library(ggplot2)
library(gridExtra)
# Building a scatter plot
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
# Setting the ratio to 1
ratio1 <- p + coord_fixed(ratio = 1)
# Setting the ratio to 3
ratio3 <- p + coord_fixed(ratio = 3)
# Plotting them side by side in a grid
grid.arrange(ratio1, ratio3, ncol = 2)
Support for statistical transformations in ggplot2
Among the many useful features of ggplot2, the one that may become dear to you is its support for statistical transformations. These functions save a lot of time because you don’t have to prepare summarized data yourself: the statistical calculations are done on the fly. There are many statistical functions, and we encourage you to explore them. Below, I have listed some of the most widely used ones.
stat_count() – Creates a bar plot showing the frequency count of each level of a categorical variable.
# Plotting the bar chart of cylinder counts
ggplot(data = mtcars) + stat_count(mapping = aes(x = cyl))
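stat_count() computes more than the raw count. The sketch below plots proportions instead of counts with after_stat(prop), which is available in ggplot2 3.3.0 and later. Setting group = 1 makes the proportions add up across all cylinder groups instead of within each one. The example uses geom_bar() because it relies on stat_count() by default.

```r
library(ggplot2)

# Proportion of cars in each cylinder group, computed on the fly
p <- ggplot(data = mtcars) +
  geom_bar(mapping = aes(x = factor(cyl), y = after_stat(prop), group = 1))
print(p)
```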
stat_density() – Creates a kernel density plot. A kernel density estimate is a smoothed version of a histogram, which makes it a very useful alternative to the histogram for continuous variables.
# Plotting the kernel density of petal length
ggplot(data = iris) + stat_density(mapping = aes(x = Petal.Length))
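A density curve is easiest to compare with a histogram when both share the same scale. The sketch below draws the histogram with after_stat(density) on the y-axis, so the total area of its bars is 1. It then overlays geom_density(), which uses stat_density() but draws only the outline. The binwidth = 0.25 value and the colors are arbitrary choices.

```r
library(ggplot2)

p <- ggplot(data = iris, mapping = aes(x = Petal.Length)) +
  # Histogram rescaled so that the bar areas add up to 1
  geom_histogram(mapping = aes(y = after_stat(density)),
                 binwidth = 0.25, fill = "grey80", colour = "grey50") +
  # Kernel density estimate drawn on the same scale
  geom_density(colour = "red")
print(p)
```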
stat_summary() – This function summarizes the y variable for each unique value of the x variable.
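# Plotting the minimum, mean, and maximum petal length of each species
# (fun.min, fun.max, and fun replace fun.ymin, fun.ymax, and fun.y, deprecated in ggplot2 3.3.0)
ggplot(data = iris) +
  stat_summary(mapping = aes(x = Species, y = Petal.Length),
               fun.min = min, fun.max = max, fun = mean)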
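To check what stat_summary() is doing, you can compute the same summary by hand. The sketch below uses fun.data = mean_se (a helper exported by ggplot2) to draw the mean plus or minus one standard error. It then reproduces the means with base R's tapply().

```r
library(ggplot2)

# Mean +/- one standard error of petal length for each species
p <- ggplot(data = iris) +
  stat_summary(mapping = aes(x = Species, y = Petal.Length), fun.data = mean_se)
print(p)

# The same means computed directly with base R
species_means <- tapply(iris$Petal.Length, iris$Species, mean)
print(species_means)
```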
stat_smooth() – Adds a smoothed line to a scatter plot. The example below uses geom_smooth(), which draws the same line as stat_smooth().
# Adding a smooth line to the scatter plot
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()
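When you don't specify a method, geom_smooth() picks one based on the data size (loess for fewer than 1,000 observations) and prints a message saying which one it used. Here is a sketch of asking for a straight-line fit instead with method = "lm" and formula = y ~ x. The confidence band stays on because se = TRUE is the default.

```r
library(ggplot2)

# Scatter plot with a linear regression line and its confidence band
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
print(p)
```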
Themes Themes Themes
You must have noticed that the default theme of ggplot2 is pretty much grey. If you are not a great fan of grey, don’t worry: ggplot2 comes with several other themes to choose from.
library(gridExtra)
# The same scatter plot with a smooth line, reused for every theme
base <- ggplot(mpg, aes(displ, hwy)) + geom_point() + geom_smooth()
p1 <- base + ggtitle("theme_bw") + theme_bw()
p2 <- base + ggtitle("theme_linedraw") + theme_linedraw()
p3 <- base + ggtitle("theme_gray") + theme_gray()
p4 <- base + ggtitle("theme_dark") + theme_dark()
p5 <- base + ggtitle("theme_minimal") + theme_minimal()
p6 <- base + ggtitle("theme_void") + theme_void()
grid.arrange(p1, p2, p3, p4, p5, p6, ncol = 3, nrow = 2)
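Instead of adding a theme to every plot, you can change the default for the whole session with theme_set(), and adjust individual elements with theme(). Here is a minimal sketch; base_size = 14 and the bottom legend are arbitrary choices. theme_set() returns the previous theme, which the sketch keeps so that it can restore it at the end.

```r
library(ggplot2)

# Make theme_bw() with larger text the default for all following plots
old_theme <- theme_set(theme_bw(base_size = 14))

# Tweak a single element on one plot: move the legend below the panel
p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  theme(legend.position = "bottom")
print(p)

# Restore the previous default theme
theme_set(old_theme)
```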