Using Line Segments to Compare Values in R

January 31, 2013
(This article was first published on Mollie's Research Blog, and kindly contributed to R-bloggers)

Sometimes you want a graph that lets the viewer see at a glance:

• The original value of a variable
• The new value of the variable
• The change between old and new

One way I like to do this is with geom_segment and geom_point from the ggplot2 package.

First, let’s load ggplot2 and our data:

```r
library(ggplot2)

data <- as.data.frame(USPersonalExpenditure) # data from package datasets
data$Category <- as.character(rownames(USPersonalExpenditure)) # this makes things simpler later
```
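As a quick check before plotting, it can help to look at what the data frame actually contains. The sketch below uses only base R and the built-in USPersonalExpenditure matrix; the names `expenditure` and `year_cols` are chosen here for illustration and do not come from the original post.

```r
# USPersonalExpenditure ships with R's datasets package as a matrix:
# rows are spending categories, columns are years.
expenditure <- as.data.frame(USPersonalExpenditure)
expenditure$Category <- rownames(USPersonalExpenditure)

# The year columns keep their numeric-looking names, which is why they
# have to be quoted whenever they are referenced.
year_cols <- setdiff(colnames(expenditure), "Category")
print(year_cols)

# One row per category, with the 1940 and 1960 values side by side
print(expenditure[, c("Category", "1940", "1960")])
```

Each row of this table becomes one horizontal segment in the plot, running from the 1940 value to the 1960 value.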

Next, we’ll set up our plot and axes:

```r
ggplot(data, aes(y = Category)) +
  labs(x = "Expenditure", y = "Category") +
```

geom_segment needs four aesthetics: x and y give each segment's start point, and xend and yend give its end point. (Sometimes two of the four will be the same, as in this case, where y and yend are both Category.)

In this case, we want to show the change between 1940 and 1960 for each category. Therefore our variables are the following:

• x: “1940”
• y: Category
• xend: “1960”
• yend: Category

```r
geom_segment(aes(x = data$"1940",
                 y = Category,
                 xend = data$"1960",
                 yend = Category),
             size = 1) +
```

Next, we want to plot points for the 1940 and 1960 values. We could do the same for the 1945, 1950, and 1955 values, if we wanted to.

```r
geom_point(aes(x = data$"1940", color = "1940"), size = 4, shape = 15) +
geom_point(aes(x = data$"1960", color = "1960"), size = 4, shape = 15) +
```
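The remark about also plotting the 1945, 1950, and 1955 values can be sketched as follows. This is one possible approach rather than the method from the original post: it reshapes the data to long format with base R so that a single geom_point layer covers every year. The names `wide`, `long`, and `ends` are hypothetical.

```r
library(ggplot2)

wide  <- as.data.frame(USPersonalExpenditure)
years <- colnames(wide)

# Long format: one row per category-year pair.
# unlist() walks the columns in order, so Year repeats each year once per
# category while Category cycles through the row names.
long <- data.frame(
  Category    = rep(rownames(wide), times = length(years)),
  Year        = rep(years, each = nrow(wide)),
  Expenditure = unlist(wide[years], use.names = FALSE)
)

# Segment endpoints: first and last year for each category
ends <- data.frame(
  Category = rownames(wide),
  start    = wide[["1940"]],
  end      = wide[["1960"]]
)

p <- ggplot(long, aes(x = Expenditure, y = Category)) +
  # Segments are added first so the points are drawn on top of them
  geom_segment(data = ends,
               aes(x = start, xend = end, yend = Category),
               size = 1) +
  geom_point(aes(color = Year), size = 4, shape = 15) +
  labs(x = "Expenditure", y = "Category")

print(p)
```

Mapping color to the Year column also produces the legend entries automatically, so the hand-written color labels are not needed in this variant.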

Finally, we’ll finish up by touching up the legend for the plot:

```r
scale_color_discrete(name = "Year") +
theme(legend.position = "bottom")
```
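Because the fragments above all belong to one chained expression, none of them runs on its own. Assembled, they might look like the sketch below. It refers to the year columns with backticks inside aes() instead of going through data$, which is a substitution made here and not the original post's wording; the plot is also stored in a variable `p`, a name chosen for illustration.

```r
library(ggplot2)

data <- as.data.frame(USPersonalExpenditure)
data$Category <- rownames(USPersonalExpenditure)

p <- ggplot(data, aes(y = Category)) +
  labs(x = "Expenditure", y = "Category") +
  # One segment per category, from the 1940 value to the 1960 value;
  # y is inherited from the ggplot() call above
  geom_segment(aes(x = `1940`, xend = `1960`, yend = Category), size = 1) +
  # Square markers for the start and end values; the constant strings
  # mapped to color become the legend labels
  geom_point(aes(x = `1940`, color = "1940"), size = 4, shape = 15) +
  geom_point(aes(x = `1960`, color = "1960"), size = 4, shape = 15) +
  scale_color_discrete(name = "Year") +
  theme(legend.position = "bottom")

print(p)
```

Recent ggplot2 releases prefer `linewidth` over `size` for line thickness and may warn about `size = 1` here; the original argument is kept so the sketch matches the post.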

Figure: geom_segment, then geom_point

The order of geom_segment and the geom_point calls matters. ggplot2 draws layers in the order they appear in the code, so the first geom is drawn first and each later geom is drawn on top of it. Therefore, if you want the points displayed over the segments, put the segments first in the code. Likewise, if you want the segments displayed over the points, put the points first in the code.

For example, we could change the middle section of the code to:

```r
geom_point(aes(x = data$"1940", color = "1940"), size = 4, shape = 15) +
geom_point(aes(x = data$"1960", color = "1960"), size = 4, shape = 15) +
geom_segment(aes(x = data$"1940",
                 y = Category,
                 xend = data$"1960",
                 yend = Category),
             size = 1) +
```

And the output would look like:

Figure: geom_point, then geom_segment
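One way to confirm the drawing order without comparing rendered images is to inspect the layers stored in the plot object. The sketch below assumes the plot exposes its layers as `p$layers`, as current ggplot2 releases do; `layer_geoms`, `segments_first`, and `points_first` are hypothetical names introduced for illustration.

```r
library(ggplot2)

data <- as.data.frame(USPersonalExpenditure)
data$Category <- rownames(USPersonalExpenditure)

# Build each layer once so that both orderings reuse identical pieces
base        <- ggplot(data, aes(y = Category))
segments    <- geom_segment(aes(x = `1940`, xend = `1960`, yend = Category), size = 1)
points_1940 <- geom_point(aes(x = `1940`, color = "1940"), size = 4, shape = 15)
points_1960 <- geom_point(aes(x = `1960`, color = "1960"), size = 4, shape = 15)

segments_first <- base + segments + points_1940 + points_1960
points_first   <- base + points_1940 + points_1960 + segments

# Return the geom class of each layer, in the order the layers were added
layer_geoms <- function(p) {
  vapply(p$layers, function(layer) class(layer$geom)[1], character(1))
}

print(layer_geoms(segments_first))
print(layer_geoms(points_first))
```

The layer listed last is the one drawn last, so it sits on top of everything before it.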

Similarly, if some of your points will overlap, think about which geom_point layer you want R to draw first, because later layers are drawn on top of earlier ones.

The code is available in a gist.
