Adding Labels to Points in a Scatter Plot in R

(This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers)

What’s the Scatter?

A scatter plot displays the values of 2 variables for a set of data, and it is a very useful way to visualize data during exploratory data analysis, especially (though not exclusively) when you are interested in the relationship between a predictor variable and a target variable.  Sometimes, such data come with categorical labels that have important meanings, and the visualization of the relationship can be enhanced when these labels are attached to the data.

It is common practice to use a legend to label data that belong to a group, as I illustrated in a previous post on bar charts and pie charts.  However, what if every datum has a unique label, and there are many data in the scatter plot?  A legend would add unnecessary clutter in such situations.  Instead, it would be useful to write the label of each datum near its point in the scatter plot. I will show how to do this in R, illustrating the code with a built-in data set called LifeCycleSavings.

The LifeCycleSavings Data Set

A data set containing such labels is LifeCycleSavings, a built-in data set in R.  Each row contains economic or demographic data for a particular country.  In this case, the country is a unique categorical label for each datum.  I will plot aggregate personal savings (sr) as a function of real per-capita disposable income (dpi), and I will label each datum with its associated country.  Note that I am not saying anything about a predictive relationship in this context; I am simply trying to explore the data in these 2 dimensions, and I may eventually find clustering to be useful for further analysis, as I alluded to earlier in the introduction.

Here are the first 9 data, just to give you a sense of what this data set looks like.

> LifeCycleSavings[1:9,]
                   sr            pop15           pop75        dpi              ddpi
Australia          11.43         29.35           2.87         2329.68          2.87
Austria            12.07         23.32           4.41         1507.99          3.93
Belgium            13.17         23.80           4.43         2108.47          3.82
Bolivia            5.75          41.89           1.67         189.13           0.22
Brazil             12.88         42.19           0.83         728.47           4.56
Canada             8.79          31.72           2.85         2982.88          2.43
Chile              0.60          39.74           1.34         662.86           2.67
China              11.90         44.75           0.67         289.52           6.51
Colombia           4.98          46.64           1.06         276.65           3.08

(It actually isn’t nicely aligned in the output; I manually aligned it for you to make it easier to see each column. :) )

The plot() and text() Functions

First, let’s use the plot() function to plot the points.

##### Labelling Points in a Scatter Plot
##### By Eric Cai - The Chemical Statistician

plot(sr~dpi, xlim = c(0, 3500), xlab = 'Real Per-Capita Disposable Income', ylab = 'Aggregate Personal Savings', main = 'Intercountry Life-Cycle Savings Data', data = LifeCycleSavings[1:9,])

Then, let’s use the text() function to add the text labels to the data.  I nest it within the with() function so that sr and dpi are looked up in the columns of the data frame rather than in the workspace.

with(LifeCycleSavings[1:9,], text(sr~dpi, labels = row.names(LifeCycleSavings[1:9,]), pos = 4))

The value for the “labels” option looks complicated, but it’s just a vector of strings that I extracted from the row names of the first 9 rows of the LifeCycleSavings data frame using row.names(), which is a very useful function!

The “pos” option specifies the position of the text relative to the point.  I have chosen to use “4” because I want the text to be to the right of the point.
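
To see what the “labels” option actually receives, here is a quick check of row.names() on the same nine rows.  This is only an illustrative sketch; the name country_labels is my own and is not part of the script.

```r
# Extract the row names of the first 9 rows as a character vector.
# 'country_labels' is a hypothetical name used only for this illustration.
country_labels <- row.names(LifeCycleSavings[1:9, ])

is.character(country_labels)   # TRUE
length(country_labels)         # 9
country_labels[1:3]            # "Australia" "Austria" "Belgium"
```

Because the labels are taken from the same rows and in the same order as the plotted points, each country name ends up next to its own point.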

1 = below

2 = left

3 = above

4 = right
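
A small way to check these four values for yourself is to label one point four times, once per value of “pos”.  The coordinates, title and label strings below are made up purely for illustration and have nothing to do with the savings data.

```r
# Draw a single point at the origin; the axis limits leave room for the labels.
plot(0, 0, xlim = c(-1, 1), ylim = c(-1, 1), pch = 19,
     xlab = '', ylab = '', main = 'Values of pos in text()')

# Hypothetical label strings, one for each value of pos.
pos_names <- c('1 = below', '2 = left', '3 = above', '4 = right')

# Label the same point once for each position.
for (p in 1:4) {
    text(0, 0, labels = pos_names[p], pos = p)
}
```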

Exporting the Image as a PNG File

Finally, let’s sandwich the two lines of plotting functions with png() and dev.off() to export the image as a PNG file into my chosen directory.  Here is the entire script.

png('Insert Your Directory Path Here/savings.png')
plot(sr~dpi, xlim = c(0, 3500), xlab = 'Real Per-Capita Disposable Income', ylab = 'Aggregate Personal Savings', main = 'Intercountry Life-Cycle Savings Data', data = LifeCycleSavings[1:9,])

with(LifeCycleSavings[1:9,], text(sr~dpi, labels = row.names(LifeCycleSavings[1:9,]), pos = 4))
dev.off()
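
The script above repeats LifeCycleSavings[1:9,] three times.  One possible tidy-up, sketched below, is to store the subset in a variable first.  The names savings9 and out_file are my own, and tempfile() only stands in for a real path so that the sketch runs anywhere; I also pass dpi and sr to text() as plain x and y coordinates instead of a formula.

```r
# 'savings9' is a hypothetical name for the 9-row subset.
savings9 <- LifeCycleSavings[1:9, ]

# tempfile() is a stand-in for a real path; replace it with your own file name.
out_file <- tempfile(fileext = '.png')

png(out_file)
plot(sr ~ dpi, data = savings9, xlim = c(0, 3500),
     xlab = 'Real Per-Capita Disposable Income',
     ylab = 'Aggregate Personal Savings',
     main = 'Intercountry Life-Cycle Savings Data')

# Inside with(), dpi and sr refer to the columns of savings9.
with(savings9, text(dpi, sr, labels = row.names(savings9), pos = 4))
dev.off()

# Check that the PNG file was written.
file.exists(out_file)
```

If the subset ever changes (say, to the first 15 rows), only the first line needs to be edited.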

Here is the plot.

[Image: savings.png, the scatter plot with each point labelled by its country]

Why Not attach()?

I could have used the attach() function to add this data set to the search path in R, so that any variable in this data set can be called by simply entering its name.  (Of course, it’s good to undo this with the detach() function once you are finished with the data set.)  This would have made the plotting code simpler.  However, as Nick Horton on R Bloggers points out, this is not a recommended practice.

The alternative script is this:

 
attach(LifeCycleSavings[1:9,])
png('Insert Your Directory Path Here/savings.png')
plot(dpi, sr, xlim = c(0, 3500), xlab = 'Real Per-Capita Disposable Income', ylab = 'Aggregate Personal Savings', main = 'Intercountry Life-Cycle Savings Data')
text(dpi, sr, labels = row.names(LifeCycleSavings[1:9,]), pos = 4)
dev.off()
detach(LifeCycleSavings[1:9,])
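
If the appeal of attach() is being able to type the bare column names, a middle ground is to pass a braced block to with().  The sketch below is one way to do that, not the only one; savings9 is a hypothetical name of my own, and the png() and dev.off() calls are left out so that the plot simply goes to the current graphics device.

```r
# 'savings9' is a hypothetical name for the 9-row subset.
savings9 <- LifeCycleSavings[1:9, ]

# Both plotting calls see dpi and sr as columns of savings9,
# but nothing is added to the search path.
with(savings9, {
    plot(dpi, sr, xlim = c(0, 3500),
         xlab = 'Real Per-Capita Disposable Income',
         ylab = 'Aggregate Personal Savings',
         main = 'Intercountry Life-Cycle Savings Data')
    text(dpi, sr, labels = row.names(savings9), pos = 4)
})

# The search path is unchanged, so there is nothing to detach afterwards.
'savings9' %in% search()
```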
