(This article was first published on **Bart Rogiers - Sreigor**, and kindly contributed to R-bloggers)

Working on a paper, I ran into the problem of needing data from a graph that was not mine, and for which no underlying table was published. With today’s software packages, however, it is not very difficult to digitize a figure yourself. I remembered reading something about it on R-bloggers or in the R Journal, and it turns out both had useful information. The R package to go for is ‘digitize’: the R Journal publication describes it, and a blog post explains how to use it. You can install it the usual way:

```r
install.packages('digitize')
library(digitize)
```

As an example I'll use a figure from Gelhar et al. (1992), since I was actually looking at dispersivity data. The figure can be found at http://goo.gl/niJhi (other versions: http://goo.gl/rlXSP and http://goo.gl/WPvYQ; it is quite a famous paper, you know), and a PDF of the paper seems to be available online as well.

The figure shows longitudinal dispersivity as a function of scale, as obtained by a large number of authors. Now suppose we do our own experiments to determine dispersivity at a certain scale in a certain sediment. It would be very useful to compare the results with this compilation of literature values. This particular paper also lists the data in a table, but that is not always the case. Especially for older papers, it can be difficult to retrieve the actual data, and this is where the digitize package comes in. When the graph shows several point sets that you want to digitize separately, and has one or two log-scale axes, the simple wrapper function at the bottom of this page makes the task a lot easier! Its arguments are the following:

- name: Name of or path to the figure (it has to be a *.jpg file; convert it with GIMP if necessary)
- x1, x2, y1, y2: Minimum and maximum values of the x and y axes, i.e. the values at the four axis points you mark first
- sets: Number of point sets you want to digitize separately (default 1)
- setlabels: Labels of the different point sets (numbers by default)
- log: Argument similar to the standard R plot argument for logarithmic axes (can take 'x', 'y' or 'xy'; the default '' means both axes are linear)
- xlab, ylab: Optional axis labels for the plot that is generated by the function
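Why does the log argument matter? Mapping a clicked position to a data value from two known axis points is a straight-line interpolation, which is only valid on a linear axis. On a log axis the interpolation has to happen in log10 space, which is what the wrapper does by taking log10 of the axis limits and back-transforming with 10^ afterwards. The sketch below illustrates this for a single axis; calibrate_axis is a hypothetical helper written for this illustration, not a function of the digitize package.

```r
# Hypothetical helper (not part of 'digitize'): convert click positions on one
# axis to data values, given two reference positions and their axis values.
calibrate_axis <- function(pos, pos1, pos2, val1, val2, log = FALSE) {
  # On a log axis, interpolate between log10 values instead of raw values
  if (log) { val1 <- log10(val1); val2 <- log10(val2) }
  # Straight line through (pos1, val1) and (pos2, val2)
  val <- val1 + (pos - pos1) * (val2 - val1) / (pos2 - pos1)
  # Back-transform to the original scale
  if (log) 10^val else val
}

# An axis running from 1 (at position 0) to 1000 (at position 300):
calibrate_axis(c(0, 100, 150, 300), 0, 300, 1, 1000, log = TRUE)  # 1, 10, ~31.62, 1000
calibrate_axis(150, 0, 300, 1, 1000)                              # 500.5 if treated as linear
```

The halfway point of a log axis spanning three decades is about 31.6, whereas a linear interpretation would give 500.5, so forgetting the log argument distorts the digitized values badly.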

The command I used:

First you mark the four points on the axes (the minimum and maximum of each axis), and then you click on all points of the first point set, click finish, continue with the next set, and so on. The function returns a data frame with the x and y coordinates and the label of the point set each point belongs to. Easy to program, but very convenient!

```r
# Wrapper around ReadAndCal(), DigitData() and Calibrate() from 'digitize'
digitize.graph <- function(name, x1, x2, y1, y2, sets = 1, setlabels = 1:sets,
                           log = '', xlab = 'x axis', ylab = 'y axis') {
  logx <- grepl('x', log)
  logy <- grepl('y', log)
  cat('Mark axes min and max values\n')
  axes.points <- ReadAndCal(name)
  # Calibration is linear, so log axes are calibrated in log10 space
  if (logx) { x1 <- log10(x1); x2 <- log10(x2) }
  if (logy) { y1 <- log10(y1); y2 <- log10(y2) }
  dataset <- NULL
  for (i in seq_len(sets)) {
    cat(paste0('Mark point set "', setlabels[i], '"\n'))
    data.points <- DigitData(col = 'red')
    dat <- Calibrate(data.points, axes.points, x1, x2, y1, y2)
    dat$lab <- rep(setlabels[i], nrow(dat))
    dataset <- rbind(dataset, dat)  # append, keeping the sets in order
  }
  # Back-transform log axes to the original scale
  if (logx) dataset$x <- 10^dataset$x
  if (logy) dataset$y <- 10^dataset$y
  print(dataset)
  # Use setlabels as factor levels so plot symbols and colours match the legend
  grp <- as.numeric(factor(dataset$lab, levels = setlabels))
  plot(dataset$x, dataset$y, log = log, pch = grp, col = grp,
       xlab = xlab, ylab = ylab)
  legend('bottomright', legend = setlabels, pch = 1:sets, col = 1:sets, bty = 'n')
  return(dataset)
}
```
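Because the wrapper needs mouse clicks, its logic is hard to check on its own. A minimal non-interactive sketch follows, assuming pre-recorded click coordinates instead of a graphics device. calibrate_sets is a hypothetical stand-in, not part of digitize, and it assumes the four axis clicks come in the order x1, x2, y1, y2 (mirroring the argument order passed to Calibrate() above).

```r
# Hypothetical, non-interactive stand-in for digitize.graph().
# axes:   list(x = ..., y = ...) with the four axis clicks, assumed ordered
#         x1, x2 (on the x axis) then y1, y2 (on the y axis)
# clicks: named list with one data.frame(x, y) of clicked points per set
calibrate_sets <- function(axes, clicks, x1, x2, y1, y2, log = '') {
  logx <- grepl('x', log)
  logy <- grepl('y', log)
  if (logx) { x1 <- log10(x1); x2 <- log10(x2) }
  if (logy) { y1 <- log10(y1); y2 <- log10(y2) }
  # Straight lines through the two reference clicks of each axis
  fx <- function(p) x1 + (p - axes$x[1]) * (x2 - x1) / (axes$x[2] - axes$x[1])
  fy <- function(p) y1 + (p - axes$y[3]) * (y2 - y1) / (axes$y[4] - axes$y[3])
  # Calibrate each set and stack them with their label
  out <- do.call(rbind, lapply(names(clicks), function(lab) {
    pts <- clicks[[lab]]
    data.frame(x = fx(pts$x), y = fy(pts$y), lab = lab, stringsAsFactors = FALSE)
  }))
  # Back-transform log axes
  if (logx) out$x <- 10^out$x
  if (logy) out$y <- 10^out$y
  out
}

# x axis: 1 at position 0, 1000 at position 300; y axis: 0.01 at 0, 100 at 200
axes <- list(x = c(0, 300, 0, 0), y = c(0, 0, 0, 200))
clicks <- list(A = data.frame(x = c(100, 200), y = c(50, 100)),
               B = data.frame(x = 150, y = 150))
res <- calibrate_sets(axes, clicks, x1 = 1, x2 = 1000, y1 = 0.01, y2 = 100, log = 'xy')
res
```

With both axes logarithmic, set A comes out at (10, 0.1) and (100, 1), and set B at about (31.6, 10).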
