Digitizing plots

March 12, 2014
By

(This article was first published on Dan Kelley Blog/R, and kindly contributed to R-bloggers)

Introduction

It is not uncommon to want to digitize values from a graph in a paper, whether to do some initial analysis without bothering an author, or to get data from a paper published so long ago that the data are available only graphically. Although there are several software applications that do this well, it is also possible to use the locator() function of R. This posting shows how to do that.

Method

Code for digitizing a plot on the screen is given below, without comment. It can be saved to a file, for later use. (I don’t bother commenting because the work of doing that is about equal to the work of making a package, which I may do, if anyone expresses interest.)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
xaxis <- function(values) {
    n <- length(values)
    message("click on the x axis at places where x=", paste(values, collapse = ","), 
        "\n")
    xy <- locator(n)
    m <- lm(values ~ xy$x)
    C <- as.numeric(coef(m))
    xa <<- C[1]
    xb <<- C[2]
}
yaxis <- function(values) {
    n <- length(values)
    message("click on the y axis at places where x=", paste(values, collapse = ","), 
        "\n")
    xy <- locator(n)
    m <- lm(values ~ xy$y)
    C <- as.numeric(coef(m))
    ya <<- C[1]
    yb <<- C[2]
}
topright <- function() {
    message("click the top-right corner of plot box\n")
    xy <- locator(1)
    xout <<- xy$x
    yout <<- xy$y
}
data <- function(n = 100) {
    message("escape by clicking to right of or above top-right corner of box\n")
    x <- y <- NULL
    i <- 1
    while (TRUE) {
        xy <- locator(1)
        xx <- xa + xb * xy$x
        yy <- ya + yb * xy$y
        cat("i=", i, "xy:", xy$x, xy$y, "->", xx, yy, "\n")
        if (xy$x > xout || xy$y > yout) {
            return(list(x = x, y = y))
        }
        x <- c(x, xx)
        y <- c(y, yy)
        i <- i + 1
        if (i > n) 
            return(list(x = x, y = y))
    }
}
digitize <- function(image, xaxis, yaxis) {
    library(png)
    png <- readPNG(image)
    par(mar = rep(0, 4))
    plot(0:1, 0:1, type = "n")
    rasterImage(png[, , 1], 0, 0, 1, 1)
    xaxis(xaxis)
    yaxis(yaxis)
    topright()
    data()
}

Application

As a test of this, let’s create some fake data

1
2
3
4
5
set.seed(123)
x <- 1:10
y <- 1 + x + rnorm(10)
par(mar = c(3, 3, 1, 1), mgp = c(2, 0.7, 0))
plot(x, y, type = "o")

center

and digitize the resultant image (saved in a PNG file).

1
xy <- digitize("sample.png", c(2, 10), c(2, 10))

Results

When I did as above, clicking points without a great deal of care, I got an RMS error of a bit under 2 percent. It seems likely that more careful work could get this closer to 1 percent.

Conclusions

This method is perhaps slightly easier than hand-rolling new code for each instance of this task. It lacks some basic features, however. One nice addition would be the ability to remove data points. For that, perhaps the lower-left corner of the graph box could be determined with a function named bottomleft(), and the rule could be that clicking below that point or to its left would remove the most recent point. That’s an exercise for the reader. Possibly the next step would be to take the hour it would take to create a little package … although it seems likely that one already exists!

Resources

To leave a comment for the author, please follow the link and comment on his blog: Dan Kelley Blog/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.