Finding Waldo, a flag on the moon and multiple choice tests, with R
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
I have to admit, first, that finding Waldo has been a difficult task. And I did not succeed. Neither could I correctly spot his shirt (because actually, it was what I was looking for). You know, that redandwhite striped shirt. I guess it should have been possible to look for Waldo’s face (assuming that his face does not change) but I still have problems with size factor (and resolution issues too). The problem is not that simple. At the http://mlsp2009.conwiz.dk/ conference, a price was offered for writing an algorithm in Matlab. And one can even find Mathematica codes online. But most of the those algorithms are based on the idea that we look for similarities with Waldo’s face, as described in problem 3 on http://www1.cs.columbia.edu/~blake/‘s webpage. You can find papers on that problem, e.g. Friencly & Kwan (2009) (based on statistical techniques, but Waldo is here a pretext to discuss other issues actually), or more recently (but more complex) Garg et al. (2011) on matching people in images of crowds.
What about codes in R ? On http://stackoverflow.com/, some ideas can be found (and thank Robert Hijmans for his help on his package). So let us try here to do something, on our own. Consider the following picture,
With the following code (based on the following file) it is possible to import the picture, and to extract the colors (based on an RGB decomposition),> library(raster) > waldo=brick(system.file("DepartmentStoreW.grd", + package="raster")) > waldo class : RasterBrick dimensions : 768, 1024, 786432, 3 (nrow,ncol,ncell,nlayer) resolution : 1, 1 (x, y) extent : 0, 1024, 0, 768 (xmin, xmax, ymin, ymax) coord. ref. : NA values : C:\R\winlibrary\raster\DepartmentStoreW.grd min values : 0 0 0 max values : 255 255 255My strategy is simple: try to spot areas with white and red stripes (horizontal stripes). Note that here, I ran the code on a Windows machine, the package is not working well on Mac. In order to get a better understanding of what could be done, let us start with something much more simple. Like the picture below, with Waldo (and Waldo only). Here, it is possible to extract the three colors (red, green and blue),
> plot(waldo,useRaster=FALSE)

# white component white = min(waldo[[1]] , waldo[[2]] , waldo[[3]])>220 focalswhite = focal(white, w=3, fun=mean) plot(focalswhite,useRaster=FALSE) # red component red = (waldo[[1]]>150)&(max( waldo[[2]] , waldo[[3]])<90) focalsred = focal(red, w=3, fun=mean) plot(focalsred,useRaster=FALSE)i.e. here we have the graphs below, with the white regions, and the red ones,
# striped component striped = red; n=length(values(striped)); h=5 values(striped)=0 values(striped)[(h+1):(nh)]=(values(red)[1:(n2*h)]== TRUE)&(values(red)[(2*h+1):n]==TRUE) focalsstriped = focal(striped, w=3, fun=mean) plot(focalsstriped,useRaster=FALSE)So here, we can easily spot Waldo, i.e. the guy with the redwhite stripes (with two different sets of thresholds for the RGB decomposition)
exam = stack("C:\\Users\\examblank.png") red = (exam[[1]]>150)&(max( exam[[2]] , exam[[3]])<150) focalsred = focal(red, w=3, fun=mean) plot(focalsred,useRaster=FALSE) exam = stack("C:\\Users\\examfilled.png") red = (exam[[1]]>150)&(max( exam[[2]] , exam[[3]])<150) focalsred = focal(red, w=3, fun=mean) plot(focalsred,useRaster=FALSE)
First, we have to identify areas where students have to fill the blanks. So in the template, identify black boxes, and get the coordinates (here manually)
exam = stack("C:\\Users\\examblank.png") black = max( exam[[1]] ,exam[[2]] , exam[[3]])<50 focalsblack = focal(black, w=3, fun=mean) plot(focalsblack,useRaster=FALSE) correct=locator(20) coordinates=locator(20) X1=c(73,115,156,199,239) X2=c(386,428.9,471,510,554) Y=c(601,536,470,405,341,276,210,145,79,15) LISTX=c(rep(X1,each=10),rep(X2,each=10)) LISTY=rep(Y,10) points(LISTX,LISTY,pch=16,col="blue")
CORRECTX=c(X1[c(2,4,1,3,1,1,4,5,2,2)], X2[c(2,3,4,2,1,1,1,2,5,5)]) CORRECTY=c(Y,Y) points(CORRECTX, CORRECTY,pch=16,col="red",cex=1.3) UNCORRECTX=c(X1[rep(1:5,10)[(c(2,4,1,3,1,1,4,5,2,2) +seq(0,length=10,by=5))]], X2[rep(1:5,10)[(c(2,3,4,2,1,1,1,2,5,5) +seq(0,length=10,by=5))]]) UNCORRECTY=c(rep(Y,each=4),rep(Y,each=4))Now, let us get back on red areas in the form filled by the student, identified earlier,
exam = stack("C:\\Users\\examfilled.png") red = (exam[[1]]>150)&(max( exam[[2]] , exam[[3]])<150) focalsred = focal(red, w=5, fun=mean)
ind=which(values(focalsred)>.3) yind=750trunc(ind/610) xind=indtrunc(ind/610)*610 points(xind,yind,pch=19,cex=.4,col="blue") points(CORRECTX, CORRECTY,pch=1, col="red",cex=1.5,lwd=1.5)Crosses on the graph on the right below are the answers identified as correct (here 13),
> icorrect=values(red)[(750CORRECTY)* + 610+(CORRECTX)] > points(CORRECTX[icorrect], CORRECTY[icorrect], + pch=4,col="black",cex=1.5,lwd=1.5) > sum(icorrect) [1] 13
> iuncorrect=values(red)[(750UNCORRECTY)*610+ + (UNCORRECTX)] > sum(iuncorrect) [1] 4So I have not been able to find Waldo, but I least, that will probably save me hours next time I have to mark exams...
Rbloggers.com offers daily email updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/datascience job.
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.