# When can we expect the last damn microarray paper?

January 18, 2012

### With bonus R code

It came as a shock to learn from PubMed that almost 900 papers were published with the word "microarray" in their titles last year alone, just 12 shy of the 2010 count. More alarming, many of these papers were not of the innocuous "Microarray study of gene expression in dog scrotal tissue" variety, but dry rehashings along the lines of "Statistical approaches to normalizing microarrays to the reference brightness of Ursa Minor".

It's an ugly truth we must face: people aren't just using microarrays, they're still writing about them.

See for yourself:

    library(XML)  # provides xmlTreeParse() and xmlValue()

    # concat() is not defined in this excerpt; assumed here to be plain string
    # concatenation (see the full gist for the author's own definition)
    concat<-function(...){paste(...,sep="")}

    # Returns a function that, given a year, fetches the PubMed hit count for `term`
    getCount<-function(term){function(year){
      nihUrl<-concat("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=",term,"+",year,"[pdat]")
      #cleanurl<-gsub('\\]','%5D',gsub('\\[','%5B',x=url))
      #http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=microarray%5btitle%5d+2003%5bpdat%5d
      xml<-xmlTreeParse(URLencode(nihUrl),isURL=TRUE)
      #Data Mashups in R, pg17
      as.numeric(xmlValue(xml$doc$children$eSearchResult$children$Count$children$text))}}

    years<-1995:2011
    df<-data.frame(type="obs",year=years,
        mic=sapply(years,function(x){do.call(getCount('microarray[title]'),list(x))}),
        ngs=sapply(years,function(x){do.call(getCount('"next generation sequencing"[title] OR "high-throughput sequencing"[title]'),list(x))}))

    #papers with "microarray" in title
    > df[,c("year","mic")]
       year  mic
    1  1995    2
    2  1996    4
    3  1997    0
    4  1998    7
    5  1999   28
    6  2000  108
    7  2001  273
    8  2002  553
    9  2003  770
    10 2004 1032
    11 2005 1135
    12 2006 1216
    13 2007 1107
    14 2008 1055
    15 2009  981
    16 2010  909
    17 2011  897

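
If you would rather not hammer PubMed just to check my arithmetic, here is a minimal offline sketch. It hard-codes the "microarray" counts from the table above; the names `mic_counts`, `peak_year` and `drop_2011` are mine for illustration and do not appear in the original code.

```r
# Counts copied from the table above (papers with "microarray" in the title).
mic_counts <- data.frame(
  year = 1995:2011,
  mic  = c(2, 4, 0, 7, 28, 108, 273, 553, 770, 1032,
           1135, 1216, 1107, 1055, 981, 909, 897)
)

# Year in which the count peaked.
peak_year <- mic_counts$year[which.max(mic_counts$mic)]

# Year-over-year change; the last element is 2011 minus 2010.
yoy <- diff(mic_counts$mic)
drop_2011 <- -yoy[length(yoy)]

cat("Peak year:", peak_year, "\n")                         # Peak year: 2006
cat("2011 fell short of 2010 by", drop_2011, "papers\n")   # 12 papers
```

So the peak was 2006, and the decline since then has been a good deal gentler than the climb.
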
Reading another treatise on microarray normalization in 2012 would be just tragic. Who still reads these? Who still writes these papers? Can we stop them? If not, when can we expect NGS to wipe them off the map?

    library(ggplot2)
    library(reshape)  # melt() with variable_name= is the reshape (not reshape2) spelling

    #97 is a fair start
    df<-subset(df,year>=1997)
    mdf<-melt(df,id.vars=c("type","year"),variable_name="citation")
    c<-ggplot(mdf,aes(x=year))
    p<-c+geom_point(aes(y=value,color=citation)) +
      ylab("papers") +
      stat_smooth(aes(y=value,color=citation),data=subset(mdf,citation=="mic"),method="loess") +
      scale_x_continuous(breaks=seq(from=1997,to=2011,by=2))
    print(p)

Here I plot both microarray and next-generation sequencing papers (in title). We see kurtosis is working in our favor, and LOESS seems to agree!
But when will the pain end? Let us extrapolate, wildly.

    #Return 0 for negative elements
    # noNeg(c(3,2,1,0,-1,-2,2))
    # [1] 3 2 1 0 0 0 2
    noNeg<-function(v){sapply(v,function(x){max(x,0)})}

    #Return up to the first negative/zero element inclusive
    # toZeroNoNeg(c(3,2,1,0,-1,-2,2))
    # [1] 3 2 1 0
    toZeroNoNeg<-function(v){noNeg(v)[1:firstZero(noNeg(v))]}

    #return index of first zero
    firstZero<-function(v){which(noNeg(v)==0)[1]}

    #let's peer into the future
    df.lo.mic<-loess(mic ~ year,df,control=loess.control(surface="direct"))

    #when will it stop?
    mic_predict<-as.integer(predict(df.lo.mic,data.frame(year=2012:2020),se=FALSE))
    zero_year<-2011+firstZero(mic_predict)
    cat(concat("LOESS projects ",sum(toZeroNoNeg(mic_predict))," more microarray papers."))
    cat(concat("The last damn microarray paper is projected to be in ",(zero_year-1),"."))

    #predict ngs growth
    df.lo.ngs<-loess(ngs ~ year,df,control=loess.control(surface="direct"))
    ngs_predict<-as.integer(predict(df.lo.ngs,data.frame(year=2012:zero_year),se=FALSE))
    pred_df<-data.frame(type="pred",year=c(2012:zero_year),mic=toZeroNoNeg(mic_predict),ngs=ngs_predict)
    df2<-rbind(df,pred_df)
    mdf2<-melt(df2,id.vars=c("type","year"),variable_name="citation")
    c2<-ggplot(mdf2,aes(x=year))
    p2<-c2+geom_point(aes(y=value,color=citation,shape=type),size=3) +
        ylab("papers") +
        scale_y_continuous(breaks=seq(from=0,to=1600,by=200))+
        scale_x_continuous(breaks=seq(from=1997,to=zero_year,by=2))
    print(p2)

    LOESS projects 2038 more microarray papers.
    The last damn microarray paper is projected to be published in 2016.

Yeah, right...
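
For the curious, here is a small sketch of why the extrapolation needs `surface="direct"`, again using the "microarray" counts hard-coded from the table in this post. With the default interpolating surface, `predict()` returns `NA` for years outside the observed range; the direct surface evaluates the local fit wherever you ask, sensible or not. The helper `first_nonpositive` is hypothetical (my name, not part of the original code) and returns `NA` rather than failing when the projection never reaches zero, a case in which `which(...)[1]` quietly yields `NA`.

```r
# Observed counts from 1997 onward, copied from the table earlier in the post.
obs <- data.frame(
  year = 1997:2011,
  mic  = c(0, 7, 28, 108, 273, 553, 770, 1032,
           1135, 1216, 1107, 1055, 981, 909, 897)
)
future <- data.frame(year = 2012:2020)

# Default surface ("interpolate"): no predictions outside the observed years.
fit_default  <- loess(mic ~ year, obs)
pred_default <- predict(fit_default, future)

# surface = "direct": the local fit is evaluated at the requested points,
# so it will happily extrapolate past 2011.
fit_direct  <- loess(mic ~ year, obs,
                     control = loess.control(surface = "direct"))
pred_direct <- predict(fit_direct, future)

# Hypothetical helper: index of the first value <= 0, or NA if there is none.
first_nonpositive <- function(v) {
  hits <- which(v <= 0)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

print(pred_default)        # NA for every future year
print(round(pred_direct))  # finite, and increasingly hard to believe
print(first_nonpositive(c(3, 2, 1, 0, -1, -2, 2)))  # 4
```

None of this makes the projection any more trustworthy; it only shows which knob lets LOESS wander off the edge of the data.
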

Full R code here: https://gist.github.com/1637248