When can we expect the last damn microarray paper?

Posted on January 18, 2012 by Jeremy Leipzig in R bloggers | 0 Comments

[This article was first published on Jermdemo Raised to the Law, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

With bonus R code

It came as a shock to learn from PubMed that almost 900 papers were published with the word “microarray” in their titles last year alone, just 12 shy of the 2010 count. More alarming, many of these papers were not of the innocuous “Microarray study of gene expression in dog scrotal tissue” variety, but dry rehashings along the lines of “Statistical approaches to normalizing microarrays to the reference brightness of Ursa Minor”.

It’s an ugly truth we must face: people aren’t just using microarrays, they’re still writing about them.

See for yourself:

getCount<-function(term){function(year){
  nihUrl<-concat("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=",term,"+",year,"[pdat]")
  #cleanurl<-gsub('\\]','%5D',gsub('\\[','%5B',x=url))
  #http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=microarray%5btitle%5d+2003%5bpdat%5d
  xml<-xmlTreeParse(URLencode(nihUrl),isURL=TRUE)
  #Data Mashups in R, pg17
  as.numeric(xmlValue(xml$doc$children$eSearchResult$children$Count$children$text))
}}

years<-1995:2011
df<-data.frame(type="obs",year=years,
    mic=sapply(years,function(x){do.call(getCount('microarray[title]'),list(x))}),
    ngs=sapply(years,function(x){do.call(getCount('"next generation sequencing"[title] OR "high-throughput sequencing"[title]'),list(x))})
)
#papers with "microarray" in title
> df[,c("year","mic")]
   year  mic
1  1995    2
2  1996    4
3  1997    0
4  1998    7
5  1999   28
6  2000  108
7  2001  273
8  2002  553
9  2003  770
10 2004 1032
11 2005 1135
12 2006 1216
13 2007 1107
14 2008 1055
15 2009  981
16 2010  909
17 2011  897

Reading another treatise on microarray normalization in 2012 would be just tragic. Who still reads these? Who still writes these papers? Can we stop them? If not, when can we expect NGS to wipe them off the map?

#97 is a fair start
df<-subset(df,year>=1997)
mdf<-melt(df,id.vars=c("type","year"),variable_name="citation")

c<-ggplot(mdf,aes(x=year))
p<-c+geom_point(aes(y=value,color=citation)) +
  ylab("papers") +
  stat_smooth(aes(y=value,color=citation),data=subset(mdf,citation=="mic"),method="loess") +
  scale_x_continuous(breaks=seq(from=1997,to=2011,by=2))
print(p)

Here I plot both microarray and next-generation sequencing papers (in title). We see kurtosis is working in our favor, and LOESS seems to agree!

But when will the pain end? Let us extrapolate, wildly.

#Return 0 for negative elements
# noNeg(c(3,2,1,0,-1,-2,2))
# [1] 3 2 1 0 0 0 2
noNeg<-function(v){sapply(v,function(x){max(x,0)})}

#Return up to the first negative/zero element inclusive
# toZeroNoNeg(c(3,2,1,0,-1,-2,2))
# [1] 3 2 1 0
toZeroNoNeg<-function(v){noNeg(v)[1:firstZero(noNeg(v))]}

#return index of first zero
firstZero<-function(v){which(noNeg(v)==0)[1]}

#let's peer into the future
df.lo.mic<-loess(mic ~ year,df,control=loess.control(surface="direct"))

#when will it stop?
mic_predict<-as.integer(predict(df.lo.mic,data.frame(year=2012:2020),se=FALSE))
zero_year<-2011+firstZero(mic_predict)
cat(concat("LOESS projects ",sum(toZeroNoNeg(mic_predict))," more microarray papers."))
cat(concat("The last damn microarray paper is projected to be in ",(zero_year-1),"."))

#predict ngs growth
df.lo.ngs<-loess(ngs ~ year,df,control=loess.control(surface="direct"))
ngs_predict<-as.integer(predict(df.lo.ngs,data.frame(year=2012:zero_year),se=FALSE))

pred_df<-data.frame(type="pred",year=c(2012:zero_year),mic=toZeroNoNeg(mic_predict),ngs=ngs_predict)
df2<-rbind(df,pred_df)

mdf2<-melt(df2,id.vars=c("type","year"),variable_name="citation")

c2<-ggplot(mdf2,aes(x=year))
p2<-c2+geom_point(aes(y=value,color=citation,shape=type),size=3) +
    ylab("papers") +
    scale_y_continuous(breaks=seq(from=0,to=1600,by=200))+
    scale_x_continuous(breaks=seq(from=1997,to=zero_year,by=2))
print(p2)

LOESS projects 2038 more microarray papers.
The last damn microarray paper is projected to be published in 2016.

Yeah, right...

Full R code here: https://gist.github.com/1637248

To leave a comment for the author, please follow the link and comment on their blog: Jermdemo Raised to the Law.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

When can we expect the last damn microarray paper?

With bonus R code

Related

With bonus R code

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)