(This article was first published on **The Geokook. » R**, and kindly contributed to R-bloggers.)

When I was considering submitting my paper on psd to J. Stat. Soft. (JSS), I kept noticing that the time from “Submitted” to “Accepted” was nearly two years in many cases. I ultimately decided that was much too long a review process, no matter what the impact factor might be (*and in two years’ time, would I even care?*). Tonight I had the sudden urge to put together a dataset of times to publication.

Fortunately, the JSS website is structured such that it took only a few minutes of playing with XML scraping (*shudder*) to write the R code to reproduce the full dataset. I then ran a changepoint analysis (with the changepoint package, itself published in JSS!) to see when shifts in the mean time to acceptance have occurred. Here are the results:

Pretty interesting stuff, but kind of depressing: the *average* time from submission to acceptance is about 1.5 years, with a standard deviation of 206 days. There are many cases where the review took less than a year, but those tend to be in the ‘past’ (before volume 45, issue 1).
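The summary numbers above come from simple date arithmetic on one Submitted and one Accepted date per article. Here is a minimal sketch on three made-up records (not real JSS data). The `"Submitted <date>, Accepted <date>"` string format is an assumption inferred from how the scraper strips those prefixes, and a regular-expression extraction is used here in place of the scraper's `strsplit()`/`gsub()` steps.

```r
# Made-up strings in the assumed "Submitted <date>, Accepted <date>" form
# (illustrative values only, not real JSS records)
raw <- c("Submitted 2010-01-15, Accepted 2010-11-20",
         "Submitted 2011-03-01, Accepted 2012-09-01",
         "Submitted 2012-06-10, Accepted 2014-02-01")

# Pull out the ISO date that follows each keyword
submitted <- as.Date(sub(".*Submitted ([0-9-]+).*", "\\1", raw))
accepted  <- as.Date(sub(".*Accepted ([0-9-]+).*", "\\1", raw))

# Review time in days for each article
days <- as.numeric(difftime(accepted, submitted, units = "days"))

# Summary statistics of the kind quoted in the text
mean_days  <- mean(days)
sd_days    <- sd(days)
mean_years <- mean_days / 365.25

cat(sprintf("mean = %.0f days (%.2f years), sd = %.0f days\n",
            mean_days, mean_years, sd_days))
```

With the full scraped dataset, `mean(Alldata$Days.to.pub)` and `sd(Alldata$Days.to.pub)` give the corresponding figures across all articles.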

Of course, these results largely reflect JSS’s growing academic impact, which also increases the number of submissions the editors have to deal with. So these data should be normalized by *something*. By what, exactly, I don’t know.

And, finally, I can’t imagine how the authors of the paper that went through a 1400+ day review process felt. Or are they still feeling the sting?

Here’s my session info:

```
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] changepoint_1.1.5 zoo_1.7-11        plyr_1.8.1        XML_3.98-1.1

loaded via a namespace (and not attached):
[1] grid_3.1.0      lattice_0.20-29 Rcpp_0.11.1     tools_3.1.0
```

And here’s the R code needed to reproduce the dataset and figure:

```r
library(XML)
library(plyr)
library(changepoint)

# Current volume:
cvol <- 58

# Set to TRUE if you want to re-scrape the dataset
# on every run (not recommended)
redo <- FALSE

# Fetch and parse the table of contents for one JSS volume
jstat.xml <- function(vol=1, tolist=TRUE){
  src <- "http://www.jstatsoft.org/"
  vsrc <- sprintf("%sv%i", src, vol)
  message(vsrc)
  X <- xmlParse(vsrc)
  if (tolist) X <- xmlToList(X)
  return(X)
}

# Build a data frame of submission/acceptance dates for one volume
jstat.DTP <- function(vol=1, no.authors=FALSE, no.title=FALSE){
  # Get article data
  xl <- jstat.xml(vol)
  Artic <- xl$body[[4]]$div$ul # article data
  # Vol, Issue
  issues <- ldply(Artic, function(x) return(x[[5]][[1]]))$V1
  issues2 <- ldply(strsplit(issues, split=","),
                   .fun=function(x){gsub("\n Vol. ", "", gsub("Issue ", "", x))})
  # Submitted, Accepted
  dates <- ldply(Artic, function(x) return(x[[6]][[1]]))$V1
  dates2 <- ldply(strsplit(dates, split=","),
                  .fun=function(x){as.Date(gsub("Accepted ", "", gsub("Submitted ", "", x)))})
  Dat <- data.frame(Volume=issues2$V1, Issue=issues2$V2, Date=issues2$V3,
                    Submitted=dates2$V1, Accepted=dates2$V2,
                    Days.to.pub=as.numeric(difftime(dates2$V2, dates2$V1, units="days")),
                    Author=NA, Title=NA)
  if (!no.authors){
    # Authors
    Dat$Author <- ldply(Artic, function(x) return(x[[3]][[1]]))$V1
  }
  if (!no.title){
    # Title
    Dat$Title <- ldply(Artic, function(x) return(x[[1]][[1]]))$V1
  }
  return(Dat)
}

# Shakedown
#jstat.DTP(58)
#ldply(57:58, jstat.DTP)

# Scrape once and cache to disk; reload the cache on later runs
if (!file.exists("JStatSoft_DtP.rda") || redo){
  Alldata <- ldply(seq_len(cvol), jstat.DTP)
  save(Alldata, file="JStatSoft_DtP.rda")
} else {
  load("JStatSoft_DtP.rda")
}

Alldata.s <- arrange(Alldata, Days.to.pub)

# Changes in mean (Segment Neighbourhood method)
Cpt <- suppressWarnings(cpt.mean(Alldata$Days.to.pub, method="SegNeigh", Q=4))
niss <- nrow(Alldata) # number of articles
summary(Cpt)
sd(Alldata$Days.to.pub)

PLT <- function(){
  layout(matrix(1:3))
  par(las=1, cex=0.8)
  with(Alldata, {
    # Top panel: days to acceptance
    par(mar=c(0.5,5,2,1))
    plot(Days.to.pub, xlim=c(0,niss), #xaxs="i",
         type="l", col="grey", ylab="Days", xaxt="n",
         ylim=c(0,1500), yaxs="i")
    points(Days.to.pub, pch=3)
    mtext("Time to 'Accepted': J. Stat. Soft.", font=2, line=0.5)
    axis(1, labels=FALSE)
    # Middle panel: log2 days, reference lines at 1 month, 1 year, 2 years
    par(mar=c(2,5,0.,1))
    plot(log2(Days.to.pub), xlim=c(0,niss), #xaxs="i",
         pch=3, ylab="log2 Days")
    yt <- log2(365*c(1/12,1:2))
    abline(h=yt, col="red", lty=2)
  })
  # Bottom panel: changepoint fit, labelled with each section's mean
  par(mar=c(4,5,1,1))
  plot(Cpt, xlim=c(0,niss), #xaxs="i",
       xlab="Issue index", ylab="Days", ylim=c(-499,1500), yaxs="i")
  mtext("Changes in mean", font=2, line=-2, adj=0.1, cex=0.8)
  Dm <- lbls <- round(param.est(Cpt)$mean)
  lbls[1] <- paste("section mean:", lbls[1])
  Lc <- cpts(Cpt)
  yt <- -75-max(Dm)+Dm
  xt <- c(0,Lc)+diff(c(0,Lc,niss))/2
  text(xt, yt, lbls, col="red", cex=0.9)
}
PLT()
```
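`cpt.mean()` with `method="SegNeigh"` searches over segmentations with a bounded number of changepoints (controlled by `Q`) for the piecewise-constant mean that best fits the series. The core idea can be seen in a much simpler, hypothetical helper, `best_split()`, which finds only the single split that minimises the total within-segment sum of squares. This is a sketch for intuition, not the changepoint package's algorithm or API, and the series below is simulated rather than scraped from JSS.

```r
# Hypothetical helper: locate the single change in mean that minimises
# the total within-segment sum of squared deviations.
best_split <- function(y, minseg = 2) {
  n <- length(y)
  stopifnot(n >= 2 * minseg)
  ss <- function(v) sum((v - mean(v))^2)   # cost of one constant-mean segment
  taus <- minseg:(n - minseg)              # candidate last index of segment 1
  cost <- vapply(taus, function(t) ss(y[1:t]) + ss(y[(t + 1):n]), numeric(1))
  tau <- taus[which.min(cost)]
  list(cpt = tau,
       mean.before = mean(y[1:tau]),
       mean.after  = mean(y[(tau + 1):n]),
       gain = ss(y) - min(cost))           # cost reduction vs. no change at all
}

# Simulated "days to acceptance": mean shifts from ~300 to ~600 after obs 40
set.seed(1)
y <- c(rnorm(40, mean = 300, sd = 60), rnorm(60, mean = 600, sd = 60))
res <- best_split(y)
str(res)
```

Applying the same split recursively to each resulting segment gives binary segmentation; Segment Neighbourhood instead optimises all split locations jointly for a given number of segments, which is why it is more expensive but exact.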
