[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Following my post on citations in academic journals, I wanted to go one step further in understanding the dynamics of citations. Here, the dataset looks like this: for each article, we have the name of the journal, the year of publication (as well as the title of the article and the authors, which are not used here) and, more interestingly, the number of citations in journals (any kind of academic journal) published in 1996, 1997, …, 2011. Of course, articles published in 1999 can only be cited from 1999 onward.
base[1000:1002,]
Publication.Year
7188 1999
7191 1999
7195 1999
Document.Title
7188 Sequential inspection
7191 On equitable resource approach
7195 Method for strategic
Authors ISSN Journal.Title
7188 Yao D.D., Zheng S. 0030364X Operations Research
7191 Luss H. 0030364X Operations Research
7195 Seshadri S., Khanna A., Harche F., Wyle R. 0030364X Operations Research
Volume Issue X139 DEV1996 DEV1997 DEV1998 DEV1999 DEV2000 DEV2001 DEV2002
7188 47 3 0 0 0 0 0 1 0 2
7191 47 3 0 0 0 0 0 0 2 0
7195 47 3 0 0 0 0 0 0 0 0
DEV2003 DEV2004 DEV2005 DEV2006 DEV2007 DEV2008 DEV2009 DEV2010 DEV2011
7188 0 0 0 1 0 0 0 0 0
7191 3 4 1 4 4 8 4 6 1
7195 0 1 2 2 1 0 1 0 0
X130655 X0 X130794
7188 4 0 4
7191 37 0 37
7195 7 0 7
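Since the triangle built below comes from summing the DEV columns, a quick sanity check on the three rows printed above can be reassuring. The column X130655 seems to hold the total number of citations over 1996 to 2011 (this reading of an unlabeled column is an assumption), and an article published in 1999 should have no citations in 1996, 1997 or 1998. The counts below are re-typed by hand from the output; `dev` and `total` are illustrative names.

```r
# Citation counts DEV1996, ..., DEV2011 of the three articles printed above,
# re-typed by hand from the output (same column order)
dev = rbind(
  "7188" = c(0,0,0,0,1,0,2, 0,0,0,1,0,0,0,0,0),
  "7191" = c(0,0,0,0,0,2,0, 3,4,1,4,4,8,4,6,1),
  "7195" = c(0,0,0,0,0,0,0, 0,1,2,2,1,0,1,0,0))
colnames(dev) = paste0("DEV", 1996:2011)
# values of the X130655 column, as printed (assumed to be the total)
total = c(4, 37, 7)
# the yearly counts add up to that column
all(rowSums(dev) == total)                           # TRUE
# no citation before the publication year (1999)
all(dev[, c("DEV1996", "DEV1997", "DEV1998")] == 0)  # TRUE
```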
The first step is to aggregate the data: instead of looking at each article, we look at all papers published in 1999 (say), and at the number of citations they received in the year of publication, the year after, two years after, etc. This yields a triangle since, for articles published in 2010, there is only one possible year for citations (2010, since I removed 2011).
# volume numbers, oldest first (1996, ..., 2011)
VOL=rev(unique(base$Volume))
VOL=VOL[!is.na(VOL)]
TRIANGLE=matrix(NA,16,16)
# row counter: row k of TRIANGLE is the k-th publication year
k=0
for(v in VOL){
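k=k+1
# citations received in 1996, ..., 2011 (columns DEV1996 to DEV2011) by articles of volume v
sb=base[base$Volume==v,9:24]
sb=sb[!is.na(sb[,1]),]
# shift: column j of row k counts the citations received j-1 years after publication
TRIANGLE[k,1:(17-k)]=apply(sb,2,sum)[k:16]}
# calendar year 2011 lies on the anti-diagonal (k+j=17): set it to NA
TRIANGLE[row(TRIANGLE)+col(TRIANGLE)==17]=NA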
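The key step in the loop is the shift in its last line: counts indexed by calendar year become counts indexed by years since publication. A minimal sketch on a hypothetical toy dataset (three volumes published in 2008, 2009 and 2010, with citations counted in the same three years; the names `toy`, `vols` and `TRI` are illustrative) shows the triangle this produces. It orders volumes with `sort()`, assuming that volume numbers increase with the publication year.

```r
# hypothetical toy data: volumes 1, 2, 3 published in 2008, 2009, 2010
toy = data.frame(Volume  = c(1, 1, 2, 2, 3),
                 DEV2008 = c(1, 0, 0, 0, 0),
                 DEV2009 = c(2, 1, 0, 1, 0),
                 DEV2010 = c(3, 2, 1, 2, 1))
n = 3
vols = sort(unique(toy$Volume))    # oldest volume first
TRI = matrix(NA, n, n)
for (k in seq_along(vols)) {
  # total citations per calendar year for the articles of volume k
  s = colSums(toy[toy$Volume == vols[k], -1, drop = FALSE])
  # calendar years k, ..., n become development years 1, ..., n - k + 1
  TRI[k, 1:(n + 1 - k)] = s[k:n]
}
TRI
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    1    3   NA
# [3,]    1   NA   NA
```

Each row starts at the year of publication, so the most recent volume only fills the first column.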
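Then, a standard idea (at least in the insurance business, for the development of claims payments) is to consider that the data are Poisson distributed, and that the number of citations depends on the year of publication of the article (a row effect) and on the development, i.e. how many years after publication we are looking (a column effect). More formally, let $Y_{i,j}$ denote the number of citations received in development year $j$ ($j=1$ being the year of publication) by the articles published in year $i$, and assume that
$$Y_{i,j}\sim\mathcal{P}\left(\exp[\gamma+\alpha_i+\beta_j]\right).$$
# remove the last row and the last column (2011)
TRIANGLE=TRIANGLE[-16,]
TRIANGLE=TRIANGLE[,-16]
# stack the 15x15 triangle column by column; glm drops the NA cells
Y=as.vector(TRIANGLE)
YEAR=rep(1996:2010,15)
DEV=rep(1:15,each=15)
baseT=data.frame(Y,YEAR,DEV)
# Poisson regression with a row (publication year) factor and a column (development) factor
reg=glm(Y~as.factor(YEAR)+as.factor(DEV),data=baseT,family=poisson)
Since those are incremental values, to see how citations are spread over time we need to cumulate them along a row. Thus, we can plot
$$j\mapsto\sum_{k=1}^{j}\exp[\widehat{\gamma}+\widehat{\beta}_k],\quad j=1,\ldots,15$$
(because we used factors, the first component has been replaced by the constant in the regression, i.e. $\widehat{\beta}_1=0$) or a normalized version, to compare among journals; for instance, as if every article got 100 citations over 15 years.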
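A side remark on why this model is popular in claims reserving: with a log link and factor effects, the Poisson likelihood equations force the fitted total of every row and every column of the triangle to equal the observed total, the property usually invoked to show that this regression reproduces the chain-ladder estimates. A minimal check on a simulated 6 x 6 triangle follows; the parameters `alpha` and `beta`, and the names `sim`, `fit_sim` and `d`, are hypothetical (chosen so they do not overwrite `reg` or `baseT`).

```r
set.seed(1)
n = 6
# hypothetical row (publication year) and column (development) effects
alpha = log(seq(200, 300, length.out = n))
beta  = log(c(0.10, 0.25, 0.25, 0.20, 0.12, 0.08))
yr = rep(1:n, n)            # row index, stacked column by column like as.vector()
dv = rep(1:n, each = n)     # column (development) index
obs = yr + dv <= n + 1      # observed cells: the upper-left triangle
y = rpois(n * n, exp(alpha[yr] + beta[dv]))
y[!obs] = NA                # future cells are unknown
sim = data.frame(y, yr, dv)
fit_sim = glm(y ~ as.factor(yr) + as.factor(dv), data = sim, family = poisson)
# glm drops the NA cells, so fitted values line up with the observed rows of sim
d = sim[obs, ]
d$mu = fitted(fit_sim)
# compare fitted and observed totals, row by row and column by column
rows_ok = isTRUE(all.equal(as.numeric(tapply(d$mu, d$yr, sum)),
                           as.numeric(tapply(d$y, d$yr, sum)), tolerance = 1e-6))
cols_ok = isTRUE(all.equal(as.numeric(tapply(d$mu, d$dv, sum)),
                           as.numeric(tapply(d$y, d$dv, sum)), tolerance = 1e-6))
c(rows = rows_ok, columns = cols_ok)   # both should be TRUE
```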
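# exp(constant + development effect); coefficients 16 to 29 are the DEV effects 2,...,15
DYN=exp(c(reg$coefficients[1],reg$coefficients[1]+reg$coefficients[16:29]))
# cumulated proportion of the citations received over the 15 years
DYNN=cumsum(DYN)/sum(DYN)
# 15 values, from 0 to 14 years after publication, scaled to 100 citations
plot(0:14,100*DYNN)
And this is what we get for several academic journals: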
Now it is possible to look into more detail, below with JRSS-B (Journal of the Royal Statistical Society, Series B, on statistical methodology). Note that here, citations come extremely slowly… so it might not be a good “strategy” (assuming that a researcher’s target is simply to get a high citation index, quickly) for a young researcher to publish in JRSS-B.
And here is Stochastic Processes and their Applications.
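The comparison between journals rests on the normalized curves, and two details may help when reproducing it. First, selecting the development coefficients by position (16:29) only works for a 15 x 15 triangle; selecting them by name does not depend on the size. Second, "citations come slowly" can be turned into a number. A minimal sketch on a simulated five-year triangle (the shares 10%, 30%, 30%, 20%, 10% and all object names are hypothetical) picks the DEV effects by name, checks them against `predict()` for the first publication year, and defines a hypothetical helper `years_to_share()` returning the number of years after publication needed to collect a given share (by default half) of the citations.

```r
set.seed(2)
n = 5
yr = rep(2001:(2000 + n), n)
dv = rep(1:n, each = n)
# hypothetical development shares: 10%, 30%, 30%, 20%, 10% of the citations
y = rpois(n * n, 100 * c(0.1, 0.3, 0.3, 0.2, 0.1)[dv])
y[(yr - 2000) + dv > n + 1] = NA    # keep the observed triangle only
sim = data.frame(y, yr, dv)
fit_sim = glm(y ~ as.factor(yr) + as.factor(dv), data = sim, family = poisson)
b = coef(fit_sim)
# select the development effects by name rather than by position
dev_idx = grep("as.factor(dv)", names(b), fixed = TRUE)
DYN_sim = exp(c(b[1], b[1] + b[dev_idx]))
# same quantities with predict(), for the first publication year
DYN_pred = predict(fit_sim, newdata = data.frame(yr = 2001, dv = 1:n),
                   type = "response")
# cumulated share of the citations, 0 (publication) to n - 1 years after
DYNN_sim = cumsum(DYN_sim) / sum(DYN_sim)
# hypothetical helper: years after publication needed to reach a given share
years_to_share = function(DYNN, level = 0.5) which(DYNN >= level)[1] - 1
# two hypothetical normalized curves: a "fast" and a "slow" journal
fast = cumsum(c(0.30, 0.30, 0.20, 0.10, 0.10))
slow = cumsum(c(0.05, 0.10, 0.15, 0.15, 0.55))
c(fast = years_to_share(fast), slow = years_to_share(slow))   # 1 and 4
```

With real data, one would apply `years_to_share()` to the DYNN curve of each journal; a larger value means that citations arrive more slowly.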
Anyway, all suggestions about the interpretation are welcome!
To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics - Tag - R-english.
