Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

#### Introduction

Todays mission have been to visualize publication frequencies over the years for different psychotherapies using PubMed and R. I choose to concentrate on Psychodynamic Therapy (PDT), Cognitive Behavior Therapy (CBT), Acceptance and Commitment Therapy (ACT) and Mindfulness. I know there’s is a R function to query PubMed, but I didn’t use it and instead searched PubMed manually and saved the result as four separate csv-files.

PubMed search filter
I used a really rudimentary search filter:

• “cognitive behavior therapy” OR “cognitive therapy” (13245 hits)
• “psychodynamic therapy” OR “psychoanalytic therapy” (13933 hits)
• Mindfulness (1099 hits)
• “Acceptance and commitment therapy” (374 hits)
• #### Writing the R code

So, after I’d saved the data files from PubMed to my computer, I switched over the R and loaded the necessary packages.

library(stringr) # used for str_extract()
library(ggplot2) # used for plotting
library(plyr) # used for count()


Then I wrote a short function to extract year and frequency from the csv-file. Since publication year wasn’t under it’s own header I used str_extract to extract that information.

# create function named scrape.pubmed
scrape.pubmed <- function() {
# we only need column 5
results <- pubmed_result[5]
# the header is repeated regularly, so I remove those rows.
results <- results[results\$Details != "ShortDetails",]
# The remaning field contain journal name in addition to pub. year
# The general expression catches values 1900--1999 OR 2000--2019
# Most rows had year information at the end of the string,
# but not all rows had this. Therefore I used this regular expression to
# minimize mismatches.
year <- str_extract(results, "(19[0-9]{2}|20[0-1][0-9])")
# Use the count()-function to group years together.
# I didn't want values from 2012, hence the < 2012 addition.
count <- count(year[year < 2012])
# Return the value of count
return(count)
}


Now that I have a function to collect my data, I loaded the different csv-files I saved from PubMed and collect year data from them.

# Get CBT data
cbt.count <- scrape.pubmed()

# Get PDT data
pdt.count <- scrape.pubmed()

# Get mindfulness data
mindfulness.count <- scrape.pubmed()

# Get ACT data
act.count <- scrape.pubmed()


Now I have 4 variables who look like this:

> head(cbt.count, 10)
x freq
1  1976   19
2  1977    2
3  1978    1
4  1979    1
5  1980   12
6  1981    9
7  1982   12
8  1983   15
9  1984   14
10 1985   16


I could combine them into one data.frame but I choose not to. Instead I plot them individually on different layers with ggplot.

# It doesn't matter in which order you enter the variables, I
# started with cbt.count. For some reason ggplot wouldn't accept
# the syntax without 'group = 1'.
ggplot(cbt.count, aes(x, freq, group = 1, color = "CBT")) +
# line layer for cbt.count
geom_line() +
# new layer for pdt.count
geom_line(data = pdt.count, aes(x, freq, color = "PDT")) +
# new layer for mindfulness.count
geom_line(data = mindfulness.count, aes(x, freq, color = "Mindfulness")) +
# new layer for act.clunt
geom_line(data = act.count, aes(x, freq, color = "ACT"))+
# This line both control the groups colors and labels on the legend.
scale_color_manual("Orientation",
values= c("CBT" = "blue",
"PDT" = "red",
"Mindfulness" = "green",
"ACT" = "magenta")) +
ylab("Publications") + # y-label
xlab("Year") + # x-label
# break up x-axis from 1900 to 2010 by ticks of 10
scale_x_discrete(breaks = seq(1900,2010, 10)) +
opts(title = "Pubmed citations by year")


#### Results

We can see that PDT really had a decline in the early 90′s and the CBT started it’s rise and that it really sky rocketed in the 21th century. PDT also had an increase in the 21th century but it got a heck of a lot of work to do before it can match CBT’s publication output. The plot also show that mindfulness had a surge and that it’s output now is on par with PDT.

I’ve made some updates to this code in a more recent post:
Ggplot2, PubMed citation frequency and DSM-IV Axis I disorders by year