Finding Economic Articles with Data (2nd Update)

[This article was first published on Economics and R - R posts, and kindly contributed to R-bloggers.]

Almost a year has passed since I posted my last update about my Shiny-powered search app. It lets you search among currently more than 5000 economics articles that have an accessible data and code supplement.

The main data for my app can be downloaded as a zipped SQLite database from my server. Let us do some analysis.

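library(DBI)     # dbConnect()
library(dplyr)   # %>% and the data verbs used below
library(dbmisc)  # set.db.schemas() and dbGet()

# Open the unzipped database and attach the schema shipped with EconJournalData
db = dbConnect(RSQLite::SQLite(),"articles.sqlite") %>%
  set.db.schemas(schema.file=system.file("schema/articles.yaml", package="EconJournalData"))

articles = dbGet(db,"article")
fs = dbGet(db,"files_summary")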
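
If the helper functions used above are not at hand, plain DBI calls should be enough to get a first look at a SQLite file. The following is only a sketch: it builds a tiny in-memory database as a stand-in for articles.sqlite, and the toy rows are made up for illustration. Only the table name article and the columns id, journ and year are taken from the code in this post.

```r
library(DBI)

# In-memory stand-in for articles.sqlite (hypothetical toy content)
con = dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "article", data.frame(
  id    = 1:2,
  journ = c("ecta", "restat"),
  year  = c(2019L, 2020L),
  stringsAsFactors = FALSE
))

# Which tables exist, and which columns does a table have?
tables = dbListTables(con)
fields = dbListFields(con, "article")

# Read a whole table into a data frame
articles_toy = dbReadTable(con, "article")

dbDisconnect(con)
```

With the real file, the connection would point to "articles.sqlite" instead of ":memory:" and the dbWriteTable() step would be dropped.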

Let us look, grouped by journal, at the share of articles whose code supplement contains R files:

fs %>% 
  left_join(select(articles, id, journ), by="id") %>%
  group_by(journ) %>%
  mutate(num_art = n_distinct(id)) %>%
  filter(file_type=="r") %>%
  summarize(
    num_art = first(num_art),
    num_with_r = n(),
    share_with_r=round((num_with_r / first(num_art))*100,2)
  ) %>%
  arrange(desc(share_with_r))

We see that there is quite some variation in the share of articles with R code, ranging from 13.2% in Econometrica (ecta) to only 0.74% in the Review of Economics and Statistics (restat). (The statistics exclude all articles that don’t have a code supplement or that have a supplement whose file types I did not analyse, e.g. because it is too large or the ZIP files are nested too deeply.)
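
To see how the per-journal share is computed, here is a minimal sketch on made-up data. The data frames fs_toy and articles_toy and all their values are hypothetical. The sketch assumes that the files summary has one row per article and file type, which the post does not state explicitly.

```r
library(dplyr)

# Hypothetical toy data: one row per article and file type
fs_toy = data.frame(
  id        = c(1L, 1L, 2L, 3L, 3L, 4L, 5L, 6L, 7L),
  file_type = c("do", "r", "r", "do", "r", "do", "m", "do", "do"),
  stringsAsFactors = FALSE
)
articles_toy = data.frame(
  id    = 1:7,
  journ = c("ecta", "ecta", "restat", "restat", "restat", "restat", "qje"),
  stringsAsFactors = FALSE
)

shares_toy = fs_toy %>%
  left_join(articles_toy, by = "id") %>%
  group_by(journ) %>%
  # count the articles per journal BEFORE filtering on the file type
  mutate(num_art = n_distinct(id)) %>%
  filter(file_type == "r") %>%
  summarize(
    num_art      = first(num_art),
    num_with_r   = n(),
    share_with_r = round(num_with_r / num_art * 100, 2)
  ) %>%
  arrange(desc(share_with_r))

print(shares_toy)
```

In this toy example the journal without any R file ("qje") does not appear in the result at all, because the filter step removes all of its rows. A journal with a share of zero would therefore be missing from such a table rather than listed with 0.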

Overall, we still have a clear dominance of Stata in economics:

# Number of articles with an analysed data & code supplement
n_art = n_distinct(fs$id)

# Count articles by file types and compute shares
fs %>% group_by(file_type) %>%
  summarize(count = n(), share=round((count / n_art)*100,2)) %>%
  # note that all file extensions are stored in lower case
  filter(file_type %in% c("do","r","py","jl","m")) %>%
  arrange(desc(share))

Roughly 70% of the articles have Stata do files, a quarter have Matlab m files, and only 3.6% have R files.

While R, Python and Julia have increased their share over recent years, it does not seem like a very strong trend yet.

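sum_dat = fs %>% 
  left_join(select(articles, year, id), by="id") %>%
  group_by(year) %>%
  # distinct articles per year (assuming one row per article and file type,
  # n() would count rows instead of articles)
  mutate(n_art_year = n_distinct(id)) %>%
  group_by(year, file_type) %>%
  summarize(
    count = n(),
    share=round((count / first(n_art_year))*100,2)
  ) %>%
  filter(file_type %in% c("do","r","py","jl","m"))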

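library(ggplot2)
# Log-scaled y axis, since the shares differ by orders of magnitude across file types
ggplot(sum_dat, aes(x=year, y=share, color=file_type)) +
  geom_line(size=1.5) + scale_y_log10() + theme_bw()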
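
The choice of denominator matters for the yearly shares. The sketch below uses a made-up data frame toy and assumes that the files summary has one row per article and file type (an assumption, not something stated in this post). Under that assumption n() counts rows, while n_distinct(id) counts articles.

```r
library(dplyr)

# Hypothetical toy data: article 1 has two file types, article 2 has one
toy = data.frame(
  year      = c(2019L, 2019L, 2019L),
  id        = c(1L, 1L, 2L),
  file_type = c("do", "r", "do"),
  stringsAsFactors = FALSE
)

denoms = toy %>%
  group_by(year) %>%
  summarize(
    n_rows = n(),             # number of article-file-type rows
    n_art  = n_distinct(id)   # number of distinct articles
  )

do_rows = sum(toy$file_type == "do")

# Share of "do" relative to all rows vs. relative to all articles
share_rows = round(do_rows / denoms$n_rows * 100, 2)
share_art  = round(do_rows / denoms$n_art * 100, 2)

print(c(share_rows = share_rows, share_art = share_art))
```

Both toy articles contain a do file, so only the article-based denominator yields a share of 100. The row-based share is lower because article 1 contributes two rows.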

I also have a log file that anonymously stores which articles have been clicked on. The code below shows the 20 most clicked articles so far:

dat = read.csv("article_click.csv")

dat %>%
  group_by(article) %>%
  summarize(count=n()) %>%
  na.omit %>%
  arrange(desc(count)) %>%
  print(n = 20)

## # A tibble: 2,707 x 2
##    article                                                                 count
##    <fct>                                                                   <int>
##  1 Consumer Spending during Unemployment: Positive and Normative Implicat~    50
##  2 Do Expert Reviews Affect the Demand for Wine?                              44
##  3 Tax Evasion and Inequality                                                 38
##  4 A Macroeconomic Model of Price Swings in the Housing Market                35
##  5 Is Your Lawyer a Lemon? Incentives and Selection in the Public Provisi~    33
##  6 The Welfare Effects of Social Media                                        31
##  7 The Rise of Market Power and the Macroeconomic Implications                29
##  8 Carbon Taxes and CO2 Emissions: Sweden as a Case Study                     27
##  9 Public Debt and Low Interest Rates                                         27
## 10 The Sad Truth about Happiness Scales                                       25
## 11 Job Polarization and Jobless Recoveries                                    24
## 12 The New Tools of Monetary Policy                                           24
## 13 Alcohol and Self-Control: A Field Experiment in India                      23
## 14 Disease and Gender Gaps in Human Capital Investment: Evidence from Nig~    23
## 15 Some Causal Effects of an Industrial Policy                                23
## 16 Food Deserts and the Causes of Nutritional Inequality                      22
## 17 Minimum Wage and Real Wage Inequality: Evidence from Pass-Through to R~    22
## 18 The Cost of Reducing Greenhouse Gas Emissions                              22
## 19 Adaptation to Climate Change: Evidence from US Agriculture                 21
## 20 Do Parents Value School Effectiveness?                                     21
## # ... with 2,687 more rows
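
A more compact way to build such a ranking is dplyr's count() with sort = TRUE. The sketch below runs on a made-up click log. The data frame clicks_toy and its values are hypothetical; only the column name article is taken from the code above.

```r
library(dplyr)

# Hypothetical click log: one row per click, NA for clicks without an article
clicks_toy = data.frame(
  article = c("A", "B", "A", NA, "C", "A", "B"),
  stringsAsFactors = FALSE
)

top_toy = clicks_toy %>%
  filter(!is.na(article)) %>%                     # drop clicks without an article
  count(article, sort = TRUE, name = "count") %>% # clicks per article, descending
  slice_head(n = 2)                               # keep the 2 most clicked articles

print(top_toy)
```

For the real log, slice_head(n = 20) would keep the 20 most clicked articles.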

So far there have been over 11000 clicks in total. Well, that is almost twice as many as the average number of Google searches in 100 milliseconds 😉
