Merge a list of datasets together

Last week I showed how to read a lot of datasets at once with R. This week I’ll continue from there and show a very simple function that takes this list of datasets and merges them all together. First we’ll use read_list() to read all the datasets at once (for more details, read last week’s post): library("readr") library("tibble") data_files
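The excerpt above is cut off before the merging function itself, so here is a minimal sketch of one common way to merge a list of data frames in base R, using Reduce() with merge(). The helper name merge_list(), the example data, and the shared "id" key are all assumptions made for illustration; this is not the function from the post.

```r
# Hypothetical example data: three small data frames sharing an "id" column
df_list <- list(
  data.frame(id = 1:3, x = c(10, 20, 30)),
  data.frame(id = 1:3, y = c("a", "b", "c")),
  data.frame(id = 2:4, z = c(TRUE, FALSE, TRUE))
)

# merge_list() is a hypothetical helper, not the post's own function.
# Reduce() applies merge() pairwise, left to right, across the list;
# all = TRUE keeps unmatched rows (a full outer join).
merge_list <- function(datasets, by, all = TRUE) {
  Reduce(
    function(left, right) merge(left, right, by = by, all = all),
    datasets
  )
}

merged <- merge_list(df_list, by = "id")
print(merged)
```

With all = TRUE, ids that are missing from one of the data frames are kept and filled with NA; setting all = FALSE would keep only the ids present in every data frame.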

Read a lot of datasets at once with R

I often have to read a lot of datasets at once using R. So I wrote the following function to solve this issue: read_list
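The definition of read_list is not shown in this excerpt. As a rough illustration of the general idea only, the sketch below reads several files into a named list with lapply(). The helper name read_many(), its arguments, and the temporary CSV files are assumptions, not the post's code.

```r
# Write two small CSV files to a temporary directory so the sketch is self-contained
tmp_dir <- file.path(tempdir(), "read_list_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, x = c(1.5, 2.5)),
          file.path(tmp_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, x = c(3.5, 4.5)),
          file.path(tmp_dir, "b.csv"), row.names = FALSE)

# read_many() is a hypothetical helper: it applies a reading function to each
# path and names each list element after its file (without the extension).
read_many <- function(paths, read_function = read.csv, ...) {
  datasets <- lapply(paths, read_function, ...)
  names(datasets) <- tools::file_path_sans_ext(basename(paths))
  datasets
}

csv_paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
datasets <- read_many(csv_paths)
str(datasets)
```

Passing the reading function as an argument means the same helper could be used with another reader, such as readr::read_csv, if that package is installed.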

Data frame columns as arguments to dplyr functions

Suppose that you would like to create a function which does a series of computations on a data frame. You would like to pass a column as this function’s argument. Something like: data(cars) convertToKmh % summarise(mean_speed = mean(speed)) -__ dataset return(dataset) } simpleFunction(cars, "dist") A tibble: 35 x 2 dist mean_speed 1 ...

Careful with tryCatch

tryCatch is one of the functions that allow users to handle errors in a simple way. With it, you can do things like: if(error), then(do this). Take the following example: sqrt("a") Error in sqrt("a") : non-numeric argument to mathematical function Now maybe you’d want something to happen when such an error occurs. You can achieve that with tryCatch: tryCatch(sqrt("a"), error=function(e) print("You can't take...
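To make the "if(error), then(do this)" pattern concrete, here is a small sketch that wraps sqrt() in tryCatch() and returns NA when an error is raised. The wrapper name safe_sqrt() and the wording of the message are assumptions for illustration.

```r
# safe_sqrt() is a hypothetical wrapper: it returns the square root when
# sqrt() succeeds, and NA (after printing a message) when sqrt() errors.
safe_sqrt <- function(x) {
  tryCatch(
    sqrt(x),
    error = function(e) {
      # conditionMessage() extracts the text of the caught error
      message("Could not take the square root: ", conditionMessage(e))
      NA_real_
    }
  )
}

safe_sqrt(4)    # sqrt() succeeds, so the handler is never called
safe_sqrt("a")  # sqrt() errors, so the handler runs and NA is returned
```

The value of the handler function becomes the value of the whole tryCatch() call, which is why the second call returns NA instead of stopping the script.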

Unit testing with R

I was introduced to unit testing while working with colleagues on quite a big project for which we use Python. At first I was a bit skeptical about the need for writing unit tests, but now I must admit that I am seduced by the idea and by the huge time savings it allows. Naturally, I was wondering if the same...

Bootstrapping standard errors for difference-in-differences estimation with R

November 10, 2015
I’m currently working on a paper (with my colleague Vincent Vergnat, who is also a PhD candidate at BETA) where I want to estimate the causal impact of the birth of a child on hourly and daily wages as well as yearly worked hours. For this we are using non-parametric difference-in-differences (henceforth DiD) and thus have to bootstrap the...

