Sometimes I want to use R to fill in values that are missing in one data frame with values from another. For example, I have data from the World Bank on government deficits. However, there are some country-years with missing data. I gathered data from Eurostat on deficits and want to use this data to fill in some of the values that are missing from my World Bank data.
Doing this is kind of a pain, so I created a function that would do it for me. It’s called FillIn.
Here is an example using some fake data. (This example and part of the function were inspired by a Stack Exchange conversation between JD Long and Josh O’Brien.)
First let’s make two data frames: one with missing values in a variable called fNA, and one with a more complete variable called fFull.
# Create data set with missing values
naDF <- data.frame(a = sample(c(1, 2), 100, replace = TRUE),
                   b = sample(c(3, 4), 100, replace = TRUE),
                   fNA = sample(c(100, 200, 300, 400, NA), 100, replace = TRUE))

# Create full data set
fillDF <- data.frame(a = c(1, 2, 1, 2),
                     b = c(3, 3, 4, 4),
                     fFull = c(100, 200, 300, 400))
Now we just enter some information into FillIn about what the data set names are, what variables we want to fill in, and what variables to join the data sets on.
# Fill in missing f's from naDF with values from fillDF
FilledInData <- FillIn(naDF, fillDF, Var1 = "fNA", Var2 = "fFull", KeyVar = c("a", "b"))
## "16 NAs were replaced."
## "The correlation between fNA and fFull is 0.313"
The first data frame and Var1 are the data frame and variable you want to fill in.
The second data frame and Var2 are what you want to use to fill them in.
KeyVar specifies the variables you want to use to join the two data frames.
FillIn lets you know how many missing values it is filling in and what the correlation coefficient is between the two variables you are using. Depending on your missing data issues, this could be an indicator of whether or not Var2 is an appropriate substitute for Var1.
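To give a rough idea of what is going on, here is a minimal base R sketch of the same idea: join the two data frames on the key variables, report how many missing values can be filled and how well the two variables agree, then copy the values across. It is not the actual FillIn code (which relies on data.table). The function name fill_in_sketch, its argument names, and the tiny data frames are made up for illustration, and it assumes the fill variable does not already exist in the first data frame.

```r
# Hypothetical helper: a base R sketch of the fill-in idea, not the FillIn source
fill_in_sketch <- function(d1, d2, var1, var2, key_var) {
  # Left join: keep every row of d1 and attach matching var2 values from d2
  # (note that merge() sorts the result by the key variables)
  merged <- merge(d1, d2[, c(key_var, var2), drop = FALSE],
                  by = key_var, all.x = TRUE)

  # Rows where var1 is missing but var2 has a value to offer
  fillable <- is.na(merged[[var1]]) & !is.na(merged[[var2]])
  message(sum(fillable), " NAs were replaced.")

  # Correlation over the rows where both variables are observed
  both <- !is.na(merged[[var1]]) & !is.na(merged[[var2]])
  if (sum(both) > 1) {
    r <- cor(merged[[var1]][both], merged[[var2]][both])
    message("The correlation between ", var1, " and ", var2,
            " is ", round(r, 3))
  }

  # Copy the values across and drop the helper column
  merged[[var1]][fillable] <- merged[[var2]][fillable]
  merged[[var2]] <- NULL
  merged
}

# Small deterministic example (made-up values)
na_df <- data.frame(a = c(1, 2, 1, 2, 1),
                    b = c(3, 3, 4, 4, 3),
                    fNA = c(100, NA, 300, NA, NA))
fill_df <- data.frame(a = c(1, 2, 1, 2),
                      b = c(3, 3, 4, 4),
                      fFull = c(100, 200, 300, 400))

filled <- fill_in_sketch(na_df, fill_df, "fNA", "fFull", c("a", "b"))
```

Using a left join (all.x = TRUE) means rows without a match in the second data frame are kept and simply stay missing, rather than being dropped.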
FillIn is currently available as a GitHub Gist and can be installed with this code:
You will need the devtools package to install it. For it to work properly you will also need the data.table package.
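If you are not sure whether you already have both packages, something like the following sketch will tell you. The helper name missing_packages is made up for illustration, and the install line is left commented out so that nothing is installed without you asking for it.

```r
# Hypothetical helper: return the packages in pkgs that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("devtools", "data.table"))

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)
}
```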