Deductive imputation with the deducorrect package

November 26, 2011
By Mark van der Loo

(This article was first published on Mark van der Loo, and kindly contributed to R-bloggers)

Missing data hinders statistical analyses. Estimating missing values (imputation) prior to analysis is one way to deal with that. In some cases, however, missing values need not be estimated at all, since they can be derived with certainty from the observed values and the rules the data must obey. The latest version of our package deducorrect can do this for numerical as well as for categorical data.

As an example, consider a record with three fields x, y and z, subject to the rules

$$x + y = z,$$
$$x \geq 0,\quad y \geq 0,\quad z \geq 0.$$

If we’re given a record with values (x=1, y=NA, z=4), the value for y can be easily derived, right? Right. You don’t have to be a mathematician to impute y=3 here. Now consider (x=4, y=NA, z=1). We get y=-3, but since this violates the non-negativity rule above, it is not a valid imputation. The deduImpute function of our package takes this into account. Below is a short R session showing how to impute deductively with the deducorrect package.

```
> library(deducorrect)
Loading required package: editrules
> # define the rules
> E <- editmatrix(c(
+     "x + y == z",
+     "x >= 0", "y >= 0", "z >= 0"
+     )
+ )
> # some data:
> (dat <- data.frame(x=c(1,4),y=c(NA,NA),z=c(4,1)))
  x  y z
1 1 NA 4
2 4 NA 1
> # and now for the magic step (deduImpute returns a
> # 'deducorrect' object):
> imp <- deduImpute(E,dat)
> # the imputed data
> imp$corrected
  x  y z
1 1  3 4
2 4 NA 1
> # a list of the imputations performed
> imp$corrections
  row variable old new
1   1        y  NA   3
```
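The reasoning for the single rule x + y = z is simple enough to spell out in a few lines of base R. The sketch below is only an illustration of that arithmetic, not how deducorrect works internally; `impute_y()` is a hypothetical helper written for this example. It rearranges the rule to y = z - x and keeps the result only when the non-negativity rule y >= 0 still holds.

```r
# Hypothetical helper (not part of deducorrect): deduce y from x + y == z
# and accept the result only if it satisfies y >= 0.
impute_y <- function(dat) {
  out <- dat
  for (i in seq_len(nrow(dat))) {
    # y can only be deduced when it is the sole missing field of the record
    if (is.na(dat$y[i]) && !is.na(dat$x[i]) && !is.na(dat$z[i])) {
      candidate <- dat$z[i] - dat$x[i]             # rearranged from x + y == z
      if (candidate >= 0) out$y[i] <- candidate    # reject negative values
    }
  }
  out
}

dat <- data.frame(x = c(1, 4), y = c(NA, NA), z = c(4, 1))
impute_y(dat)  # record 1 gets y = 3; record 2 keeps y = NA (since 1 - 4 < 0)
```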

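The deduImpute function only imputes what can be imputed consistently, taking all (in)equality rules into account: a value is filled in only when the observed values and the equality rules determine it uniquely and the result satisfies every inequality rule. Otherwise the field stays NA, as for the second record above. Some of the lower-level (record-by-record) functionality is exported as well, and, as mentioned before, it also works for categorical data.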
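With several rules, deciding which missing values are uniquely determined is a linear-algebra question. The sketch below is an assumption-laden base-R illustration, not deducorrect's actual code (the package vignette describes the approach the package really takes); `deduce_record()` and the example rule matrices are hypothetical. Write the equality rules as A v = b, move the observed values to the right-hand side so that A_m x_m = r, and take the minimum-norm solution x0 = A_m^+ r with the Moore-Penrose pseudoinverse `MASS::ginv()`. Every solution has the form x0 + (I - A_m^+ A_m) z, so a missing variable is fixed exactly when its row of I - A_m^+ A_m is zero. Imputed values are then checked against the inequality rules G v >= h.

```r
library(MASS)  # ginv(): Moore-Penrose pseudoinverse

# Hypothetical helper, not part of deducorrect. Columns of A and G must be in
# the same order as the entries of the record v.
#   A %*% v == b : equality rules      G %*% v >= h : inequality rules
deduce_record <- function(v, A, b, G, h, tol = 1e-8) {
  miss <- is.na(v)
  if (!any(miss)) return(v)
  Am <- A[, miss, drop = FALSE]
  Ao <- A[, !miss, drop = FALSE]
  r  <- b - Ao %*% v[!miss]        # move observed values to the right-hand side
  P  <- ginv(Am)
  x0 <- P %*% r                    # minimum-norm solution of Am %*% x == r
  if (any(abs(Am %*% x0 - r) > tol)) return(v)  # rules cannot hold: impute nothing
  C <- diag(ncol(Am)) - P %*% Am   # all solutions are x0 + C %*% z
  fixed <- apply(abs(C) < tol, 1, all)  # zero row in C: value is unique
  out <- v
  out[which(miss)[fixed]] <- x0[fixed]
  # undo everything if an imputed value breaks an inequality it appears in
  newly <- miss & !is.na(out)
  for (i in seq_len(nrow(G))) {
    used <- G[i, ] != 0
    if (any(used & newly) && !anyNA(out[used]) &&
        sum(G[i, used] * out[used]) < h[i] - tol) {
      return(v)
    }
  }
  out
}

# Two balance rules: x1 + x2 == x3 and x3 + x4 == x5, all variables >= 0
A <- rbind(c(1, 1, -1, 0,  0),
           c(0, 0,  1, 1, -1))
b <- c(0, 0)
G <- diag(5)
h <- rep(0, 5)

deduce_record(c(x1 = NA, x2 = NA, x3 = NA, x4 = 2, x5 = 12), A, b, G, h)
# x3 is deduced as 10; x1 and x2 stay NA because only their sum is known
deduce_record(c(x1 = NA, x2 = NA, x3 = NA, x4 = 15, x5 = 12), A, b, G, h)
# x3 would be -3, which breaks x3 >= 0, so nothing is imputed
```

The tolerance `tol` and the choice to undo all imputations in a record as soon as one of them breaks an inequality are decisions made for this sketch only.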

There’s a lot more to say about deductive imputation. If you’re interested in the mathematical background or want to see more examples, please read our paper, which is included as the package vignette. Don’t hesitate to drop us a line with comments or suggestions, or if you find a little insect =:O.
