(This article was first published on **SAS and R**, and kindly contributed to R-bloggers)

We begin the new academic year with a series of entries exploring new capabilities of SAS 9.3, and some functionality we haven’t previously written about.

We’ll begin with multiple imputation. Here, SAS has previously been limited to multivariate normal data or to monotonic missing data patterns.

**SAS**

SAS 9.3 adds the `FCS` statement to `proc mi`. This implements a fully conditional specification imputation method (see, e.g., van Buuren, S. (2007), “Multiple Imputation of Discrete and Continuous Data by Fully Conditional Specification,” Statistical Methods in Medical Research, 16, 219–242). Briefly, we begin by filling in all the missing data using a simple method. Then the missing values for each variable in turn are imputed using a model fit to the observed and currently imputed values of the other variables, and this cycle through the variables is repeated several times.
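
In symbols, one common way to write a single cycle of this kind of algorithm is sketched below. The notation is ours, chosen for illustration, and is not taken from the SAS documentation. Let $Y_1,\dots,Y_p$ be the variables, let $Y_j^{\mathrm{obs}}$ and $Y_j^{\mathrm{mis}}$ be the observed and missing parts of $Y_j$, let $\theta_j$ be the parameters of the conditional model for $Y_j$, and let $Y_k^{(t)}$ denote $Y_k$ completed with its imputations from iteration $t$. At iteration $t$, for $j = 1,\dots,p$ in turn:

$$
\begin{aligned}
\theta_j^{(t)} &\sim p\left(\theta_j \mid Y_j^{\mathrm{obs}},\; Y_1^{(t)},\dots,Y_{j-1}^{(t)},\; Y_{j+1}^{(t-1)},\dots,Y_p^{(t-1)}\right),\\
Y_j^{\mathrm{mis},(t)} &\sim p\left(Y_j^{\mathrm{mis}} \mid Y_j^{\mathrm{obs}},\; Y_1^{(t)},\dots,Y_{j-1}^{(t)},\; Y_{j+1}^{(t-1)},\dots,Y_p^{(t-1)},\; \theta_j^{(t)}\right).
\end{aligned}
$$

The starting values $Y_k^{(0)}$ come from the simple initial fill-in, and the completed data after the final cycle form one imputed data set.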

We replicate the multiple imputation example from section 6.5 of the book. In that example, we used the `mcmc` statement for imputation: at the time, this was the only method available in SAS when a non-monotonic missingness pattern was present. We noted then that this was not “strictly appropriate,” since the `mcmc` method assumes multivariate normality, and two of the variables with missing values were dichotomous.

    /* Read the HELP data set with missing values directly from the web */
    filename myhm url "http://www.math.smith.edu/sasr/datasets/helpmiss.csv" lrecl=704;
    proc import replace datafile=myhm out=help dbms=dlm;
      delimiter=',';
      getnames=yes;
    run;
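
Before imputing, it can be worth confirming that the missingness pattern really is non-monotone. The following is a minimal sketch, assuming the `help` data set has been read in as above: with `nimpute=0`, `proc mi` generates no imputations, and the `ods select` statement limits the output to the table of missing data patterns.

```sas
/* Sketch: display the missing data patterns only; no imputations are generated */
proc mi data=help nimpute=0;
  var i1 homeless female sexrisk indtot mcs pcs;
  ods select misspattern;
run;
```

Each row of the resulting table is one pattern of observed and missing variables, along with the number of observations showing that pattern.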

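    /* Create 20 imputed data sets by FCS, using logistic regression */
    /* for the two dichotomous variables */
    proc mi data=help nimpute=20 out=helpmi20fcs;
      class homeless female;
      var i1 homeless female sexrisk indtot mcs pcs;
      fcs
        logistic (female)
        logistic (homeless);
    run;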

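In the `fcs` statement, you list the method (`logistic`, `discrim`, `reg`, or `regpmm`) to be used, followed in parentheses by the variable to which it applies. (You can also specify a subset of covariates to be used in the method, using the usual SAS model-building syntax.) Variables not assigned a method are imputed using a default: `reg` for continuous variables and, per the SAS documentation, `discrim` for variables named in the `class` statement. In this example, the remaining (continuous) variables are therefore imputed with `reg`.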
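
As an illustration of the covariate-subset syntax, here is a hedged sketch rather than the analysis reported in this entry. The choice of covariates for `female`, the use of `regpmm` for `indtot`, the `seed=` and `nbiter=` values, and the output data set name `helpmi20fcs2` are all our own assumptions for the example.

```sas
/* Sketch: FCS with an explicit covariate subset and a different method per variable. */
/* The seed, nbiter value, and output name helpmi20fcs2 are arbitrary illustrative choices. */
proc mi data=help nimpute=20 seed=20110901 out=helpmi20fcs2;
  class homeless female;
  var i1 homeless female sexrisk indtot mcs pcs;
  fcs nbiter=20
      logistic (female = i1 sexrisk indtot)  /* impute female from these covariates only */
      logistic (homeless)                    /* impute homeless from all other variables */
      regpmm (indtot);                       /* predictive mean matching for indtot */
run;
```

Because this changes the imputation models, results based on `helpmi20fcs2` would not be expected to match those shown in this entry exactly. Setting `seed=` simply makes the imputations reproducible from run to run.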

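    /* Fit the logistic regression within each imputed data set, */
    /* saving the parameter estimates and their covariance matrices */
    ods output parameterestimates=helpmipefcs
               covb=helpmicovbfcs;
    proc logistic data=helpmi20fcs descending;
      by _imputation_;
      model homeless=female i1 sexrisk indtot / covb;
    run;

    /* Combine the results across the 20 imputed data sets */
    proc mianalyze parms=helpmipefcs covb=helpmicovbfcs;
      modeleffects intercept female i1 sexrisk indtot;
    run;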

with the following primary result:

    Parameter     Estimate   Std Error    95% Conf. Limits
    intercept    -2.492733    0.591241   -3.65157  -1.33390
    female       -0.245103    0.244029   -0.72339   0.23319
    i1            0.023207    0.005610    0.01221   0.03420
    sexrisk       0.058642    0.035803   -0.01153   0.12882
    indtot        0.047971    0.015745    0.01711   0.07883

which is quite similar to our previous results. Given the small proportion of missing values, this isn’t very surprising.
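
For readers who want to make this comparison themselves, the following is a rough sketch of an MCMC-based imputation of the same variables. It is not the exact code from the book: the output data set name `helpmi20mcmc` is our own choice, and `chain=single` is written out only to make explicit what we understand to be the default.

```sas
/* Sketch only: MCMC imputation (assumes multivariate normality) for comparison. */
/* Not the book's exact code; helpmi20mcmc is a name chosen for illustration. */
proc mi data=help nimpute=20 out=helpmi20mcmc;
  mcmc chain=single;
  var i1 homeless female sexrisk indtot mcs pcs;
run;
```

Rerunning the `proc logistic` and `proc mianalyze` steps with `data=helpmi20mcmc` would then give estimates to set beside the FCS results. Note that imputed values of `homeless` and `female` produced this way need not be 0 or 1, which is one symptom of the normality assumption.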

**R**

Several R packages allow imputation for a general pattern of missingness and a general distribution for the variables with missing values. A brief summary of missing data tools in R can be found in the CRAN Task View on Multivariate Statistics. We’ll return to this topic from the R perspective in a future entry.
