(This article was first published on **SAS and R**, and kindly contributed to R-bloggers)

As we discuss in section 6.1.4 of the second edition, R and SAS handle categorical variables and their parameterization in models quite differently. SAS treats them on a procedure-by-procedure basis, which leads to some odd differences in capabilities and default parameterizations. For example, in the `logistic` procedure the default is effect coding, while in the `genmod` procedure (which also fits logistic regression) the default is reference cell coding. Meanwhile, many procedures can accommodate only reference cell coding.
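The SAS parameterizations named above have counterparts among R's built-in contrast functions. The sketch below prints the coding matrices for a four-level factor (rows are levels, columns are model parameters). The pairing with SAS defaults assumes SAS's usual choice of the last level as the reference unless `ref=` says otherwise.

```r
# Reference cell coding, LAST level as reference (like SAS param=ref)
contr.SAS(4)

# Reference cell coding, FIRST level as reference (R's default for unordered factors)
contr.treatment(4)

# Effect coding: the last level is coded -1 in every column (like SAS param=effect)
contr.sum(4)
```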

In R, in contrast, categorical variables can be designated as “factors”, and the parameterization can be stored as an attribute of the factor.

In section 6.1.4, we demonstrate how the parameterization of a factor can easily be changed on the fly in R's `lm()`, `glm()`, and `aov()`, using the `contrasts` argument of those functions. Here we show how to set the attribute more generally, for use in functions that don’t accept that argument. This post was inspired by a question from Julia Kuder, of Brigham and Women’s Hospital.
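What "storing the parameterization as an attribute" means can be seen on a toy factor (the data here are made up for illustration). The replacement function `contrasts<-` attaches a `"contrasts"` attribute to the factor; functions that build their design matrix with `model.matrix()` should then pick up that coding without any extra argument.

```r
f <- factor(c(1, 2, 3, 4, 2, 4))  # toy factor, illustration only

attr(f, "contrasts")  # NULL: nothing stored yet
contrasts(f)          # so the global default from options("contrasts") is shown

# Store effect (sum-to-zero) coding on the factor itself
contrasts(f) <- contr.sum(4)
attr(f, "contrasts")  # the coding now travels with the factor
```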

**SAS**

We begin by simulating censored survival data as in Example 7.30. We’ll also export the data to use in R.

```sas
data simcox;
  beta1 = 2;
  lambdaT = 0.002; * baseline hazard;
  lambdaC = 0.004; * censoring hazard;
  do i = 1 to 10000;
    x1 = rantbl(0, .25, .25, .25); * values 1-4, each with probability .25;
    linpred = exp(-beta1*(x1 eq 4));
    t = rand("WEIBULL", 1, lambdaT * linpred); * time of event;
    c = rand("WEIBULL", 1, lambdaC); * time of censoring;
    time = min(t, c); * which came first?;
    censored = (c lt t);
    output;
  end;
run;
```

```sas
proc export data=simcox replace
  outfile="c:/temp/simcox.csv"
  dbms=csv;
run;
```
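Readers without SAS can generate comparable data directly in R. This is a minimal sketch, assuming that SAS's `rand("WEIBULL", 1, b)` (shape 1, scale `b`) is an exponential distribution with mean `b`, which corresponds to `rexp()` with rate `1/b`. The seed is arbitrary, and because the random number streams differ, results will not match the SAS output below.

```r
set.seed(2024)     # arbitrary seed; SAS and R random streams differ
n <- 10000
beta1 <- 2
lambdaT <- 0.002   # scale of event times, as in the SAS data step
lambdaC <- 0.004   # scale of censoring times, as in the SAS data step

# x1 is 1, 2, 3, or 4 with equal probability, like rantbl(0, .25, .25, .25)
x1 <- sample(1:4, n, replace = TRUE)
linpred <- exp(-beta1 * (x1 == 4))

# Assumption: WEIBULL with shape 1 and scale b is exponential with mean b
event.time <- rexp(n, rate = 1 / (lambdaT * linpred))
cens.time <- rexp(n, rate = 1 / lambdaC)

time <- pmin(event.time, cens.time)  # which came first?
censored <- as.numeric(cens.time < event.time)

simcox <- data.frame(x1, time, censored)
```

The resulting `simcox` data frame has the columns used in the R analysis (`x1`, `time`, `censored`), so it can stand in for the exported CSV.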

Now we’ll fit the model in SAS, using effect coding.

```sas
proc phreg data=simcox;
  class x1 (param=effect);
  model time*censored(0) = x1;
run;
```

We reproduce the rather unexciting results here for comparison with R.

| Parameter | DF | Estimate | Standard Error |
|-----------|----|----------|----------------|
| x1 1 | 1 | -0.02698 | 0.03471 |
| x1 2 | 1 | -0.01211 | 0.03437 |
| x1 3 | 1 | -0.05940 | 0.03458 |
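Under effect coding only three parameters are printed; the effect for level 4 is implied by the sum-to-zero constraint. The short R sketch below recovers all four level effects (on the log-hazard scale, as deviations from their unweighted mean) by multiplying the effect-coding matrix by the SAS estimates copied from the table.

```r
# SAS param=effect estimates for levels 1-3, copied from the table above
est <- c(-0.02698, -0.01211, -0.05940)

# Effect coding matrix: rows are levels, columns are parameters; level 4 is all -1
cm <- contr.sum(4)

# Each level's effect is its row of the coding matrix times the estimates
effects <- drop(cm %*% est)
effects  # levels 1-3 reproduce est; level 4 is -sum(est) = 0.09849
```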

**R**

In R, we read the data in, then use the `C()` function to assign the `contr.sum` contrast to a version of the `x1` variable that we save as a factor. Once that is done, we can fit the proportional hazards regression with the desired contrast using `coxph()` from the survival package.

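```r
library(survival)  # provides coxph() and Surv()

simcox <- read.csv("c:/temp/simcox.csv")

# Copy x1 as a factor that carries the effect-coding (sum-to-zero) contrast
sc2 <- transform(simcox, x1.eff = C(as.factor(x1), contr.sum(4)))

effmodel <- coxph(Surv(time, censored) ~ x1.eff, data = sc2)
summary(effmodel)
```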

We excerpt the relevant output to demonstrate equivalence with SAS.

| | coef | exp(coef) | se(coef) |
|---|---|---|---|
| x1.eff1 | -0.02698 | 0.97339 | 0.03471 |
| x1.eff2 | -0.01211 | 0.98797 | 0.03437 |
| x1.eff3 | -0.05940 | 0.94233 | 0.03458 |
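Changing the contrast changes what the coefficients mean, not the fitted model. As a sketch of that point, the example below uses the `veteran` lung cancer data shipped with the survival package (our choice for a self-contained example, not the data from this post), whose `celltype` factor has four levels, and compares the default reference cell coding with effect coding set via `C()`.

```r
library(survival)

# veteran ships with survival; celltype is a factor with 4 levels
vet <- veteran

# Effect coding stored on a copy of the factor, as in the post
vet$cell.eff <- C(vet$celltype, contr.sum(4))

# Default treatment (reference cell) coding vs. effect coding
fit.ref <- coxph(Surv(time, status) ~ celltype, data = vet)
fit.eff <- coxph(Surv(time, status) ~ cell.eff, data = vet)

coef(fit.ref)  # levels 2-4 compared with the first level
coef(fit.eff)  # levels 1-3 as deviations from the unweighted mean of all four

# Same model, different parameterization: the log-likelihoods agree
all.equal(fit.ref$loglik, fit.eff$loglik, tolerance = 1e-6)
```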

**An unrelated note about aggregators:** We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least two other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.
