# Stata-like Marginal Effects for Logit and Probit Models in R

May 17, 2011
By

(This article was first published on TimeSeriesIreland » R, and kindly contributed to R-bloggers)

Although this blog’s primary focus is time series, one feature I missed from Stata was the simple marginal effects command, ‘mfx compute’, for cross-sectional work, and I could not find an adequate replacement in R. To bridge this gap, I’ve written a (rather messy) R function to produce marginal effects readout for logit and probit models.

If we want to analyse the effect of a change in some explanatory variable, say $x_j$, on the probability that a binary dependent variable is equal to 1, simple calculus will show that the probability density function evaluated at the sample mean times the estimated coefficient of $x_j$ will give us the marginal effect. More formally, let $M_i = \sum_{k=1}^{} \beta_k x_{k,i}$ be the sum of coefficients times their respective explanatory variables that describe a binary dependent variable, such as employment or unemployment, etc., where $x_{1,i} = 1$. The ‘logit’ model is given by

$\Lambda (M_i) = \frac{1}{1 + \exp(- M_i)}$

where $\Lambda (\cdot)$ is the cumulative distribution function of the logistic distribution with mean zero and $s = 1$. Similarly, the ‘probit’ model is given by

$\Phi (M_i) = \displaystyle \int_{- \infty}^{M_i} \frac{1}{\sqrt{2 \pi}} \exp(- z^2/2) dz$

the standard normal cumulative distribution function. Taking the derivative of either model with respect to the jth explanatory variable is a simple application of the chain rule; for the probit model:

$\frac{\partial{} \Phi (\cdot)}{\partial{} M_i} \frac{\partial{} M_i}{\partial{} x_{j, i} }$

$\frac{\partial{} \Phi (\cdot)}{\partial{} M_i} = \phi (M_i)$, i.e., the standard normal probability density function, and $\frac{\partial{} M_i}{\partial{} x_{j, i} } = \beta_{j}$. We can use this to calculate the marginal effects from a glm object. The R code is below; all it requires is an estimated logit or probit model from the glm function. The code is a little messy, but it should work. Feel free to email me with any suggestions (see contact tab above).

For example, using a dataset provided by Jeff Wooldridge, MROZ.dta, we can compare results from Stata and R.

R code:

require(foreign)
#
#
logit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="logit"), data=a)
probit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="probit"), data=a)
#
mfx(logit1)
mfx(probit1)
#


Stata code and output:

R output:

‘mfx’ function:

mfx<-function(x){
bb<-data.frame(x$coefficients) bb2<-data.frame(mean(x$data,na.rm=T))
bbh<-as.character(rownames(bb))
bbmeans<-bb2[bbh[2:nrow(bb)],]
jkj <-bb[2:nrow(bb),]*bbmeans
#
M3<-sum(jkj) + bb[1,]
#
if (ll<-x$family$link=="probit"){
probitmfx <- data.frame(mfx=dnorm(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)])
}
else{probitmfx <- data.frame(mfx=dlogis(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)])}
#
bbse<-bb/summary(x)$coef[,2] mfxse<-probitmfx/bbse[2:nrow(bb),] # probitmfxfull <- data.frame(mfx=probitmfx,SE=mfxse,bbmeans,summary(x)$coef[2:nrow(bb),3],summary(x)$coef[2:nrow(bb),4],row.names=bbh[2:nrow(bb)]) colnames(probitmfxfull) <- c("mfx","SE","Mean Value","z","Pr(>|z|)") # logl <- 0.5*(-x$aic + 2*nrow(bb))
depen <- x$data[,1] depenglm <- glm(depen ~ 1, family=binomial(link=x$family$link),data=x$data)
logldepen <- 0.5*(-depenglm$aic + 2) psr2<- 1 - (logl/logldepen) # obs<-nrow(x$data)
#SBC/BIC
sbc<- -2*logl + log(obs)*nrow(bb)
#HQIC
HIC<- -2*logl + 2*log(log(obs))*nrow(bb)
#Logit?
if (qq<-x$family$link=="logit"){
}
else{}
#CDF of Mean Model
if (ll == TRUE){
BigProb<-pnorm(M3)
}
else {BigProb<-plogis(M3)}
#LR Test
LRTest<- -2*(logl - logldepen)
dfLR<-nrow(bb) - 1
LRp<-dchisq(LRTest,df=dfLR)
LRdata <- data.frame(LRTest, dfLR,LRp)
colnames(LRdata) <- c("Test Statistic","DF","P-Value")
rownames(LRdata) <- "LR Test"
#
tests<-rbind(BigProb,logl,psr2,x$aic,HIC,sbc) tests<-data.frame(tests) colnames(tests)<-"" rownames(tests)<-c("CDF(Evaluated at the Mean):","Log-Likelihood:","McFadden's R2:","Akaike Information Criterion:","Hannan-Quinn Criterion:","Schwarz's Bayesian Criterion:") # cat("MFX Function for Logit and Probit", "\n") cat("", "\n") if (qq<-x$family$link=="logit"){ cat("This is a Logit Model","\n") } else if (qq<-x$family\$link=="probit"){ cat("This is a Probit Model","\n")
}
else {cat("","\n")}
cat("", "\n")
cat("Reporting Marginal Effects, Evaluated at the Mean", "\n")
cat("", "\n")
printCoefmat(probitmfxfull, P.value=TRUE, has.Pvalue=TRUE)
cat("", "\n")
cat("Observations:", obs, "\n")
cat("", "\n")
printCoefmat(tests, P.value=F, has.Pvalue=F)
cat("", "\n")
cat("Likelihood-Ratio Test:", "\n")
printCoefmat(LRdata, P.value=T, has.Pvalue=T)
cat("", "\n")
}


R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...