Stata-like Marginal Effects for Logit and Probit Models in R

May 17, 2011
By

(This article was first published on TimeSeriesIreland » R, and kindly contributed to R-bloggers)

Although this blog’s primary focus is time series, one feature I missed from Stata was the simple marginal effects command, ‘mfx compute’, for cross-sectional work, and I could not find an adequate replacement in R. To bridge this gap, I’ve written a (rather messy) R function to produce marginal effects readout for logit and probit models.

If we want to analyse the effect of a change in some explanatory variable, say x_j, on the probability that a binary dependent variable is equal to 1, simple calculus will show that the probability density function evaluated at the sample mean times the estimated coefficient of x_j will give us the marginal effect. More formally, let M_i = \sum_{k=1}^{} \beta_k x_{k,i} be the sum of coefficients times their respective explanatory variables that describe a binary dependent variable, such as employment or unemployment, etc., where x_{1,i} = 1. The ‘logit’ model is given by

\Lambda (M_i) = \frac{1}{1 + \exp(- M_i)}

where \Lambda (\cdot) is the cumulative distribution function of the logistic distribution with mean zero and s = 1. Similarly, the ‘probit’ model is given by

\Phi (M_i) = \displaystyle \int_{- \infty}^{M_i} \frac{1}{\sqrt{2 \pi}} \exp(- z^2/2) dz

the standard normal cumulative distribution function. Taking the derivative of either model with respect to the jth explanatory variable is a simple application of the chain rule; for the probit model:

\frac{\partial{} \Phi (\cdot)}{\partial{} M_i} \frac{\partial{} M_i}{\partial{} x_{j, i} }

\frac{\partial{} \Phi (\cdot)}{\partial{} M_i} = \phi (M_i), i.e., the standard normal probability density function, and \frac{\partial{} M_i}{\partial{} x_{j, i} } = \beta_{j}. We can use this to calculate the marginal effects from a glm object. The R code is below; all it requires is an estimated logit or probit model from the glm function. The code is a little messy, but it should work. Feel free to email me with any suggestions (see contact tab above).

For example, using a dataset provided by Jeff Wooldridge, MROZ.dta, we can compare results from Stata and R.

R code:

require(foreign)
#
a<-read.dta("MROZ.dta",convert.factors = F)
#
logit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="logit"), data=a)
probit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="probit"), data=a)
#
mfx(logit1)
mfx(probit1)
#

Stata code and output:

R output:

‘mfx’ function:

mfx<-function(x){
	bb<-data.frame(x$coefficients)
	bb2<-data.frame(mean(x$data,na.rm=T))
	bbh<-as.character(rownames(bb))
	bbmeans<-bb2[bbh[2:nrow(bb)],]
	jkj <-bb[2:nrow(bb),]*bbmeans
	#
	M3<-sum(jkj) + bb[1,]
	#
	if (ll<-x$family$link=="probit"){
		probitmfx <- data.frame(mfx=dnorm(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)])
	}
		else{probitmfx <- data.frame(mfx=dlogis(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)])}
	#
	bbse<-bb/summary(x)$coef[,2]
	mfxse<-probitmfx/bbse[2:nrow(bb),]
	#
	probitmfxfull <- data.frame(mfx=probitmfx,SE=mfxse,bbmeans,summary(x)$coef[2:nrow(bb),3],summary(x)$coef[2:nrow(bb),4],row.names=bbh[2:nrow(bb)])
	colnames(probitmfxfull) <- c("mfx","SE","Mean Value","z","Pr(>|z|)")
	#
	logl <- 0.5*(-x$aic + 2*nrow(bb))
	#McFadden's R2
	depen <- x$data[,1]
	depenglm <- glm(depen ~ 1, family=binomial(link=x$family$link),data=x$data)
	logldepen <- 0.5*(-depenglm$aic + 2)
	psr2<- 1 - (logl/logldepen)
	#
	obs<-nrow(x$data)
	#SBC/BIC
	sbc<- -2*logl + log(obs)*nrow(bb)
	#HQIC
	HIC<- -2*logl + 2*log(log(obs))*nrow(bb)
	#Logit?
		if (qq<-x$family$link=="logit"){
		}
		else{}
	#CDF of Mean Model
		if (ll == TRUE){
		BigProb<-pnorm(M3)
		}
		else {BigProb<-plogis(M3)}
	#LR Test
	LRTest<- -2*(logl - logldepen)
	dfLR<-nrow(bb) - 1
	LRp<-dchisq(LRTest,df=dfLR)
	LRdata <- data.frame(LRTest, dfLR,LRp)
	colnames(LRdata) <- c("Test Statistic","DF","P-Value")
	rownames(LRdata) <- "LR Test"
	#
	tests<-rbind(BigProb,logl,psr2,x$aic,HIC,sbc)
	tests<-data.frame(tests)
	colnames(tests)<-""
	rownames(tests)<-c("CDF(Evaluated at the Mean):","Log-Likelihood:","McFadden's R2:","Akaike Information Criterion:","Hannan-Quinn Criterion:","Schwarz's Bayesian Criterion:")
	#
	cat("MFX Function for Logit and Probit", "\n")
	cat("", "\n")
		if (qq<-x$family$link=="logit"){ cat("This is a Logit Model","\n")
		}
		else if (qq<-x$family$link=="probit"){ cat("This is a Probit Model","\n")
		}
		else {cat("","\n")}
	cat("", "\n")
	cat("Reporting Marginal Effects, Evaluated at the Mean", "\n")
	cat("", "\n")
	printCoefmat(probitmfxfull, P.value=TRUE, has.Pvalue=TRUE)
	cat("", "\n")
	cat("Observations:", obs, "\n")
	cat("", "\n")
	printCoefmat(tests, P.value=F, has.Pvalue=F)
	cat("", "\n")
	cat("Likelihood-Ratio Test:", "\n")
	printCoefmat(LRdata, P.value=T, has.Pvalue=T)
	cat("", "\n")
}

To leave a comment for the author, please follow the link and comment on his blog: TimeSeriesIreland » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , ,

Comments are closed.