Stata-like Marginal Effects for Logit and Probit Models in R

May 17, 2011

(This article was first published on TimeSeriesIreland » R, and kindly contributed to R-bloggers)

Although this blog’s primary focus is time series, one feature I missed from Stata was the simple marginal effects command, ‘mfx compute’, for cross-sectional work, and I could not find an adequate replacement in R. To bridge this gap, I’ve written a (rather messy) R function to produce marginal effects readout for logit and probit models.

If we want to analyse the effect of a change in some explanatory variable, say x_j, on the probability that a binary dependent variable is equal to 1, simple calculus will show that the probability density function evaluated at the sample mean times the estimated coefficient of x_j will give us the marginal effect. More formally, let M_i = \sum_{k=1}^{} \beta_k x_{k,i} be the sum of coefficients times their respective explanatory variables that describe a binary dependent variable, such as employment or unemployment, etc., where x_{1,i} = 1. The ‘logit’ model is given by

\Lambda (M_i) = \frac{1}{1 + \exp(- M_i)}

where \Lambda (\cdot) is the cumulative distribution function of the logistic distribution with mean zero and s = 1. Similarly, the ‘probit’ model is given by

\Phi (M_i) = \displaystyle \int_{- \infty}^{M_i} \frac{1}{\sqrt{2 \pi}} \exp(- z^2/2) dz

the standard normal cumulative distribution function. Taking the derivative of either model with respect to the jth explanatory variable is a simple application of the chain rule; for the probit model:

\frac{\partial{} \Phi (\cdot)}{\partial{} M_i} \frac{\partial{} M_i}{\partial{} x_{j, i} }

\frac{\partial{} \Phi (\cdot)}{\partial{} M_i} = \phi (M_i), i.e., the standard normal probability density function, and \frac{\partial{} M_i}{\partial{} x_{j, i} } = \beta_{j}. We can use this to calculate the marginal effects from a glm object. The R code is below; all it requires is an estimated logit or probit model from the glm function. The code is a little messy, but it should work. Feel free to email me with any suggestions (see contact tab above).

For example, using a dataset provided by Jeff Wooldridge, MROZ.dta, we can compare results from Stata and R.

R code:

a<-read.dta("MROZ.dta",convert.factors = F)
logit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="logit"), data=a)
probit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="probit"), data=a)

Stata code and output:

R output:

‘mfx’ function:

	jkj <-bb[2:nrow(bb),]*bbmeans
	M3<-sum(jkj) + bb[1,]
	if (ll<-x$family$link=="probit"){
		probitmfx <- data.frame(mfx=dnorm(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)])
		else{probitmfx <- data.frame(mfx=dlogis(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)])}
	probitmfxfull <- data.frame(mfx=probitmfx,SE=mfxse,bbmeans,summary(x)$coef[2:nrow(bb),3],summary(x)$coef[2:nrow(bb),4],row.names=bbh[2:nrow(bb)])
	colnames(probitmfxfull) <- c("mfx","SE","Mean Value","z","Pr(>|z|)")
	logl <- 0.5*(-x$aic + 2*nrow(bb))
	#McFadden's R2
	depen <- x$data[,1]
	depenglm <- glm(depen ~ 1, family=binomial(link=x$family$link),data=x$data)
	logldepen <- 0.5*(-depenglm$aic + 2)
	psr2<- 1 - (logl/logldepen)
	sbc<- -2*logl + log(obs)*nrow(bb)
	HIC<- -2*logl + 2*log(log(obs))*nrow(bb)
		if (qq<-x$family$link=="logit"){
	#CDF of Mean Model
		if (ll == TRUE){
		else {BigProb<-plogis(M3)}
	#LR Test
	LRTest<- -2*(logl - logldepen)
	dfLR<-nrow(bb) - 1
	LRdata <- data.frame(LRTest, dfLR,LRp)
	colnames(LRdata) <- c("Test Statistic","DF","P-Value")
	rownames(LRdata) <- "LR Test"
	rownames(tests)<-c("CDF(Evaluated at the Mean):","Log-Likelihood:","McFadden's R2:","Akaike Information Criterion:","Hannan-Quinn Criterion:","Schwarz's Bayesian Criterion:")
	cat("MFX Function for Logit and Probit", "\n")
	cat("", "\n")
		if (qq<-x$family$link=="logit"){ cat("This is a Logit Model","\n")
		else if (qq<-x$family$link=="probit"){ cat("This is a Probit Model","\n")
		else {cat("","\n")}
	cat("", "\n")
	cat("Reporting Marginal Effects, Evaluated at the Mean", "\n")
	cat("", "\n")
	printCoefmat(probitmfxfull, P.value=TRUE, has.Pvalue=TRUE)
	cat("", "\n")
	cat("Observations:", obs, "\n")
	cat("", "\n")
	printCoefmat(tests, P.value=F, has.Pvalue=F)
	cat("", "\n")
	cat("Likelihood-Ratio Test:", "\n")
	printCoefmat(LRdata, P.value=T, has.Pvalue=T)
	cat("", "\n")

To leave a comment for the author, please follow the link and comment on their blog: TimeSeriesIreland » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , ,

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)