# Probit/Logit Marginal Effects in R

April 23, 2012
By

(This article was first published on DiffusePrioR » R, and kindly contributed to R-bloggers)

The common approach to estimating a binary dependent variable regression model is to use either the logit or probit model. Both are forms of generalized linear models (GLMs), which can be seen as modified linear regressions that allow the dependent variable to originate from non-normal distributions.

The coefficients in a linear regression model are marginal effects, meaning that they can be treated as partial derivatives. This makes the linear regression model very easy to interpret. For example, the fitted linear regression model y=x*b tells us that a one unit increase in x increases y by b units.

Unfortunately, this is not the case in GLMs, because fitted GLMs take the form, y=G (x*b), where G(.) is the known link function (i.e. inverse logistic for logit). Since we want to calculate the slope of x which is inside the function G(.), calculating marginal effects that are comparable to their linear model counterparts involves using the chain rule.

Empirical economic research typically cites the marginal effects since they are intuitive and easy to digest. Therefore, it is important for researchers to be able to compute these results. A more formal treatment of this problem can be found in the following paper, where we see that the solution is to multiply the estimated GLM coefficients by the probability density function of the linked distribution (which is the derivative of the cumulative density function). Interestingly, the linked paper also supplies some R code which calculates marginal effects for both the probit or logit models. In the code below, I demonstrate a similar function that calculates ‘the average of the sample marginal effects’.

mfxboot <- function(modform,dist,data,boot=1000,digits=3){
# get marginal effects
pdf <- ifelse(dist=="probit",
marginal.effects <- pdf*coef(x)
# start bootstrap
bootvals <- matrix(rep(NA,boot*length(coef(x))), nrow=boot)
set.seed(1111)
for(i in 1:boot){
samp1 <- data[sample(1:dim(data)[1],replace=T,dim(data)[1]),]
pdf1 <- ifelse(dist=="probit",
bootvals[i,] <- pdf1*coef(x1)
}
res <- cbind(marginal.effects,apply(bootvals,2,sd),marginal.effects/apply(bootvals,2,sd))
if(names(x$coefficients[1])=="(Intercept)"){ res1 <- res[2:nrow(res),] res2 <- matrix(as.numeric(sprintf(paste("%.",paste(digits,"f",sep=""),sep=""),res1)),nrow=dim(res1)[1]) rownames(res2) <- rownames(res1) } else { res2 <- matrix(as.numeric(sprintf(paste("%.",paste(digits,"f",sep=""),sep="")),nrow=dim(res)[1])) rownames(res2) <- rownames(res) } colnames(res2) <- c("marginal.effect","standard.error","z.ratio") return(res2) }  This command also provides bootstrapped standard errors, which account for both the uncertainty in the predicted values and the estimated coefficients. The R package ‘erer’ also has a function that calculates these marginal effects. Illustrations of the above function, alongside code for a nice ggplot2 figure are displayed below. library(AER) data(SwissLabor) mfx1 <- mfxboot(participation ~ . + I(age^2),"probit",SwissLabor) mfx2 <- mfxboot(participation ~ . + I(age^2),"logit",SwissLabor) mfx3 <- mfxboot(participation ~ . + I(age^2),"probit",SwissLabor,boot=100,digits=4) mfxdat <- data.frame(cbind(rownames(mfx1),mfx1)) mfxdat$me <- as.numeric(as.character(mfxdat$marginal.effect)) mfxdat$se <- as.numeric(as.character(mfxdat\$standard.error))

# coefplot
library(ggplot2)
ggplot(mfxdat, aes(V1, marginal.effect,ymin = me - 2*se,ymax= me + 2*se)) +
scale_x_discrete('Variable') +
scale_y_continuous('Marginal Effect',limits=c(-0.5,1)) +
theme_bw() +
geom_errorbar(aes(x = V1, y = me),size=.3,width=.2) +
geom_point(aes(x = V1, y = me)) +
geom_hline(yintercept=0) +
coord_flip() +
opts(title="Marginal Effects with 95% Confidence Intervals")


R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...