Probit/Logit Marginal Effects in R

[This article was first published on DiffusePrioR » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The common approach to estimating a binary dependent variable regression model is to use either the logit or probit model. Both are forms of generalized linear models (GLMs), which can be seen as modified linear regressions that allow the dependent variable to originate from non-normal distributions.

The coefficients in a linear regression model are marginal effects, meaning that they can be treated as partial derivatives. This makes the linear regression model very easy to interpret. For example, the fitted linear regression model y=x*b tells us that a one unit increase in x increases y by b units.

Unfortunately, this is not the case in GLMs, because fitted GLMs take the form, y=G (x*b), where G(.) is the known link function (i.e. inverse logistic for logit). Since we want to calculate the slope of x which is inside the function G(.), calculating marginal effects that are comparable to their linear model counterparts involves using the chain rule.

Empirical economic research typically cites the marginal effects since they are intuitive and easy to digest. Therefore, it is important for researchers to be able to compute these results. A more formal treatment of this problem can be found in the following paper, where we see that the solution is to multiply the estimated GLM coefficients by the probability density function of the linked distribution (which is the derivative of the cumulative density function). Interestingly, the linked paper also supplies some R code which calculates marginal effects for both the probit or logit models. In the code below, I demonstrate a similar function that calculates ‘the average of the sample marginal effects’.

mfxboot <- function(modform,dist,data,boot=1000,digits=3){
  x <- glm(modform, family=binomial(link=dist),data)
  # get marginal effects
  pdf <- ifelse(dist=="probit",
                mean(dnorm(predict(x, type = "link"))),
                mean(dlogis(predict(x, type = "link"))))
  marginal.effects <- pdf*coef(x)
  # start bootstrap
  bootvals <- matrix(rep(NA,boot*length(coef(x))), nrow=boot)
  set.seed(1111)
  for(i in 1:boot){
    samp1 <- data[sample(1:dim(data)[1],replace=T,dim(data)[1]),]
    x1 <- glm(modform, family=binomial(link=dist),samp1)
    pdf1 <- ifelse(dist=="probit",
                   mean(dnorm(predict(x, type = "link"))),
                   mean(dlogis(predict(x, type = "link"))))
    bootvals[i,] <- pdf1*coef(x1)
  }
  res <- cbind(marginal.effects,apply(bootvals,2,sd),marginal.effects/apply(bootvals,2,sd))
  if(names(x$coefficients[1])=="(Intercept)"){
    res1 <- res[2:nrow(res),]
    res2 <- matrix(as.numeric(sprintf(paste("%.",paste(digits,"f",sep=""),sep=""),res1)),nrow=dim(res1)[1])     
    rownames(res2) <- rownames(res1)
    } else {
    res2 <- matrix(as.numeric(sprintf(paste("%.",paste(digits,"f",sep=""),sep="")),nrow=dim(res)[1]))
    rownames(res2) <- rownames(res)
    }
  colnames(res2) <- c("marginal.effect","standard.error","z.ratio")  
  return(res2)
}

This command also provides bootstrapped standard errors, which account for both the uncertainty in the predicted values and the estimated coefficients. The R package ‘erer’ also has a function that calculates these marginal effects. Illustrations of the above function, alongside code for a nice ggplot2 figure are displayed below.

library(AER)
data(SwissLabor)
mfx1 <- mfxboot(participation ~ . + I(age^2),"probit",SwissLabor)
mfx2 <- mfxboot(participation ~ . + I(age^2),"logit",SwissLabor)
mfx3 <- mfxboot(participation ~ . + I(age^2),"probit",SwissLabor,boot=100,digits=4)

mfxdat <- data.frame(cbind(rownames(mfx1),mfx1))
mfxdat$me <- as.numeric(as.character(mfxdat$marginal.effect))
mfxdat$se <- as.numeric(as.character(mfxdat$standard.error))

# coefplot
library(ggplot2)
ggplot(mfxdat, aes(V1, marginal.effect,ymin = me - 2*se,ymax= me + 2*se)) +
  scale_x_discrete('Variable') +
  scale_y_continuous('Marginal Effect',limits=c(-0.5,1)) +
  theme_bw() + 
  geom_errorbar(aes(x = V1, y = me),size=.3,width=.2) + 
  geom_point(aes(x = V1, y = me)) +
  geom_hline(yintercept=0) + 
  coord_flip() +
  opts(title="Marginal Effects with 95% Confidence Intervals")


To leave a comment for the author, please follow the link and comment on their blog: DiffusePrioR » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)