Demystifying the GLM (Part 1)

February 11, 2016

(This article was first published on some real numbers, and kindly contributed to R-bloggers)

When thrown a prickly binary classification problem, most data practitioners will have dug deep into their statistical toolbox and pulled out the trusty logistic regression model.

Essentially, logistic regression can help us predict a binary (yes/no) response with consideration given to other, hopefully related, variables. For example, one might want to predict whether a person will experience a heart attack given their weight and age. In this case, we have reason to believe weight and age are related to the incidence of heart attacks.

So, they will have sorted their data, fired up R and typed something along the lines of:

glm(heartAttack ~ weight + age, data = heartData, family=binomial())
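The heartData data frame above is a placeholder, so the call won't run on its own. As a minimal, self-contained sketch, we can simulate a toy dataset with the same variable names (the coefficients below are arbitrary choices, not real medical figures) and fit the same model:

```r
# Simulate a toy heart-attack dataset (names mirror the call above; purely illustrative)
set.seed(42)
n <- 1000
weight <- rnorm(n, mean = 80, sd = 12)   # body weight in kg
age    <- rnorm(n, mean = 50, sd = 10)   # age in years

# True log-odds increase with weight and age (arbitrary coefficients)
logOdds <- -12 + 0.07 * weight + 0.10 * age
heartAttack <- rbinom(n, size = 1, prob = plogis(logOdds))
heartData <- data.frame(heartAttack, weight, age)

fit <- glm(heartAttack ~ weight + age, data = heartData, family = binomial())
summary(fit)  # fitted coefficients should land near -12, 0.07 and 0.10
```

With a simulation like this you can check that glm() recovers something close to the coefficients you put in, which is a handy sanity test before trusting the model on real data.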

But what is a glm? What does family = binomial() actually mean?

It turns out the logistic regression model is a member of a broad group of models known as generalised linear models, or GLMs for short.
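As a small preview of where this series is headed: family = binomial() tells glm() which distribution the response is assumed to follow, and each family object carries a link function that connects the linear predictor to the response scale. For the binomial family the default link is the logit, which is exactly what makes this logistic regression. You can inspect this directly in R:

```r
fam <- binomial()   # the family object used by glm(..., family = binomial())
fam$link            # the default link: "logit"

# The link function maps probabilities to the linear-predictor scale,
# and its inverse maps back:
fam$linkfun(0.5)    # logit(0.5) = log(0.5 / 0.5) = 0
fam$linkinv(0)      # inverse logit of 0 = 0.5
```

Later parts of the series can then swap in other families and links (Poisson with a log link, and so on) using the same machinery.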

This series will endeavor to help demystify these highly useful models.

Stay tuned.
