Manipulate(d) Regression!

[This article was first published on R – Design Data Decisions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The R package ‘manipulate’ can be used to create interactive plots in RStudio. Though not as versatile as the ‘shiny’ package, ‘manipulate’ can be used to quickly add interactive elements to standard R plots. This can prove useful for demonstrating statistical concepts, especially to a non-statistician audience.

The R code at the end of this post uses the ‘manipulate’ package with a regression plot to illustrate the effect of outliers (and influential) points on the fitted linear regression model. The resulting manipulate(d) plot in RStudio includes a gear icon, which, when clicked, opens up a slider control. The slider can be used to move some data points. The plot changes interactively with the data.

Here are some static figures:

Initial state: It is possible to move two points in the scatter plot, one at the end and one at the center.Initial

An outlier at center has a limited influence on the fitted regression model.

MoveMidY

An outlier at the ends of support of x and y ‘moves’ the regression line towards it and is also an influential point!

MoveEndY

Here is the complete R code for generating the interactive plot. This is to be run in RStudio.

library(manipulate)

## First define a custom function that fits a linear regression line 
## to (x,y) points and overlays the regression line in a scatterplot.
## The plot is then 'manipulated' to change as y values change.

linregIllustrate <- function(x, y, e, h.max, h.med){
  max.x <- max(x)
  med.x <- median(x)
  max.xind <- which(x == max.x)
  med.xind <- which(x == med.x)

  y1 <- y     ## Modified y
  y1[max.xind] <- y1[max.xind]+h.max  ## at the end
  y1[med.xind] <- y1[med.xind]+h.med  ## at the center
  plot(x, y1, xlim=c(min(x),max(x)+5), ylim=c(min(y1),max(y1)), pch=16, 
       xlab="X", ylab="Y")
  text(x[max.xind], y1[max.xind],"I'm movable!", pos=3, offset = 0.3, cex=0.7, font=2, col="red")
  text(x[med.xind], y1[med.xind],"I'm movable too!", pos=3, offset = 0.3, cex=0.7, font=2, col="red")
  
  m <- lm(y ~ x)  ## Regression with original set of points, the black line
  abline(m, lwd=2)

  m1 <- lm(y1 ~ x)  ## Regression with modified y, the dashed red line
  abline(m1, col="red", lwd=2, lty=2)
}

## Now generate some x and y data 
x <- rnorm(35,10,5)
e <- rnorm(35,0,5)
y <- 3*x+5+e

## Plot and manipulate the plot!
manipulate(linregIllustrate(x, y, e, h.max, h.med), 
           h.max=slider(-100, 100, initial=0, step=10, label="Move y at the end"), 
           h.med=slider(-100, 100, initial=0, step=10, label="Move y at the center"))

To leave a comment for the author, please follow the link and comment on their blog: R – Design Data Decisions.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)