Manipulate(d) Regression!

May 5, 2016
(This article was first published on R – Design Data Decisions, and kindly contributed to R-bloggers)

The R package ‘manipulate’ can be used to create interactive plots in RStudio. Though not as versatile as the ‘shiny’ package, ‘manipulate’ can be used to quickly add interactive elements to standard R plots. This can prove useful for demonstrating statistical concepts, especially to a non-statistician audience.

The R code at the end of this post uses the ‘manipulate’ package with a regression plot to illustrate the effect of outliers (and influential points) on the fitted linear regression model. The resulting manipulate(d) plot in RStudio includes a gear icon which, when clicked, opens a panel with two slider controls. Each slider shifts the y value of one data point, and the plot is redrawn interactively as the data change.
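As a minimal sketch of the mechanism, separate from the regression example in this post: manipulate() re-evaluates its first argument every time a control changes. The helper `plotPower` below is made up for illustration, and the guard assumes that RStudio sets the environment variable RSTUDIO to "1", since manipulate only works inside RStudio.

```r
## Hypothetical helper (not part of the post): plot y = x^p for a given p.
plotPower <- function(p) {
  x <- seq(0, 2, length.out = 50)
  y <- x^p
  plot(x, y, type = "l", main = paste0("y = x^", p))
  invisible(y)
}

## manipulate() re-runs plotPower(p) each time the slider moves.
## Skip the call when not in an interactive RStudio session.
in.rstudio <- interactive() && Sys.getenv("RSTUDIO") == "1"
if (in.rstudio && requireNamespace("manipulate", quietly = TRUE)) {
  manipulate::manipulate(
    plotPower(p),
    p = manipulate::slider(1, 5, initial = 2, step = 1, label = "Power")
  )
}
```

Keeping the plotting code in an ordinary function, as the post's own code does, means it can also be called directly with fixed values when no slider is available.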

Here are some static figures:

Initial state: It is possible to move two points in the scatter plot, one at the end and one at the center.

[Figure: Initial]

An outlier at the center of the x values has limited influence on the fitted regression model.
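The limited effect at the center can be checked numerically without the slider. The sketch below assumes a fixed seed and a shift of 100 (both chosen only for illustration) and compares the observed slope change with the closed-form change for simple linear regression, h * (x[i] - mean(x)) / sum((x - mean(x))^2). That quantity is small whenever x[i] is close to mean(x), which is the case for the median of roughly symmetric data.

```r
set.seed(1)                               # assumed seed, for reproducibility only
x <- rnorm(35, 10, 5)
y <- 3 * x + 5 + rnorm(35, 0, 5)

i.med <- which.min(abs(x - median(x)))    # point nearest the center of x
h <- 100                                  # vertical shift, like the slider value

y1 <- y
y1[i.med] <- y1[i.med] + h                # create the outlier at the center

b0 <- coef(lm(y ~ x))                     # fit to the original points
b1 <- coef(lm(y1 ~ x))                    # fit to the modified points

## Observed slope change versus the closed-form expression.
slope.shift <- unname(b1["x"] - b0["x"])
slope.shift.formula <- h * (x[i.med] - mean(x)) / sum((x - mean(x))^2)
print(c(observed = slope.shift, formula = slope.shift.formula))

## The intercept absorbs most of the shift instead of the slope.
print(unname(b1["(Intercept)"] - b0["(Intercept)"]))
```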

[Figure: MoveMidY]

An outlier at the end of the range of x ‘moves’ the regression line towards it: such a point is not only an outlier but also an influential point!
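One way to quantify this, sketched here under an assumed seed and an assumed shift of -100, is to compare leverage (hatvalues()) and Cook's distance (cooks.distance()) for the point at the end and the point at the center. Leverage depends only on x, so the end point has the higher leverage before it is moved at all.

```r
set.seed(1)                               # assumed seed, for reproducibility only
x <- rnorm(35, 10, 5)
y <- 3 * x + 5 + rnorm(35, 0, 5)

i.end <- which.max(x)                     # point at the end of the x range
i.med <- which.min(abs(x - median(x)))    # point nearest the center of x

y1 <- y
y1[i.end] <- y1[i.end] - 100              # create the outlier at the end

m1 <- lm(y1 ~ x)
lev  <- hatvalues(m1)                     # leverage: a function of x only
cook <- cooks.distance(m1)                # combines residual size and leverage

print(round(c(lev.end = lev[[i.end]], lev.med = lev[[i.med]]), 3))
print(round(c(cook.end = cook[[i.end]], cook.med = cook[[i.med]]), 3))
```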

[Figure: MoveEndY]

Here is the complete R code for generating the interactive plot. This is to be run in RStudio.

library(manipulate)

## First define a custom function that fits a linear regression line 
## to (x,y) points and overlays the regression line in a scatterplot.
## The plot is then 'manipulated' to change as y values change.

linregIllustrate <- function(x, y, e, h.max, h.med){
  max.x <- max(x)
  med.x <- median(x)
  max.xind <- which.max(x)               ## index of the point at the end
  med.xind <- which.min(abs(x - med.x))  ## nearest point to the median; also works for even n

  y1 <- y     ## Modified y
  y1[max.xind] <- y1[max.xind]+h.max  ## at the end
  y1[med.xind] <- y1[med.xind]+h.med  ## at the center
  plot(x, y1, xlim=c(min(x),max(x)+5), ylim=c(min(y1),max(y1)), pch=16, 
       xlab="X", ylab="Y")
  text(x[max.xind], y1[max.xind],"I'm movable!", pos=3, offset = 0.3, cex=0.7, font=2, col="red")
  text(x[med.xind], y1[med.xind],"I'm movable too!", pos=3, offset = 0.3, cex=0.7, font=2, col="red")
  
  m <- lm(y ~ x)  ## Regression with original set of points, the black line
  abline(m, lwd=2)

  m1 <- lm(y1 ~ x)  ## Regression with modified y, the dashed red line
  abline(m1, col="red", lwd=2, lty=2)
}

## Now generate some x and y data 
x <- rnorm(35,10,5)
e <- rnorm(35,0,5)
y <- 3*x+5+e

## Plot and manipulate the plot!
manipulate(linregIllustrate(x, y, e, h.max, h.med), 
           h.max=slider(-100, 100, initial=0, step=10, label="Move y at the end"), 
           h.med=slider(-100, 100, initial=0, step=10, label="Move y at the center"))
