Manipulate(d) Regression!

May 5, 2016
(This article was first published on R – Design Data Decisions, and kindly contributed to R-bloggers)

The R package ‘manipulate’ can be used to create interactive plots in RStudio. Though not as versatile as the ‘shiny’ package, ‘manipulate’ can be used to quickly add interactive elements to standard R plots. This can prove useful for demonstrating statistical concepts, especially to a non-statistician audience.

The R code at the end of this post uses the ‘manipulate’ package with a regression plot to illustrate the effect of outliers (and influential points) on the fitted linear regression model. The resulting manipulate(d) plot in RStudio includes a gear icon which, when clicked, opens a panel of slider controls. The sliders move two of the data points up or down, and the plot is redrawn interactively as the data change.
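As a warm-up, here is a minimal sketch of the mechanism: a plotting expression is passed to manipulate(), and each variable it uses is bound to a slider(). The helper drawScatter() and its noise.sd argument are made up for this illustration and are not part of the post's code. The check on the RSTUDIO environment variable is an assumption about how to detect RStudio, so that the script does not fail elsewhere.

```r
## Hypothetical helper (not from the original post): scatter plot of a
## noisy straight line with the fitted regression line on top.
drawScatter <- function(noise.sd) {
  set.seed(1)                                # same noise pattern on every redraw
  x <- seq(0, 10, length.out = 50)
  y <- 2 * x + rnorm(50, sd = noise.sd)
  plot(x, y, pch = 16, main = paste("Noise SD =", noise.sd))
  abline(lm(y ~ x), col = "red", lwd = 2)
  invisible(y)                               # return y without printing it
}

## manipulate() only works inside RStudio, so the call is guarded.
## Assumption: RStudio sets the environment variable RSTUDIO to "1".
if (identical(Sys.getenv("RSTUDIO"), "1") &&
    requireNamespace("manipulate", quietly = TRUE)) {
  manipulate::manipulate(
    drawScatter(noise.sd),
    noise.sd = manipulate::slider(0, 10, initial = 2, step = 1,
                                  label = "Noise SD")
  )
}
```

In RStudio, clicking the gear icon on the plot and dragging the slider re-runs drawScatter() with the new value of noise.sd.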

Here are some static figures:

Initial state: It is possible to move two points in the scatter plot, one at the end and one at the center.

An outlier at the center has a limited influence on the fitted regression model.

An outlier at the ends of the support of x and y ‘moves’ the regression line towards it and is also an influential point!

Here is the complete R code for generating the interactive plot. This is to be run in RStudio.

library(manipulate)

## First define a custom function that fits a linear regression line
## to (x,y) points and overlays the regression line in a scatterplot.
## The plot is then 'manipulated' to change as y values change.

linregIllustrate <- function(x, y, e, h.max, h.med){
  ## 'e' (the noise vector) is not used in the body; it is kept in the
  ## signature so that the manipulate() call below works unchanged.

  ## Indices of the two movable points. which.min(abs(...)) picks the point
  ## closest to the median, so this also works when length(x) is even and
  ## median(x) is not one of the data points.
  max.xind <- which.max(x)
  med.xind <- which.min(abs(x - median(x)))

  y1 <- y     ## Modified y
  y1[max.xind] <- y1[max.xind]+h.max  ## at the end
  y1[med.xind] <- y1[med.xind]+h.med  ## at the center
  plot(x, y1, xlim=c(min(x),max(x)+5), ylim=c(min(y1),max(y1)), pch=16,
       xlab="X", ylab="Y")
  text(x[max.xind], y1[max.xind],"I'm movable!", pos=3, offset = 0.3, cex=0.7, font=2, col="red")
  text(x[med.xind], y1[med.xind],"I'm movable too!", pos=3, offset = 0.3, cex=0.7, font=2, col="red")

  m <- lm(y ~ x)  ## Regression with original set of points, the black line
  abline(m, lwd=2)

  m1 <- lm(y1 ~ x)  ## Regression with modified y, the dashed red line
  abline(m1, col="red", lwd=2, lty=2)
}

## Now generate some x and y data: y = 3x + 5 plus Gaussian noise.
## With an odd sample size, median(x) is one of the data points.
x <- rnorm(35,10,5)
e <- rnorm(35,0,5)
y <- 3*x+5+e

## Plot and manipulate the plot!
manipulate(linregIllustrate(x, y, e, h.max, h.med),
           h.max=slider(-100, 100, initial=0, step=10, label="Move y at the end"),
           h.med=slider(-100, 100, initial=0, step=10, label="Move y at the center"))
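
The difference between the two movable points can also be checked numerically, outside RStudio. The sketch below is not part of the original code: it regenerates data of the same form with an arbitrary seed, shifts each point by the same amount (an assumed value of 50), and compares the leverage, the change in slope and Cook's distance for the two cases, using hatvalues() and cooks.distance() from base R. All variable names are illustrative.

```r
## Illustrative, non-interactive check (not from the original post).
set.seed(42)                                 # arbitrary seed, for reproducibility
n <- 35
x <- rnorm(n, 10, 5)
y <- 3 * x + 5 + rnorm(n, 0, 5)

end.ind <- which.max(x)                      # point at the end of the x range
mid.ind <- which.min(abs(x - median(x)))     # point at the center of the x range

shift <- 50                                  # assumed shift, same for both points
y.end <- y
y.end[end.ind] <- y.end[end.ind] + shift
y.mid <- y
y.mid[mid.ind] <- y.mid[mid.ind] + shift

m     <- lm(y ~ x)                           # original data
m.end <- lm(y.end ~ x)                       # end point moved
m.mid <- lm(y.mid ~ x)                       # center point moved

slope.orig <- coef(m)[["x"]]
slope.end  <- coef(m.end)[["x"]]
slope.mid  <- coef(m.mid)[["x"]]

## Leverage depends only on x, so it is the same for all three fits.
lev <- hatvalues(m)

result <- data.frame(
  scenario     = c("end point shifted", "center point shifted"),
  leverage     = c(lev[[end.ind]], lev[[mid.ind]]),
  slope.change = c(slope.end - slope.orig, slope.mid - slope.orig),
  cooks.d      = c(cooks.distance(m.end)[[end.ind]],
                   cooks.distance(m.mid)[[mid.ind]])
)
print(result)
```

The end point sits far from the mean of x, so it has higher leverage and the same shift in y changes the slope more than it does for the center point. This is the behaviour the sliders show visually.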
