Regularization implementation in R: Bias and Variance diagnosis

May 22, 2014
By

(This article was first published on Pingax » R, and kindly contributed to R-bloggers)

Welcome to this blog post. In previous posts I discussed linear regression and logistic regression in detail. We used the dataset from Andrew Ng’s ML class to fit both models, and walked through a step-by-step implementation in R, including the cost function and gradient descent. In this post I will discuss two major concepts of supervised learning, regularization and bias-variance diagnosis, and their implementation. So let’s get started.

I have divided the post into two sections:

  1. Concept
  2. Implementation

In the first section we will cover regularization and bias-variance diagnosis, and in the second section we will discuss the R implementation.

1. Concept

I recommend viewing the Coursera ML class lectures on regularization to understand the concept in more depth. Before we discuss regularization, let’s first understand underfitting and overfitting (if you already know this, skip ahead to the implementation section). When we fit a model with a high-degree polynomial or a large set of features, the model tends to overfit the data. When we use a linear function or too few features, the model tends to underfit the data. In both cases the model does not generalize to new data and the prediction error is large, so we need a fit that is just right. This is shown in the image below.

underfitting-overfitting
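To make the idea concrete, here is a minimal sketch on synthetic data (the sine-shaped data, the held-out split and the helper `mse` are assumptions for illustration, not part of the course material). It fits polynomials of degree 1, 3 and 12 and reports the error on the points used for fitting and on held-out points.

```r
# Synthetic data (an assumption for illustration, not the course dataset)
set.seed(42)
x <- seq(-3, 3, length.out = 30)
y <- sin(x) + rnorm(length(x), sd = 0.3)

# Hold out every third point to measure error on unseen data
test_idx <- seq(3, length(x), by = 3)
train <- data.frame(x = x[-test_idx], y = y[-test_idx])
test  <- data.frame(x = x[test_idx],  y = y[test_idx])

# Mean squared error of a fitted model on a data frame
mse <- function(model, df) mean((df$y - predict(model, newdata = df))^2)

# Fit polynomials of increasing degree; degree 1 is a straight line
degrees <- c(1, 3, 12)
results <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree = d, train_mse = mse(fit, train), test_mse = mse(fit, test))
}))
print(results)
```

Because the models are nested, the training error cannot increase as the degree grows. The held-out error is the number to watch: a model that keeps improving on the training points while getting worse on held-out points is overfitting.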

An underfitted model has a bias problem; an overfitted model has a variance problem. It is crucial to know which of the two a predictive model suffers from during the modelling process. This requires a bias-variance diagnosis: divide the dataset into three parts, 1) training, 2) testing and 3) cross validation, and analyze the prediction error on each of them. We will discuss this in more detail later.
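As a rough sketch of such a three-way split, assuming a 60/20/20 ratio and a placeholder data frame `df` (both are assumptions; the course dataset used later in this post already ships with a separate cross validation set, `Xval` and `yval`):

```r
set.seed(1)
n <- 100
df <- data.frame(x = runif(n), y = runif(n))  # placeholder data (assumption)

# Shuffle the row indices, then cut them into 60% / 20% / 20%
idx <- sample(n)
n_train <- floor(0.6 * n)
n_val <- floor(0.2 * n)

train_set <- df[idx[1:n_train], ]
val_set   <- df[idx[(n_train + 1):(n_train + n_val)], ]
test_set  <- df[idx[(n_train + n_val + 1):n], ]

# Every row lands in exactly one of the three parts
print(c(train = nrow(train_set), val = nrow(val_set), test = nrow(test_set)))
```

Shuffling before cutting matters when the rows are stored in some meaningful order, otherwise the three parts may not be comparable.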

If the model is overfitting, we need to penalize the theta parameters to get a fit that is just right. This is what regularization does: it is a technique to avoid overfitting that penalizes the theta parameters in the cost function. The cost function is now defined as below.

regularized cost function
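A reconstruction of this formula, inferred from the `cost` function implemented later in this post rather than copied from the original image, with $h_\theta(x) = \theta^T x$, $m$ training examples and $n$ features:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
          + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
```

The penalty sum starts at $j = 1$, so the intercept $\theta_0$ is not penalized; this matches the code, which skips the first element of `theta`.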

Where λ is the regularization parameter, which decides the degree of regularization. If λ is set to zero there is no regularization, while a very large λ shrinks the parameters so strongly that the model underfits, so we need to choose the right value of λ. To choose it, we use the cross validation set.
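One way this selection could look in practice is sketched below. Everything here is an assumption for illustration: the synthetic data, the degree-6 polynomial features, the candidate values of λ, and the helper names `design`, `fit_ridge` and `sq_error`. The fit uses the closed-form minimizer of the regularized cost instead of an iterative optimizer, to keep the sketch short.

```r
set.seed(7)
# Synthetic data (assumption for illustration)
make_data <- function(n) {
  x <- runif(n, -1, 1)
  data.frame(x = x, y = sin(3 * x) + rnorm(n, sd = 0.2))
}
train <- make_data(15)
val   <- make_data(50)

# Design matrix: intercept column plus polynomial terms x^1 .. x^degree
design <- function(x, degree = 6) cbind(1, outer(x, 1:degree, "^"))

# Closed-form minimizer of the regularized cost; the intercept is not penalized
fit_ridge <- function(X, y, lambda) {
  penalty <- diag(c(0, rep(1, ncol(X) - 1)))
  solve(t(X) %*% X + lambda * penalty, t(X) %*% y)
}

# Unregularized squared error, used to compare the fitted models
sq_error <- function(X, y, theta) sum((X %*% theta - y)^2) / (2 * nrow(X))

X_train <- design(train$x)
X_val   <- design(val$x)

# Train with each candidate lambda, score on the cross validation set
lambdas <- c(0, 0.001, 0.01, 0.1, 1, 10, 100)
val_error <- sapply(lambdas, function(l) {
  theta <- fit_ridge(X_train, train$y, l)
  sq_error(X_val, val$y, theta)
})
best_lambda <- lambdas[which.min(val_error)]
print(data.frame(lambda = lambdas, val_error = val_error))
```

Note that λ enters only the training step. The cross validation error is computed without the penalty term, so that models trained with different λ are compared on the same scale.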

When performing bias-variance diagnosis, it is important to understand the characteristics of the bias problem and the variance problem. Learning curves, which plot the training error and the cross validation error against the number of training examples, help us recognize the specific signs of each. One such learning curve is shown below.

learning_curve_example

Now that we have understood a little about regularization, bias-variance and learning curves, let’s move on to implementing regularization and a learning curve using a simple linear regression model.

2. Implementation

We will use the dataset provided in the Coursera ML class assignment on regularization. We will implement regularized linear regression to predict the amount of water flowing out of a dam using the change in the water level.

Let us first load the dataset into R (download dataset)

#Load library
library(R.matlab)

#Read dataset from the matlab file
data <- readMat("ex5data1.mat")

Andrew Ng always advises visualizing the dataset first. So we will begin by visualizing the dataset containing the change in water level, x, and the amount of water flowing out of the dam, y.

#Plot training data
plot(data$X,data$y,xlab="Change in water level (X)",ylab="water flowing out of dam (y)")

plot_1

Now let us implement the regularized cost function shown above.

#Predictor variables
X <- as.matrix(data$X)

#Add ones to X
X <- cbind(rep(1,nrow(X)),X)

#Response variable
y <- as.matrix(data$y)

#Cost Function
cost <- function(X,y,theta,lambda)
{
  m <- nrow(X)
  residual <- (X%*%theta) - y
  J <- (1/(2*m))*sum((residual)^2)+(lambda/(2*m))* sum((theta[2:length(theta)])^2)
  return(J)
}

Let’s check the regularized cost with all theta parameters set to one.

#initial theta
theta <- rep(1,ncol(X))

#initial lambda
lambda <- 1

#cost at initial theta
cost(X,y,theta,lambda)

The cost will be 303.9932. The regularized gradient is formulated as below.

regularized gradient
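A reconstruction of the gradient, inferred from the `gradient` function implemented in this post rather than copied from the original image, with $h_\theta(x) = \theta^T x$ and $x_0^{(i)} = 1$:

```latex
\frac{\partial J(\theta)}{\partial \theta_0}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}

\frac{\partial J(\theta)}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
  + \frac{\lambda}{m} \theta_j \qquad \text{for } j \geq 1
```

The intercept term has no penalty, which is why the code handles the first element of the gradient separately from the rest.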

Regularized gradient is implemented in R as below.

#Gradient
gradient <- function(X,y,theta,lambda)
{
  m <- nrow(X)
  grad <- rep(0,length(theta))
  residual <- (X%*%theta) - y
  
  grad[1] <- (1/m)* sum((residual)*X[,1])
  
  for(i in 2:length(theta))
  {
    grad[i] <- (1/m)* sum((residual)*X[,i]) + (lambda/m)*theta[i]
  }
  
  return(grad)
}

#Gradient at initial theta
gradient(X,y,theta,lambda)
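A quick way to gain confidence in a hand-written gradient is to compare it with a finite-difference approximation of the cost. The sketch below is self-contained, so it restates the cost and uses a vectorized variant of the gradient; `gradient_vec`, `numeric_gradient` and the small synthetic problem are assumptions for illustration.

```r
# Regularized cost and a vectorized gradient, mirroring the definitions above
cost <- function(X, y, theta, lambda) {
  m <- nrow(X)
  residual <- X %*% theta - y
  sum(residual^2) / (2 * m) + lambda / (2 * m) * sum(theta[-1]^2)
}
gradient_vec <- function(X, y, theta, lambda) {
  m <- nrow(X)
  residual <- X %*% theta - y
  grad <- as.vector(t(X) %*% residual) / m
  grad + (lambda / m) * c(0, theta[-1])  # no penalty on the intercept
}

# Central finite-difference approximation of the gradient
numeric_gradient <- function(X, y, theta, lambda, eps = 1e-6) {
  sapply(seq_along(theta), function(j) {
    step <- replace(rep(0, length(theta)), j, eps)
    (cost(X, y, theta + step, lambda) - cost(X, y, theta - step, lambda)) / (2 * eps)
  })
}

# Small synthetic problem (assumption); any X with a leading column of ones works
set.seed(3)
X_demo <- cbind(1, rnorm(12))
y_demo <- matrix(2 + 3 * X_demo[, 2] + rnorm(12), ncol = 1)
theta_demo <- c(1, 1)

analytic <- gradient_vec(X_demo, y_demo, theta_demo, lambda = 1)
approx_grad <- numeric_gradient(X_demo, y_demo, theta_demo, lambda = 1)
print(max(abs(analytic - approx_grad)))
```

If the two vectors disagree by more than a small numerical tolerance, the analytic gradient (or the cost) most likely contains a mistake.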

Let us train the linear model without regularization and visualize the fitted model. Setting lambda to zero trains the model without regularization.

#Set initial theta for training the linear regression
initial_theta <- rep(0,ncol(X))

#Set lambda
lambda <- 0

# Derive theta by minimizing the cost with optim
# (the default method is Nelder-Mead, so the gradient function is not used here)
theta_optim <- optim(par=initial_theta,fn=cost,X=X,y=y,lambda=lambda)

#Plot fitted line
plot(data$X,data$y,xlab="Change in water level (X)",ylab="water flowing out of dam (y)")
abline(coef=theta_optim$par)

linear_fit
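The call above relies on `optim`'s default derivative-free method. As a sketch of an alternative, the gradient can be handed to `optim` through its `gr` argument together with `method = "BFGS"`. The block below is self-contained and runs on synthetic data (an assumption, not the course dataset), and compares the result with `lm`, which should agree when lambda is zero.

```r
# Cost and gradient with the same argument order as in this post
cost <- function(X, y, theta, lambda) {
  m <- nrow(X)
  residual <- X %*% theta - y
  sum(residual^2) / (2 * m) + lambda / (2 * m) * sum(theta[-1]^2)
}
gradient <- function(X, y, theta, lambda) {
  m <- nrow(X)
  residual <- X %*% theta - y
  as.vector(t(X) %*% residual) / m + (lambda / m) * c(0, theta[-1])
}

# Synthetic data (assumption for illustration)
set.seed(11)
x <- runif(20, -5, 5)
X_demo <- cbind(1, x)
y_demo <- 10 + 0.4 * x + rnorm(20, sd = 1)

# optim passes par as the first unnamed argument, which R matches to theta
# because X, y and lambda are all supplied by name
fit <- optim(par = c(0, 0), fn = cost, gr = gradient,
             X = X_demo, y = y_demo, lambda = 0, method = "BFGS")
theta_bfgs <- fit$par

# Ordinary least squares for comparison
theta_lm <- unname(coef(lm(y_demo ~ x)))
print(rbind(theta_bfgs, theta_lm))
```

Supplying the gradient usually lets the optimizer converge in fewer cost evaluations than a derivative-free search, which becomes more noticeable as the number of parameters grows.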

Now we will implement a learning curve for bias and variance diagnosis. In the learning curve we plot the training error and the cross validation error against the number of training examples. In this process, we will do the following:

  1. Train the model on an increasing number of training examples
  2. Each time, record the training error
  3. Apply the trained model to the cross validation set and measure the cross validation error

This process continues until all training examples have been used in training; we then visualize the training and cross validation errors. The implementation is shown below.

#cross validation set
#X_val
X_val <- as.matrix(data$Xval)

#add ones to X_val
X_val <- cbind(rep(1,nrow(X_val)),X_val)

#y_val
y_val <- as.matrix(data$yval)


#Learning curve
learningCurve <- function(X,y,X_val,y_val)
{
  m <- nrow(X)
  error_train <- rep(0,m)
  error_val <- rep(0,m)
  
  #Start at 2 examples so that X[1:i,] stays a matrix; entry 1 is left at 0
  for(i in 2:m)
  {
    initial_theta <- rep(1,ncol(X))
    #Fit on the first i training examples without regularization
    fit <- optim(par=initial_theta,fn=cost,X=X[1:i,],y=y[1:i,],lambda=0)
    theta <- fit$par
    error_train[i] <- cost(X=X[1:i,],y=y[1:i,],theta,lambda=0)
    error_val[i] <- cost(X=X_val,y=y_val,theta,lambda=0)
  }
  
  return(list(error_train=error_train,error_val=error_val))

}

# Error on training and cross validation sets
error <- learningCurve(X,y,X_val,y_val)

# get the range for the x and y axis
xrange <- range(1:nrow(X))
yrange <- range(c(error$error_train, error$error_val))
colors <- rainbow(2)
linetype <- c(1:2)
plotchar <- seq(18,19,1)

#Learning curve: set up empty axes, then draw both error curves
plot(xrange,yrange,type="n",xlab="Number of training examples",ylab="Error")
lines(2:nrow(X), error$error_train[2:nrow(X)], type="b", lwd=1.5,
      lty=linetype[1], col=colors[1], pch=plotchar[1]) 
lines(2:nrow(X), error$error_val[2:nrow(X)], type="b", lwd=1.5,
      lty=linetype[2], col=colors[2], pch=plotchar[2]) 
legend("topright", legend=c("Training error","Cross validation error"), cex=0.8, col=colors,
       pch=plotchar, lty=linetype, title="Linear regression learning curve")

learning_curve

In the next blog post, I will discuss in more detail how to interpret the above learning curve and how to identify a bias or variance problem from it.

Please do post your comments and feedback. Your comments are very valuable to us.

Stay tuned!
