
Welcome to this blog post. In previous posts I discussed linear regression and logistic regression in detail, using Andrew Ng's ML class datasets to fit both models, and walked through step by step implementations in R, including the cost function and gradient descent. In this post I will discuss two major concepts of supervised learning, regularization and bias-variance diagnosis, and their implementation. So let's get started.

I have divided this post into two sections:

1. Concept
2. Implementation

In the first section we will go through regularization and bias-variance diagnosis, and in the second section we will walk through the R implementation.

1. Concept

I would recommend watching the regularization lectures of the Coursera ML class to understand the concept in more depth. Before we discuss regularization, let's first understand under fitting and over fitting (if you already know this, skip ahead to the implementation section). Whenever we fit a model with a polynomial function or a large set of features, the model will tend to over fit the data. If we use a linear function or too few features, the model will under fit the data. In both cases the model will not generalize to new data and the prediction error will be large, so we need a fit that is just right. This is illustrated in the image below. An under fitted model has a bias problem; an over fitted model has a variance problem. It is crucial to understand during the modelling process whether our predictive model suffers from a bias problem or a variance problem. This requires a bias-variance diagnosis: divide the dataset into three parts, 1) training, 2) test and 3) cross validation, and analyze the prediction error on each. We will discuss this in more detail later.
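The three-way split above can be sketched in a few lines of base R. This is a minimal sketch on a made-up data frame `df`; the 60/20/20 proportions are a common convention, not something prescribed by the assignment:

```r
#Split a dataset into training (60%), cross validation (20%) and test (20%) sets
set.seed(42)
df <- data.frame(x = rnorm(100), y = rnorm(100))

n <- nrow(df)
idx <- sample(n)   #random permutation of row indices

train <- df[idx[1:floor(0.6 * n)], ]
val   <- df[idx[(floor(0.6 * n) + 1):floor(0.8 * n)], ]
test  <- df[idx[(floor(0.8 * n) + 1):n], ]
```

The model is fit on `train`, hyperparameters are tuned on `val`, and `test` is touched only once at the end to estimate generalization error.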

If the model over fits, we need to penalize the theta parameters in order to obtain a just-right fit, which leads us to regularization. Regularization is a technique to avoid over fitting: it penalizes the theta parameters in the cost function, which becomes

J(theta) = (1/(2m)) * sum((h(x_i) - y_i)^2) + (lambda/(2m)) * sum(theta_j^2),

where the second sum runs over j = 1..n (the intercept theta_0 is not penalized) and lambda is the regularization parameter, which controls the degree of regularization. If lambda is set to zero there is no regularization, so we need to choose the right value of lambda. In order to decide the right value of lambda, we use the cross validation set.
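Choosing lambda on a validation set can be sketched as follows. This is a toy example, not the assignment's data: it uses a closed-form ridge solution on deliberately over-flexible polynomial features, an illustrative lambda grid, and leaves the intercept unpenalized as is conventional:

```r
#Choose lambda by minimizing error on a held-out validation set (toy data)
set.seed(1)
x <- runif(40, -2, 2)
y <- 1 + 2 * x + rnorm(40, sd = 0.5)

X <- cbind(1, x, x^2, x^3)          #deliberately over-flexible features
train_idx <- 1:30
Xt <- X[train_idx, ];  yt <- y[train_idx]
Xv <- X[-train_idx, ]; yv <- y[-train_idx]

#Closed-form regularized (ridge) fit; the intercept is not penalized
ridge_fit <- function(X, y, lambda) {
  p <- ncol(X)
  P <- diag(p); P[1, 1] <- 0
  solve(t(X) %*% X + lambda * P, t(X) %*% y)
}

lambdas <- c(0, 0.01, 0.1, 1, 10, 100)
val_err <- sapply(lambdas, function(l) {
  theta <- ridge_fit(Xt, yt, l)
  mean((Xv %*% theta - yv)^2)       #validation MSE
})
best_lambda <- lambdas[which.min(val_err)]
```

The lambda with the smallest validation error is kept; the test set plays no role in this choice.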

When performing a bias-variance diagnosis, it is important to understand the characteristics of a bias problem versus a variance problem. These characteristics can be read from learning curves, which help us recognize the specific signatures of each problem. One such learning curve is shown below. Now that we understand a little about regularization, bias-variance and learning curves, let's move on to implementing regularization and a learning curve using a simple linear regression model.

2. Implementation

We will use the dataset provided in the Coursera ML class assignment on regularization. We will implement regularized linear regression to predict the amount of water flowing out of a dam from the change in the water level.

```#Load library
library(R.matlab)

#Read dataset from the matlab file
data <- readMat("ex5data1.mat")
```

Andrew Ng always advises visualizing the dataset first. So we will begin by plotting the water level, x, against the amount of water flowing out of the dam, y.

```#Plot training data
plot(data$X,data$y,xlab="Change in water level (X)",ylab="Water flowing out of dam (y)")
```

Now let us implement the regularized cost function shown above.

```#Predictor variables
X <- as.matrix(data$X)

#Add intercept term
X <- cbind(rep(1,nrow(X)),X)

#Response variable
y <- as.matrix(data$y)

#Regularized cost function
cost <- function(X,y,theta,lambda)
{
  m <- nrow(X)
  residual <- (X%*%theta) - y
  J <- (1/(2*m))*sum(residual^2) + (lambda/(2*m))*sum(theta[2:length(theta)]^2)
  return(J)
}```

Let's check the regularized cost with all theta parameters set to one:

```#Initial theta
theta <- rep(1,ncol(X))

#Initial lambda
lambda <- 1

#Cost at initial theta
cost(X,y,theta,lambda)
```

The cost will be 303.9932. The regularized gradient is

dJ/dtheta_j = (1/m) * sum((h(x_i) - y_i) * x_ij) + (lambda/m) * theta_j for j >= 1,

with the intercept term (j = 0) computed without the regularization term. It is implemented in R as below.

```#Regularized gradient
gradient <- function(X,y,theta,lambda)
{
  m <- nrow(X)
  grad <- rep(0,length(theta))
  residual <- (X%*%theta) - y

  #Intercept term is not regularized
  grad[1] <- (1/m)*sum(residual*X[,1])

  for(i in 2:length(theta))
  {
    grad[i] <- (1/m)*sum(residual*X[,i]) + (lambda/m)*theta[i]
  }

  return(grad)
}
```
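A standard way to sanity-check an analytic gradient is to compare it against a central-difference numerical approximation of the cost. The sketch below restates the cost and gradient so it runs on its own; the toy data, `eps`, and tolerance are illustrative choices, not part of the assignment:

```r
#Regularized cost and analytic gradient, restated so this snippet is self-contained
cost <- function(X, y, theta, lambda) {
  m <- nrow(X)
  residual <- (X %*% theta) - y
  (1/(2*m)) * sum(residual^2) + (lambda/(2*m)) * sum(theta[-1]^2)
}

gradient <- function(X, y, theta, lambda) {
  m <- nrow(X)
  residual <- (X %*% theta) - y
  grad <- (1/m) * t(X) %*% residual
  grad[-1] <- grad[-1] + (lambda/m) * theta[-1]   #intercept not regularized
  as.vector(grad)
}

#Central-difference numerical gradient of the cost
numerical_gradient <- function(X, y, theta, lambda, eps = 1e-4) {
  sapply(seq_along(theta), function(j) {
    tp <- theta; tp[j] <- tp[j] + eps
    tm <- theta; tm[j] <- tm[j] - eps
    (cost(X, y, tp, lambda) - cost(X, y, tm, lambda)) / (2 * eps)
  })
}

#Compare the two on small random data
set.seed(0)
X <- cbind(1, rnorm(10))
y <- matrix(rnorm(10))
theta <- c(1, 2)
g_analytic <- gradient(X, y, theta, lambda = 1)
g_numeric  <- numerical_gradient(X, y, theta, lambda = 1)
max(abs(g_analytic - g_numeric))   #should be close to zero
```

If the two gradients disagree by more than a tiny amount, the analytic implementation almost certainly has a bug.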

Let us train the linear model without regularization and visualize the fitted model. Setting lambda to zero trains the model without regularization.

```#Set initial theta for training the linear regression
initial_theta <- rep(0,ncol(X))

#Set lambda
lambda <- 0

#Derive theta by minimizing the regularized cost with optim
theta_optim <- optim(par=initial_theta,fn=cost,X=X,y=y,lambda=lambda)

#Plot fitted line
plot(data$X,data$y,xlab="Change in water level (X)",ylab="Water flowing out of dam (y)")
abline(coef=theta_optim$par)
```

Now we will implement the learning curve for bias-variance diagnosis. The learning curve plots the training error and the cross validation error against the number of training examples. In this process, we will do the following:

1. Train the model on an increasing number of training examples
2. Record the training error at each step
3. Apply the trained model to the cross validation set and measure the cross validation error

This process continues until all training examples have been used, after which we visualize the training and cross validation errors. The implementation is shown below.

```#Cross validation set
#X_val
X_val <- as.matrix(data$Xval)

X_val <- cbind(rep(1,nrow(X_val)),X_val)

#y_val
y_val <- as.matrix(data$yval)

#Learning curve
learningCurve <- function(X,y,X_val,y_val)
{
  m <- nrow(X)
  error_train <- rep(0,m)
  error_val <- rep(0,m)

  for(i in 2:m)
  {
    initial_theta <- rep(1,ncol(X))
    optim_result <- optim(par=initial_theta,fn=cost,X=X[1:i,],y=y[1:i,],lambda=0)
    theta <- optim_result$par
    error_train[i] <- cost(X=X[1:i,],y=y[1:i,],theta,lambda=0)
    error_val[i] <- cost(X=X_val,y=y_val,theta,lambda=0)
  }

  return(list(error_train=error_train,error_val=error_val))
}

#Errors on training and cross validation sets
error <- learningCurve(X,y,X_val,y_val)

#Get the range for the x and y axis
xrange <- range(1:nrow(X))
yrange <- range(error$error_val)
colors <- rainbow(2)
linetype <- c(1:2)
plotchar <- seq(18,19,1)

#Learning curve
plot(xrange,yrange,type="n",xlab="Number of training examples",ylab="Error")
lines(2:nrow(X), error$error_train[2:nrow(X)], type="b", lwd=1.5,
      lty=linetype[1], col=colors[1], pch=plotchar[1])
lines(2:nrow(X), error$error_val[2:nrow(X)], type="b", lwd=1.5,
      lty=linetype[2], col=colors[2], pch=plotchar[2])
legend("topright", c("Training error","Cross validation error"), cex=0.8, col=colors,
       pch=plotchar, lty=linetype, title="Linear regression learning curve")
```

In the next blog post, I will discuss in more detail how to interpret the above learning curve and how to use it to identify a bias or variance problem.

Stay tuned!