(This article was first published on **Pingax » R**, and kindly contributed to R-bloggers)

Welcome to this blog post. In previous posts I discussed linear regression and logistic regression in detail. We used the dataset from Andrew Ng’s ML class to fit both models, and we walked through a step-by-step implementation in R, including the cost function and gradient descent. In this post I will discuss two major concepts of supervised learning, regularization and bias-variance diagnosis, along with their implementation. So let’s get started.

I have divided this post into two sections:

- Concept
- Implementation

In the first section we will understand regularization and bias-variance diagnosis, and in the second section we will discuss the R implementation.

**1. Concept**

I recommend watching the regularization lectures of the Coursera ML class to understand the concept in more depth. Before we discuss regularization, let’s first understand underfitting and overfitting (if you already know this, skip ahead to the implementation section). When we fit a high-degree polynomial or use a large set of features, the model tends to overfit the data. When we use a linear function or too few features, the model tends to underfit. In both cases the model fails to generalize to new data and its prediction error is large, so we need a model that fits the data just right. This is shown in the image below.
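To make underfitting and overfitting concrete, here is a small sketch on synthetic data. The sine-shaped data, the sample sizes and the degrees 1, 3 and 12 are arbitrary choices, not taken from the course, and the fits use R's built-in `lm()` rather than the gradient-based approach used later in this post. Training error can only go down as the polynomial degree grows, while validation error tends to be high for the straight line (underfitting) and to rise again for the degree-12 polynomial (overfitting).

```r
# Synthetic data (assumption): a noisy sine curve, split into train / validation
set.seed(42)
n <- 30
x_train <- runif(n, -3, 3)
y_train <- sin(x_train) + rnorm(n, sd = 0.3)
x_val <- runif(n, -3, 3)
y_val <- sin(x_val) + rnorm(n, sd = 0.3)

# Mean squared error
mse <- function(pred, actual) mean((pred - actual)^2)

# Fit polynomials of increasing degree; record training and validation error
degrees <- c(1, 3, 12)
errors <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = data.frame(x = x_train, y = y_train))
  c(degree = d,
    train = mse(fitted(fit), y_train),
    val = mse(predict(fit, newdata = data.frame(x = x_val)), y_val))
}))
print(errors)
```

Each row of `errors` holds one degree; comparing the `train` and `val` columns shows the gap that bias-variance diagnosis looks for.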

An underfitted model has a bias problem, while an overfitted model has a variance problem. During modelling it is crucial to know whether a predictive model suffers from a bias problem or a variance problem. Diagnosing this requires dividing the dataset into three parts, 1) training, 2) testing and 3) cross validation, and analyzing the prediction error on each of them. We will discuss this in more detail later.
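A minimal sketch of such a split, assuming a 60% / 20% / 20% division into training, cross validation and test sets (a common convention, not something the method requires). `split_indices` is a hypothetical helper that returns random row indices for each part.

```r
# Hypothetical helper: randomly split the row indices 1..n into
# training, cross validation and test sets.
split_indices <- function(n, p_train = 0.6, p_cv = 0.2, seed = 1) {
  set.seed(seed)                    # make the split reproducible
  idx <- sample(n)                  # random permutation of 1..n
  n_train <- floor(p_train * n)
  n_cv <- floor(p_cv * n)
  list(train = idx[seq_len(n_train)],
       cv    = idx[n_train + seq_len(n_cv)],
       test  = idx[-seq_len(n_train + n_cv)])  # all remaining rows
}

# Example: split 100 rows (60 / 20 / 20)
parts <- split_indices(100)
sapply(parts, length)
```

The indices can then be used to subset a data frame or matrix, e.g. `X[parts$train, ]`.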

If the model is overfitting, we need to penalize the theta parameters to get a just-right fit. This is what regularization does: it is a technique to avoid overfitting by adding a penalty on the theta parameters to the cost function. With regularization, the cost function is defined as below.
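In the notation of the Coursera ML class, with $m$ training examples, $n$ features, hypothesis $h_\theta(x) = \theta^T x$ and the intercept $\theta_0$ excluded from the penalty, the regularized cost has the standard form below. It matches the `cost` function implemented in the second section, where `theta[1]` plays the role of $\theta_0$.

$$
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
$$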

Here **λ** is the regularization parameter; it controls the degree of regularization. If **λ** is set to zero there is no regularization, so we need to choose a suitable value of **λ**. To decide the right value of **λ**, we use the cross validation set.
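As a sketch of how the cross validation set guides the choice of λ: fit the model on the training set for each value in a grid of candidate λ values, compute the unregularized error on the cross validation set, and keep the λ with the lowest error. To stay short and self-contained, this example assumes synthetic data, a degree-8 polynomial feature map, an arbitrary λ grid, and a closed-form solution of the regularized normal equation $(X^T X + \lambda L)\theta = X^T y$ (with $L$ the identity matrix except $L_{11} = 0$, so the intercept is not penalized) instead of `optim`. `fit_regularized`, `sq_error` and `poly_features` are hypothetical names.

```r
# Closed-form minimizer of the regularized cost:
# solves (X'X + lambda * L) theta = X'y, with L = identity except L[1, 1] = 0
fit_regularized <- function(X, y, lambda) {
  L <- diag(ncol(X))
  L[1, 1] <- 0                      # leave the intercept unpenalized
  solve(t(X) %*% X + lambda * L, t(X) %*% y)
}

# Unregularized squared-error cost, used to measure cross validation error
sq_error <- function(X, y, theta) sum((X %*% theta - y)^2) / (2 * nrow(X))

# Column of ones plus polynomial features x, x^2, ..., x^p
poly_features <- function(x, p) cbind(1, sapply(1:p, function(k) x^k))

# Synthetic training and cross validation data (assumption)
set.seed(7)
x_tr <- runif(20, -1, 1); y_tr <- 2 * x_tr + rnorm(20, sd = 0.2)
x_cv <- runif(20, -1, 1); y_cv <- 2 * x_cv + rnorm(20, sd = 0.2)
X_tr <- poly_features(x_tr, 8)
X_cv <- poly_features(x_cv, 8)

# Validation curve: cross validation error for each candidate lambda
lambdas <- c(0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10)
cv_error <- sapply(lambdas, function(l) {
  sq_error(X_cv, y_cv, fit_regularized(X_tr, y_tr, l))
})
best_lambda <- lambdas[which.min(cv_error)]
print(data.frame(lambda = lambdas, cv_error = cv_error))
best_lambda
```

Note that the regularization term is only used while fitting; the errors being compared are unregularized, so different λ values are judged on the same scale.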

When performing bias-variance diagnosis, it is important to understand the characteristics of bias problems and variance problems. Learning curves reveal these characteristics and help us recognize the specific signs of each problem. One such learning curve is shown below.

Now that we understand a little about regularization, bias-variance and learning curves, let’s move on to implementing regularization and a learning curve for a simple linear regression model.

**2. Implementation**

We will use the dataset provided in the Coursera ML class assignment on regularization. We will implement regularized linear regression to predict the amount of water flowing out of a dam from the change in the water level.

Let us first load the dataset into R (download dataset)

```r
# Load library
library(R.matlab)
# Read dataset from the MATLAB file
data <- readMat("ex5data1.mat")
```

Andrew Ng always advises visualizing the dataset first, so we begin by plotting the data: the change in water level, x, and the amount of water flowing out of the dam, y.

```r
# Plot training data
plot(data$X, data$y, xlab = "Change in water level (X)",
     ylab = "Water flowing out of dam (y)")
```

Now let us implement the regularized cost function shown above.

```r
# Predictor variables
X <- as.matrix(data$X)
# Add a column of ones to X for the intercept term
X <- cbind(rep(1, nrow(X)), X)
# Response variable
y <- as.matrix(data$y)

# Regularized cost function (the intercept theta[1] is not penalized)
cost <- function(X, y, theta, lambda) {
  m <- nrow(X)
  residual <- (X %*% theta) - y
  J <- (1 / (2 * m)) * sum(residual^2) +
    (lambda / (2 * m)) * sum(theta[2:length(theta)]^2)
  return(J)
}
```

Let’s check the regularized cost with all theta parameters set to one.

```r
# Initial theta
theta <- rep(1, ncol(X))
# Initial lambda
lambda <- 1
# Cost at initial theta
cost(X, y, theta, lambda)
```

The cost should be 303.9932. The regularized gradient is formulated as below.
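Differentiating the regularized cost term by term (with $m$ training examples, hypothesis $h_\theta(x) = \theta^T x$, and the intercept $\theta_0$ left unpenalized) gives the standard regularized gradient below; `grad[1]` in the R code corresponds to the $j = 0$ case.

$$
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_0} &= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)} \\
\frac{\partial J(\theta)}{\partial \theta_j} &= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)
\end{aligned}
$$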

The regularized gradient is implemented in R as follows.

```r
# Regularized gradient (the intercept theta[1] is not penalized)
gradient <- function(X, y, theta, lambda) {
  m <- nrow(X)
  grad <- rep(0, length(theta))
  residual <- (X %*% theta) - y
  grad[1] <- (1 / m) * sum(residual * X[, 1])
  for (i in 2:length(theta)) {
    grad[i] <- (1 / m) * sum(residual * X[, i]) + (lambda / m) * theta[i]
  }
  return(grad)
}

# Gradient at initial theta
gradient(X, y, theta, lambda)
```

Let us train the linear model without regularization and visualize the fitted model. Setting lambda to zero trains the model without regularization.

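```r
# Initial theta for training the linear regression
initial_theta <- rep(0, ncol(X))
# lambda = 0 means no regularization
lambda <- 0
# Minimize the cost with optim(); supplying the gradient and method = "BFGS"
# makes it gradient-based (optim's default, Nelder-Mead, ignores gradients)
theta_optim <- optim(par = initial_theta, fn = cost, gr = gradient,
                     X = X, y = y, lambda = lambda, method = "BFGS")
# Plot the fitted line over the training data
plot(data$X, data$y, xlab = "Change in water level (X)",
     ylab = "Water flowing out of dam (y)")
abline(coef = theta_optim$par)
```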
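One way to sanity-check the implementation is to confirm that, with λ = 0, minimizing the cost reproduces ordinary least squares, and that the analytic gradient agrees with a finite-difference estimate. The sketch below does this on synthetic data (arbitrary intercept, slope and noise level) so it runs without `ex5data1.mat`. `reg_cost` and `reg_gradient` are hypothetical vectorized rewrites of the `cost` and `gradient` functions above, the comparison uses R's built-in `lm()`, and the tightened `reltol` is an assumption to make BFGS converge closely enough for the comparison.

```r
# Vectorized regularized cost; the intercept theta[1] is not penalized
reg_cost <- function(X, y, theta, lambda) {
  m <- nrow(X)
  residual <- X %*% theta - y
  sum(residual^2) / (2 * m) + (lambda / (2 * m)) * sum(theta[-1]^2)
}

# Vectorized regularized gradient: X'(X theta - y) / m, plus the penalty term
reg_gradient <- function(X, y, theta, lambda) {
  m <- nrow(X)
  residual <- X %*% theta - y
  grad <- as.vector(t(X) %*% residual) / m
  grad[-1] <- grad[-1] + (lambda / m) * theta[-1]
  grad
}

# Synthetic data (arbitrary parameters); distinct names avoid clobbering X, y
set.seed(3)
x_syn <- runif(50, -40, 40)
y_syn <- 13 + 0.4 * x_syn + rnorm(50, sd = 3)
X_syn <- cbind(1, x_syn)

# Central finite differences of the cost at theta = (1, 1), lambda = 1
g_fd <- sapply(1:2, function(j) {
  e <- c(0, 0); e[j] <- 1e-5
  (reg_cost(X_syn, y_syn, c(1, 1) + e, 1) -
     reg_cost(X_syn, y_syn, c(1, 1) - e, 1)) / 2e-5
})

# With lambda = 0 the minimizer should match ordinary least squares
fit <- optim(par = c(0, 0), fn = reg_cost, gr = reg_gradient,
             X = X_syn, y = y_syn, lambda = 0, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 1000))
ols <- unname(coef(lm(y_syn ~ x_syn)))
rbind(optim = fit$par, lm = ols)
```

If the two rows disagree noticeably, or `g_fd` differs from `reg_gradient(X_syn, y_syn, c(1, 1), 1)`, the cost or gradient has a bug.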

Now we will implement a learning curve for bias-variance diagnosis. A learning curve plots the training error and the cross validation error against the number of training examples. The process is as follows:

- Train the model on an increasing number of training examples
- At each step, record the training error
- Apply the trained model to the cross validation set and record the cross validation error

This continues until all training examples have been used; we then plot the training and cross validation errors. The implementation is shown below.

```r
# Cross validation set
X_val <- as.matrix(data$Xval)
# Add a column of ones to X_val for the intercept term
X_val <- cbind(rep(1, nrow(X_val)), X_val)
y_val <- as.matrix(data$yval)

# Learning curve: train on the first i examples, then record the error on
# those i examples and on the full cross validation set (both with lambda = 0)
learningCurve <- function(X, y, X_val, y_val) {
  m <- nrow(X)
  error_train <- rep(0, m)
  error_val <- rep(0, m)
  # Start at 2 so that X[1:i, ] stays a matrix
  for (i in 2:m) {
    initial_theta <- rep(1, ncol(X))
    fit <- optim(par = initial_theta, fn = cost, gr = gradient,
                 X = X[1:i, ], y = y[1:i, ], lambda = 0, method = "BFGS")
    theta <- fit$par
    error_train[i] <- cost(X = X[1:i, ], y = y[1:i, ], theta, lambda = 0)
    error_val[i] <- cost(X = X_val, y = y_val, theta, lambda = 0)
  }
  return(list(error_train = error_train, error_val = error_val))
}

# Error on training and cross validation sets
error <- learningCurve(X, y, X_val, y_val)

# Axis ranges and plotting styles
xrange <- range(1:nrow(X))
yrange <- range(c(error$error_train, error$error_val))
colors <- rainbow(2)
linetype <- c(1:2)
plotchar <- seq(18, 19, 1)

# Learning curve (type = "n" sets up empty axes)
plot(xrange, yrange, type = "n", xlab = "Number of training examples", ylab = "Error")
lines(2:nrow(X), error$error_train[2:nrow(X)], type = "b", lwd = 1.5,
      lty = linetype[1], col = colors[1], pch = plotchar[1])
lines(2:nrow(X), error$error_val[2:nrow(X)], type = "b", lwd = 1.5,
      lty = linetype[2], col = colors[2], pch = plotchar[2])
legend(xrange[1], yrange[2], c("Train", "Cross validation"), cex = 0.8,
       col = colors, pch = plotchar, lty = linetype,
       title = "Linear regression learning curve")
```

In the next blog post, I will discuss in more detail how to interpret the learning curve above and how to use it to identify bias and variance problems.

Please do post your comments and feedback; they are very valuable to us.

**Stay tuned!**


The post Regularization implementation in R : Bais and Variance diagnosis appeared first on Pingax.
