Machine Learning Ex 5.1 – Regularized Linear Regression

Posted on October 25, 2011 by R on Guangchuang Yu in R bloggers | 0 Comments

[This article was first published on YGC » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The first part of the Exercise 5.1 requires to implement a regularized version of linear regression.

Adding regularization parameter can prevent the problem of over-fitting when fitting a high-order polynomial.

Plot the data:

^?View Code RSPLUS

x <- read.table("ex5Linx.dat")
y <- read.table("ex5Liny.dat")
 
x <- x[,1]
y <- y[,1]
 
require(ggplot2)
d <- data.frame(x=x,y=y)
p <- ggplot(d, aes(x,y)) + geom_point(colour="red", size=3)

I will fit a 5th order polynomial, the hypothesis is:
$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2^2 + \theta_3 x_3^3 + \theta_4 x_4^4 + \theta_5 x_5^5$

The idea of regularization is to impose Occam's razor on the solution, by scaling down the $\theta$ which will lead to the tiny contribution of the higher order features.

For that, the cost function was defined as:
$J(\theta) = \frac{1}{2m} [\sum_{i=1}^m ((h_\theta(x^{(i)}) - y^{(i)})^2) + \lambda \sum_{i=1}^n \theta^2]$

Gradient Descent with regularization parameters:

Firstly, I implemented a gradient descent algorithm to find the theta.

^?View Code RSPLUS

mapFeature <- function(x, degree=5) {
    return(sapply(0:degree, function(i) x^i))
}
 
## hypothesis function
h <- function(theta, x) {
    #sapply(1:m, function(i) theta %*% x[i,])
    toReturn <- x %*% t(theta)
    return(toReturn)
}
 
## cost function
J <- function(theta, x, y, lambda=1) {
    m <- length(y)
    r <- theta^2
    r[1] <- 0
    j <- 1/(2*m) * sum((h(theta, x)-y)^2) + lambda*sum(r)
    return(j)
}
 
gradDescent <- function(theta, x, y, alpha=0.1, niter=1000, lambda=1) {
    m <- length(y)
    for (i in 1:niter) {
        tt <- theta
        tt[1] <- 0
        dj <- 1/m * (t(h(theta,x)-y) %*% x + lambda * tt)
        theta <- theta - alpha * dj
    }
    return(theta)
}

fitting the data with gradient descent algorithm:

^?View Code RSPLUS

x <- mapFeature(x)
theta <- matrix(rep(0,6), nrow=1)
theta <- gradDescent(theta, x, y)
 
x.test <- seq(-1,1, 0.001)
y.test <- mapFeature(x.test) %*% t(theta)
 
p+geom_line(aes(x=x.test, y=y.test), colour="blue")

As shown above, the fitting model fits the data well.

Normal Equation with regularization parameters:
The Exercise requires implementing Normal Equation with the regularization parameters added.
that is:
$\theta = (X^T X + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & ? & \\ & & & 1 \end{bmatrix} )^{-1} (X^T y)$

^?View Code RSPLUS

## normal equations
normEq <- function(x,y, lambda) {
    n <- ncol(x)
    ## extra regularizatin terms
    r <- lambda * diag(n)
    r[1,1] <- 0
    theta <- solve(t(x) %*% x + r) %*% t(x) %*% y
    return(theta)
}

I try 3 different lambda values to see how it influences the fit.

^?View Code RSPLUS

lambda <- c(0,1,10)
theta <- sapply(lambda, normEq, x=x, y=y)
x.test <- seq(-1,1, 0.001)
yy <- sapply(1:3, function(i) mapFeature(x.test) %*% theta[,i])
yy <- melt(yy)
yy[,1] <- rep(x.test, 3)
colnames(yy) <- c("X", expression(lambda), "Y")
yy$lambda=factor(yy$lambda, labels=unique(lambda))
p+geom_line(data=yy,aes(X,Y, group=lambda, colour=lambda))

With lambda=0, the fit is very tight to the original points (the red line), and of course it is over-fitting.

As lambda increase, the model gets less tight and more generalized, and therefore preventing over-fitting.

This figure can also lead to a conclusion, that when lambda is too large, the model will under-fitting.

Reference:
Machine Learning Course
Exercise 5

October 26, 2011 -- Machine Learning Ex 5.2 – Regularized Logistic Regression (3)
March 22, 2011 -- Machine Learning Ex2 – Linear Regression (9)
March 29, 2011 -- Machine Learning Ex3 – Multivariate Linear Regression (0)
October 24, 2011 -- Machine Learning Ex4 – Logistic Regression (5)
April 12, 2012 -- 支持向量机 (0)

To leave a comment for the author, please follow the link and comment on their blog: YGC » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Machine Learning Ex 5.1 – Regularized Linear Regression

Related Posts

Related

Related Posts

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)