Machine Learning Ex 5.2 – Regularized Logistic Regression

October 25, 2011

Now we move on to the second part of Exercise 5.2, which requires implementing regularized logistic regression using Newton's method.

Plot the data:

x <- read.csv("ex5Logx.dat", header=F)
y <- read.csv("ex5Logy.dat", header=F)
y <- y[,1]

d <- data.frame(x1=x[,1],x2=x[,2],y=factor(y))
require(ggplot2)
p <- ggplot(d, aes(x=x1, y=x2)) +
    geom_point(aes(colour=y, shape=y))
p



We will now fit a regularized logistic regression model to this data.

The hypothesis function in logistic regression is:
\(h_\theta(x) = g(\theta^T x) = \frac{1}{ 1 + e ^{- \theta^T x} }=P(y=1\vert x;\theta)\)

In this exercise, the feature vector \(x\) in \(\theta^T x\) consists of all monomials of \(u\) and \(v\) up to the sixth power:
\( x=\left[\begin{array}{c} 1\\ u\\ v\\ u^2\\ uv\\ v^2\\ u^3\\ \vdots\\ uv^5\\ v^6\end{array}\right] \)

where \(x_0 = 1,\ x_1 = u,\ x_2 = v,\ \ldots,\ x_{27} = v^6\); there are 28 features in total.

I defined the function mapFeature, which maps the original inputs to this feature vector.

## map (u, v) to the 28 monomials u^(i-j) * v^j of total degree i <= degree
mapFeature <- function(u, v, degree=6) {
    out <- sapply(0:degree, function(i)
                  sapply(0:i, function(j)
                         u^(i-j) * v^j
                         )
                  )
    out <- unlist(out)
    return(out)
}
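As a quick check (this snippet is my own addition, not part of the exercise), mapping a single point should yield 28 features, starting with the intercept term:

f <- mapFeature(0.5, -0.2)
length(f)    ## 28
head(f, 6)   ## 1, 0.5, -0.2, 0.25, -0.1, 0.04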

Regularized Logistic Regression:

The cost function of regularized logistic regression is defined as:
\( J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log\big(h_\theta(x^{(i)})\big) + (1-y^{(i)})\log\big(1-h_\theta(x^{(i)})\big)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2} \)

Notice that this function works for both regularized (lambda > 0) and unregularized (lambda = 0) logistic regression. The regularization term at the end shrinks the entries of \(\theta\) toward zero, giving a smoother, more general fit that is more likely to perform well on new data (for prediction).

Newton's Method:

Newton's method update rule is:
\(\theta^{(t+1)} = \theta^{(t)} - H^{-1} \nabla_{\theta}J\)

In the regularized version of logistic regression, the gradient \(\nabla_{\theta}J\) and the Hessian \(H\) take the following forms:

\(\nabla_{\theta}J = \frac{1}{m} \sum_{i=1}^m \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x^{(i)} + \frac{\lambda}{m} \theta\)

(with the first component of the regularization term set to zero, since \(\theta_0\) is not regularized)

\(H = \frac{1}{m} \sum_{i=1}^m \left[h_\theta(x^{(i)})\,\big(1 - h_\theta(x^{(i)})\big)\, x^{(i)} (x^{(i)})^T\right] + \frac{\lambda}{m} \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}\)

Also notice that when lambda = 0, these reduce to the familiar formulas for unregularized logistic regression.

Here is my implementation:

## sigmoid function
g <- function(z) {
    toReturn <- 1/(1+exp(-z))
    return(toReturn)
}

##hypothesis function
h <- function(theta, x) {
    g(x %*% theta)
}

## cost function
J <- function(theta, x,y,lambda=1) {
    m <- length(y)
    j <- -1/m * (
                 y %*% log( h(theta,x) ) +
                 (1-y) %*% log( 1- h(theta,x) )
                 )

    r <- theta^2
    r[1] <- 0                      ## exclude theta_0 from the penalty
    j <- j + lambda/(2*m) * sum(r)
    return(j)
}
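As a quick sanity check (my own addition, using a small made-up design matrix rather than the exercise data), the cost at \(\theta = 0\) should equal \(\log 2 \approx 0.693\) for any lambda, since every \(h_\theta(x)\) is then 0.5 and the penalty term vanishes:

set.seed(1)
xx <- cbind(1, matrix(rnorm(20), ncol=2))   ## 10 hypothetical examples
yy <- rbinom(10, 1, 0.5)
J(rep(0, 3), xx, yy, lambda=1)              ## ~0.6931 = log(2)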



## gradient
grad <- function(theta, x, y, lambda=1) {
    m <- length(y)
    r <- lambda/m * theta
    r[1] <- 0                      ## theta_0 is not regularized
    g <- 1/m * t(x) %*% (h(theta,x) - y) + r
    return(g)
}

## Hessian
Hessian <- function(theta, x, lambda=1) {
    m <- nrow(x)
    n <- ncol(x)
    r <- lambda/m * diag(n)
    r[1,1] <- 0                    ## do not regularize theta_0
    hx <- as.vector(h(theta, x))
    ## H = 1/m * X' diag(h*(1-h)) X + regularization term
    H <- 1/m * t(x) %*% diag(hx * (1 - hx)) %*% x + r
    return(H)
}
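To gain some confidence in these derivatives (this check is my own addition, again on small random data rather than the exercise data), grad() can be compared against central finite differences of J(), and Hessian() against finite differences of grad():

set.seed(2)
xx <- cbind(1, matrix(rnorm(40), ncol=4))    ## 10 hypothetical examples, 5 features
yy <- rbinom(10, 1, 0.5)
tt <- rnorm(5) * 0.1
eps <- 1e-5
numGrad <- sapply(1:5, function(k) {
    e <- rep(0, 5); e[k] <- eps
    (J(tt + e, xx, yy) - J(tt - e, xx, yy)) / (2 * eps)
})
max(abs(grad(tt, xx, yy) - numGrad))         ## should be close to zero
numHess <- sapply(1:5, function(k) {
    e <- rep(0, 5); e[k] <- eps
    (grad(tt + e, xx, yy) - grad(tt - e, xx, yy)) / (2 * eps)
})
max(abs(Hessian(tt, xx) - numHess))          ## should also be close to zero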

First, I map the inputs to the feature matrix and calculate theta for lambda = 1, running ten Newton iterations.

require(plyr)                    ## for mdply
colnames(x) <- c("u", "v")
x <- mdply(x, mapFeature)        ## apply mapFeature to each (u, v) row
x <- x[, c(-1, -2)]              ## drop the original u, v columns
x <- as.matrix(x)

theta <- matrix(rep(0, ncol(x)), ncol=1)
lambda <- 1
j <- rep(0,10)
for (i in 1:10) {
    theta <- theta - solve(Hessian(theta,x, lambda)) %*% grad(theta,x,y,lambda)
    j[i] <- J(theta,x,y, lambda)
}
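As a rough check of the fit (my own addition, not part of the exercise), the fitted theta can be used to compute the classification accuracy on the training set:

pred <- as.numeric(h(theta, x) >= 0.5)   ## predict y = 1 when h >= 0.5
mean(pred == y)                          ## fraction classified correctly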

To verify that Newton's method is converging properly, we plot the value of the cost function against the iteration number.

ggplot()+
    aes(x=1:10,y=j)+
    geom_point(colour="red")+
    geom_line()+xlab("Iteration")+
    ylab("Cost J")


Converging fast.

Now we run the same Newton iterations for lambda = 0 and lambda = 10 to compare the fitted models.

theta0 <- matrix(rep(0, ncol(x)), ncol=1)
for (i in 1:10) {
  theta0 <- theta0 - solve(Hessian(theta0,x, lambda=0)) %*% grad(theta0,x,y,lambda=0)
}

theta10 <- matrix(rep(0, ncol(x)), ncol=1)
for (i in 1:10) {
  theta10 <- theta10 - solve(Hessian(theta10,x, lambda=10)) %*% grad(theta10,x,y,lambda=10)
}
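The shrinkage effect of the penalty can also be seen numerically (this comparison is my own addition): a larger lambda should give a smaller L2 norm of the fitted coefficients.

c(lambda0 = sqrt(sum(theta0^2)),
  lambda1 = sqrt(sum(theta^2)),
  lambda10 = sqrt(sum(theta10^2)))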

Finally, calculate \(\theta^T x\) on a grid of (u, v) values and visualize the decision boundaries; the boundary is the contour where \(\theta^T x = 0\), i.e. where \(h_\theta(x) = 0.5\).

u <- seq(-1,1.2, len=200)
v <- seq(-1,1.2, len=200)

z0 <- matrix(0, length(u), length(v))
z1 <- matrix(0, length(u), length(v))
z10 <- matrix(0, length(u), length(v))

for (i in 1:length(u)) {
    for (j in 1:length(v)) {
        features <- mapFeature(u[i],v[j])
        z0[i,j] <- features %*% theta0
        z1[i,j] <- features %*% theta
        z10[i,j] <- features %*% theta10
    }
}


rownames(z0) <- rownames(z1) <- rownames(z10) <- as.character(u)
colnames(z0) <- colnames(z1) <- colnames(z10) <- as.character(v)

require(reshape2)                ## for melt()
z0.melted <- melt(z0)
z1.melted <- melt(z1)
z10.melted <- melt(z10)

z0.melted <- data.frame(z0.melted, lambda=0)
z1.melted <- data.frame(z1.melted, lambda=1)
z10.melted <- data.frame(z10.melted, lambda=10)

zz <- rbind(z0.melted, z1.melted, z10.melted)
zz$lambda <- factor(zz$lambda)
colnames(zz) <- c("u", "v", "z", "lambda")

p+geom_contour(data=zz, aes(x=u, y=v, z=z, group=lambda, colour=lambda),bins=1)


The red line (lambda = 0) fits the crosses most tightly.
As lambda increases, the fit becomes looser and more general.

PS: it's odd that the legend in the figure above is not shown properly.
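Since the decision boundary is exactly the \(\theta^T x = 0\) contour, an alternative worth trying (my own suggestion, not verified in this post) is to pass breaks = 0 to geom_contour so the line is drawn exactly at the boundary instead of relying on bins = 1:

p + geom_contour(data=zz, aes(x=u, y=v, z=z, group=lambda, colour=lambda), breaks=0)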
