Machine Learning Ex 5.2 – Regularized Logistic Regression

[This article was first published on YGC » R, and kindly contributed to R-bloggers].

Now we move on to the second part of Exercise 5.2, which requires implementing regularized logistic regression using Newton’s Method.

Plot the data:


We will now fit a regularized logistic regression model to this data.

The hypothesis function in logistic regression is:
\(h_\theta(x) = g(\theta^T x) = \frac{1}{ 1 + e ^{- \theta^T x} }=P(y=1\vert x;\theta)\)

In this exercise, the feature vector \(x\) in \(\theta^T x\) consists of all monomials of \(u\) and \(v\) up to the sixth power:
\( x=\left[\begin{array}{c} 1\\ u\\ v\\ u^2\\ uv\\ v^2\\ u^3\\ \vdots\\ uv^5\\ v^6\end{array}\right] \)

where \(x_0 = 1, x_1=u, x_2= v,\ldots, x_{27} =v^6\), which gives 28 features in total.

I defined the function mapFeature, which maps the original inputs to the feature vector.
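The feature mapping can be sketched in base R as follows. This is only an illustration under assumptions: the name `map_feature`, its `degree` argument, and the column order (taken from the vector \(x\) shown above) are hypothetical and may differ from the original mapFeature.

```r
# Map two input vectors u and v to all monomials u^(i-j) * v^j
# with total degree i = 0, ..., degree (28 columns when degree = 6).
# Assumption: columns are ordered 1, u, v, u^2, uv, v^2, ..., v^6.
map_feature <- function(u, v, degree = 6) {
  stopifnot(length(u) == length(v))
  cols <- list()
  for (i in 0:degree) {
    for (j in 0:i) {
      cols[[length(cols) + 1]] <- u^(i - j) * v^j
    }
  }
  # One row per observation, one column per monomial.
  do.call(cbind, cols)
}

# Two example points (u, v) = (0.5, 2) and (-1, 3).
x <- map_feature(c(0.5, -1), c(2, 3))
dim(x)
```

The first column is the constant term \(x_0 = 1\), so no separate intercept column needs to be added afterwards.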


Regularized Logistic Regression:

The cost function of regularized logistic regression is defined as:
\( J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log(h_\theta(x^{(i)})) + (1-y^{(i)})\log(1-h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2} \)

Notice that this function works for both regularized (lambda > 0) and unregularized (lambda = 0) logistic regression. The regularization term at the end penalizes large values of \(\theta\) and shrinks the parameters toward zero, which gives a smoother fit that is more likely to generalize to new data when making predictions.
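As a rough illustration, the cost can be written in a few lines of base R. The names `sigmoid` and `cost_reg` and the toy data are assumptions made for this sketch, and the design matrix is assumed to carry a leading column of ones.

```r
# Sigmoid (logistic) function g(z) = 1 / (1 + exp(-z)).
sigmoid <- function(z) 1 / (1 + exp(-z))

# Regularized cost J(theta).
# x: m-by-(n+1) design matrix whose first column is all ones
# y: vector of 0/1 labels; theta: parameters; lambda: penalty strength
# The intercept theta[1] (theta_0 in the text) is left out of the penalty,
# matching the sum from j = 1 in the formula.
cost_reg <- function(theta, x, y, lambda) {
  m <- length(y)
  h <- as.vector(sigmoid(x %*% theta))
  -mean(y * log(h) + (1 - y) * log(1 - h)) +
    lambda / (2 * m) * sum(theta[-1]^2)
}

# Toy check: with theta = 0 every prediction is 0.5, so J equals log(2).
x_toy <- cbind(1, c(-1, 0, 1))
y_toy <- c(0, 0, 1)
cost_reg(c(0, 0), x_toy, y_toy, lambda = 1)
```

Setting `lambda = 0` drops the penalty and gives the unregularized cost.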

Newton’s Method:

The update rule for Newton’s Method is:
\(\theta^{(t+1)} = \theta^{(t)} - H^{-1} \nabla_{\theta}J\)

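In the regularized version of logistic regression, the gradient \(\nabla_{\theta}J\) and the Hessian \(H\) each gain an extra term:

\(\nabla_{\theta}J = \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)} + \frac{\lambda}{m} \begin{bmatrix} 0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}\)

\(H = \frac{1}{m} \sum_{i=1}^m \left[ h_\theta(x^{(i)}) (1 - h_\theta(x^{(i)})) x^{(i)} (x^{(i)})^T \right] + \frac{\lambda}{m} \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}\)

The zero in the first position reflects that \(\theta_0\) is not regularized, in line with the sum starting at \(j=1\) in the cost function.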

Also notice that when lambda = 0, these formulas reduce to those of unregularized logistic regression.
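The update rule, gradient and Hessian can be combined into a short base R routine. This is a minimal sketch under assumptions, not the original implementation: the function name `newton_logistic`, the fixed count of 15 iterations, the starting point \(\theta = 0\), and the toy data are all made up for illustration.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Newton's Method for regularized logistic regression.
# x: m-by-(n+1) design matrix with a leading column of ones; y: 0/1 labels.
# Returns the fitted theta and the cost recorded before each update.
# Assumption: a fixed number of iterations instead of a convergence test.
newton_logistic <- function(x, y, lambda, iterations = 15) {
  m <- nrow(x)
  theta <- rep(0, ncol(x))
  reg <- diag(ncol(x))
  reg[1, 1] <- 0                      # diag(0, 1, ..., 1): theta_0 is not penalized
  cost <- numeric(iterations)
  for (k in seq_len(iterations)) {
    h <- as.vector(sigmoid(x %*% theta))
    cost[k] <- -mean(y * log(h) + (1 - y) * log(1 - h)) +
      lambda / (2 * m) * sum(theta[-1]^2)
    grad <- as.vector(t(x) %*% (h - y)) / m +
      lambda / m * as.vector(reg %*% theta)
    # Each row of x is scaled by h_i * (1 - h_i) before the cross product.
    hess <- t(x) %*% (x * (h * (1 - h))) / m + lambda / m * reg
    theta <- theta - solve(hess, grad)  # solves H d = grad without forming H^{-1}
  }
  list(theta = theta, cost = cost)
}

# Toy data (not the exercise data): one feature plus an intercept column.
x_toy <- cbind(1, c(-2, -1, 0, 1, 2, 3))
y_toy <- c(0, 0, 1, 0, 1, 1)
fit <- newton_logistic(x_toy, y_toy, lambda = 1)
fit$theta
```

Using `solve(hess, grad)` rather than inverting the Hessian explicitly is a common numerical choice; the recorded `cost` vector can be plotted against the iteration number to check convergence.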

Here is my implementation:

## sigmoid function

First, I calculate theta for lambda = 1.


To check that the method is converging properly, we plot the values of the cost function against the number of iterations.

[Figure: cost J against the number of iterations]

The method converges fast.

Now we repeat the fit for lambda = 0 and lambda = 10 to compare the fitted models.


Finally, calculate the decision boundary line and visualize it.
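The decision boundary is the set of points where \(\theta^T x = 0\), that is, where \(h_\theta(x) = 0.5\). One way to trace it is to evaluate \(\theta^T x\) on a grid of \((u, v)\) values and extract the zero-level contour. The sketch below assumes a hypothetical `map_feature` helper, a grid from -1 to 1.5, and a made-up `theta_circle` whose boundary is a circle of radius 0.5; none of these are taken from the original code.

```r
# Feature map as described in the text: all monomials of u and v up to `degree`.
map_feature <- function(u, v, degree = 6) {
  cols <- list()
  for (i in 0:degree) {
    for (j in 0:i) {
      cols[[length(cols) + 1]] <- u^(i - j) * v^j
    }
  }
  do.call(cbind, cols)
}

# Evaluate theta^T x on a grid and return the contour lines at level 0.
# Assumption: the grid range is a guess and should cover the data.
boundary_lines <- function(theta,
                           u = seq(-1, 1.5, length.out = 200),
                           v = seq(-1, 1.5, length.out = 200)) {
  grid <- expand.grid(u = u, v = v)   # u varies fastest
  z <- matrix(map_feature(grid$u, grid$v) %*% theta, nrow = length(u))
  grDevices::contourLines(u, v, z, levels = 0)
}

# Made-up parameters: theta^T x = 0.25 - u^2 - v^2, a circle of radius 0.5.
theta_circle <- numeric(28)
theta_circle[1] <- 0.25
theta_circle[c(4, 6)] <- -1
boundary <- boundary_lines(theta_circle)
length(boundary[[1]]$x)               # number of points on the traced boundary
```

Each element of the returned list holds `x` and `y` coordinates that can be drawn over the scatter plot, for instance with `lines(boundary[[1]])` in base graphics, with one curve per value of lambda.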


The red line (lambda = 0) fits more tightly around the crosses.
As lambda increases, the fit becomes looser and more general.

PS: oddly, the legend in the figure above is not displayed properly.
