
Welcome to the second part of this blog post series! In the previous part, we discussed the concept of logistic regression and its mathematical formulation. Now we will apply that learning and implement it step by step in R. (If you already know the concept of logistic regression, move ahead with this part; otherwise, you can view the previous post for a brief introduction.)

In this post, we will discuss the implementation of the cost function, optimization of the parameters using the optim() function, and prediction in R. So, let's start.

Before we write any code, it is good to formulate the logistic regression problem first. I will use the same data set and problem provided in the Coursera Machine Learning class logistic regression assignment. We will build a logistic regression model to predict whether a student will get admission to a university or not.

Suppose that you are the administrator of a university and you want to know each applicant's chance of admission based on their scores on two exams. You have historical data from previous applicants which can be used as training data for logistic regression. Your task is to build a classification model that estimates each applicant's probability of admission to the university (source: Coursera Machine Learning class).

Now that we have understood the classification problem we are going to address, let us understand the data. For each previous applicant, we have the scores on the two exams and a label indicating whether the applicant got admission (1 if admitted, 0 otherwise).

Before implementing any learning algorithm, it is always good to visualize the data if possible. So, let us plot the data and see who got admission in the past. (Download data)

#Load data (adjust the filename/path to the downloaded file)
data <- read.csv("data.csv")

#Create plot
plot(data$score.1, data$score.2, col = as.factor(data$label),
     xlab = "Score-1", ylab = "Score-2")

From the plot, you can see that the red-marked observations got admission and the black-marked observations did not. We can imagine a clear-cut classification boundary between these two groups. So, in this case we have two predictor variables (the two exam scores) and the label as the response variable. Let us set up the predictor and response variables.

#Predictor variables
X <- as.matrix(data[, c(1, 2)])

#Add a column of ones to X for the intercept term
X <- cbind(rep(1, nrow(X)), X)

#Response variable
Y <- as.matrix(data$label)

Before we write the actual cost function, recall that the logistic regression hypothesis is defined as:

h_theta(x) = g(theta^T * x)

where the function g is the sigmoid function. The sigmoid function is defined as:

g(z) = 1 / (1 + e^(-z))

Our first step is to implement the sigmoid function.

#Sigmoid function
sigmoid <- function(z)
{
  g <- 1/(1 + exp(-z))
  return(g)
}
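A quick sanity check confirms the implementation behaves as expected: sigmoid(0) should be exactly 0.5, large positive inputs should approach 1, and large negative inputs should approach 0. (The function is redefined here only so the snippet runs on its own; it matches the definition above.)

```r
#Sanity check for the sigmoid implementation
#(redefined here so the snippet is self-contained)
sigmoid <- function(z)
{
  g <- 1/(1 + exp(-z))
  return(g)
}

sigmoid(0)            #exactly 0.5
sigmoid(100)          #very close to 1
sigmoid(-100)         #very close to 0
sigmoid(c(-1, 0, 1))  #works element-wise on vectors, as needed for X %*% theta
```

Because exp() is vectorized in R, the same function works on a scalar or on the whole vector X %*% theta without modification.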

Now we will implement the cost function. Recall that the cost function in logistic regression is:

J(theta) = (1/m) * sum over i of [ -y_i * log(h_theta(x_i)) - (1 - y_i) * log(1 - h_theta(x_i)) ]

The equivalent R code is:

#Cost function
cost <- function(theta)
{
  m <- nrow(X)
  g <- sigmoid(X %*% theta)
  J <- (1/m) * sum((-Y * log(g)) - ((1 - Y) * log(1 - g)))
  return(J)
}

Let's test this cost function with the initial theta parameters. We will set the theta parameters to zero initially and check the cost.

#Initial theta
initial_theta <- rep(0, ncol(X))

#Cost at initial theta
cost(initial_theta)

You will find that the cost is about 0.693 with the initial parameters. Now our objective is to minimize this cost and derive the optimal values of theta. In the blog post 'Linear regression with R: step by step implementation part-2', I implemented gradient descent and defined the update function to optimize the values of theta.
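For reference, the same update rule carries over to logistic regression: the gradient of the cost is (1/m) * t(X) %*% (g - Y), so each iteration moves theta a small step against it. Below is a minimal sketch of such a loop; the step size alpha and the iteration count are illustrative choices, not tuned values, and sigmoid() is redefined so the snippet runs on its own.

```r
#Plain (batch) gradient descent for logistic regression: a sketch.
#alpha and n_iter are illustrative choices, not tuned values.
sigmoid <- function(z) 1/(1 + exp(-z))

gradient_descent <- function(X, Y, alpha = 0.01, n_iter = 1000)
{
  theta <- rep(0, ncol(X))
  m <- nrow(X)
  for (i in 1:n_iter)
  {
    g <- sigmoid(X %*% theta)
    #Update rule: theta := theta - alpha * gradient of J(theta)
    theta <- theta - alpha * (1/m) * (t(X) %*% (g - Y))
  }
  return(theta)
}
```

Note that with raw exam scores (roughly in the 30-100 range) a plain loop like this converges slowly unless the features are scaled first, which is one practical reason to hand the minimization to a library routine instead.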

Here I will use R's built-in optim() function to derive the best-fitting parameters (note that by default optim() uses the derivative-free Nelder-Mead method rather than gradient descent). Ultimately we want the optimal value of the cost function and of theta.

#Derive theta by minimizing the cost with the optim() function
theta_optim <- optim(par = initial_theta, fn = cost)

#Set theta
theta <- theta_optim$par

#Cost at the optimal value of theta
theta_optim$value

We have the optimal values of theta, and the cost is about 0.2034 at this optimum. We can now use these theta parameters to predict the admission probability of a new applicant based on the scores of the two exams.

For example, consider a student with an exam-1 score of 45 and an exam-2 score of 85.

# probability of admission for student
prob <- sigmoid(t(c(1,45,85))%*%theta)

You will find the probability is about 0.774; since it is above 0.5, we predict that the student will get admission.
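Since the model is linear in the scores, the decision boundary (where the predicted probability is exactly 0.5) is the straight line theta[1] + theta[2]*x1 + theta[3]*x2 = 0, and we can draw it over the scatter plot from earlier. The theta values below are illustrative stand-ins of the same order as what optim() returns; in the post you would use theta_optim$par instead.

```r
#Decision boundary: theta[1] + theta[2]*x1 + theta[3]*x2 = 0,
#i.e. x2 = -(theta[1] + theta[2]*x1) / theta[3]
#Illustrative values; replace with theta_optim$par from above
theta <- c(-25.16, 0.206, 0.201)

boundary_intercept <- -theta[1] / theta[3]
boundary_slope <- -theta[2] / theta[3]

#Overlay on the scatter plot created earlier:
#plot(data$score.1, data$score.2, col = as.factor(data$label))
#abline(boundary_intercept, boundary_slope)
```

Points above the line get a predicted probability greater than 0.5 (admitted) and points below it less than 0.5, which matches the clear-cut separation visible in the plot.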

So far we have seen how to implement the cost function, optimize it, and derive the optimal values of the learning parameters (this looks very easy, right?). We started with the definition of a classification problem and ended up with a prediction. In the next part, we will implement a prediction function for batch prediction, calculate the training accuracy, and compare the results with the output of the glm() function. Do post your comments and queries; that is extremely important for me to learn new things.