
Welcome to the second part! In the previous part, we covered linear regression, the cost function, and gradient descent. In this part we will implement the whole process in R, step by step, using an example data set. I will use the data set provided in the machine learning class assignment. We will implement linear regression with one variable to predict profits for a food truck.

Let us first discuss the linear regression problem (the information is given in the ML class assignment). Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities, and you have data for profits and populations from those cities. You would like to use this data to help you choose which city to expand to next.

So, the data set contains two columns. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss. Download the data set from here.

#Read data set
data <- read.csv("data.csv")

Before starting on any task, it is often useful to understand the data by visualizing it. We can see from the plot that cities with larger populations tend to have higher profits.
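A scatter plot like the one described above can be drawn with base R's plot(). The sketch below is only an illustration: the toy data frame is made up so that the snippet runs on its own, and it assumes the same column names (population and profit) that are used later in this post.

```r
# Made-up stand-in for data.csv, for illustration only
toy <- data.frame(
  population = c(5, 6, 8, 10, 15, 20),
  profit     = c(1, 3, 5, 7, 14, 20)
)

# Scatter plot: population on the x-axis, profit on the y-axis
plot(toy$population, toy$profit,
     xlab = "Population", ylab = "Profit",
     main = "Profit vs. population", pch = 4, col = "red")
```

With the real data, replace toy with the data frame read from data.csv.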

Here the dependent variable is profit and the independent variable is population. So let us set the dependent variable y and the independent variable x.

#Dependent variable
y <- data$profit

#Independent variable
x <- data$population

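The objective of linear regression is to minimize the cost function

$$J(\Theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right)^2$$

where the hypothesis hΘ(x) is given by the linear model

$$h_\Theta(x) = \Theta^T x = \Theta_0 + \Theta_1 x_1$$

To take into account the intercept term Θ0, we add an additional first column to x and set it to all ones. This allows us to treat Θ0 as simply another feature.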

Let us first add ones to x, initialize Θ0 and Θ1 to zero, and calculate the cost using the above equation.

#Add ones to x
x <- cbind(1,x)

# Initialize theta vector
theta <- c(0, 0)

# Number of the observations
m <- nrow(x)

#Calculate cost
cost <- sum(((x%*%theta)- y)^2)/(2*m)
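Since we will want to evaluate the cost for different thetas, it can help to wrap the same calculation in a function. The sketch below is one way to do it; compute_cost, x_toy and y_toy are hypothetical names introduced here, and the toy data is made up so that the result can be checked by hand.

```r
# Hypothetical helper: cost for a design matrix x (first column all ones),
# a response vector y, and a parameter vector theta
compute_cost <- function(x, y, theta) {
  m <- nrow(x)                      # number of observations
  residuals <- (x %*% theta) - y    # prediction minus actual value, per row
  sum(residuals^2) / (2 * m)
}

# Made-up toy data that lies exactly on the line y = 1 + 2 * x
x_toy <- cbind(1, c(1, 2, 3, 4))
y_toy <- c(3, 5, 7, 9)

cost_zero    <- compute_cost(x_toy, y_toy, c(0, 0))  # all-zero thetas
cost_perfect <- compute_cost(x_toy, y_toy, c(1, 2))  # the true line, so no error
```

For the toy data, the all-zero thetas give a cost of (9 + 25 + 49 + 81) / 8 = 20.5, while the thetas of the true line give a cost of 0.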

For the initial values of theta (both zero), the cost is 32.07. Our objective is to minimize the cost by updating the values of the thetas. One way to do this is to use the batch gradient descent algorithm. We update the values of the thetas by iterating the following equation, updating Θj for all j at the same time:

$$\Theta_j := \Theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

With each step of gradient descent, the parameters come closer to the optimal values that will achieve the lowest cost. To do this we will set the learning parameter alpha to 0.01 and the number of iterations to 1500.

# Set learning parameter
alpha <- 0.01

#Number of iterations
iterations <- 1500

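# Updating thetas using the gradient update
for(i in 1:iterations)
{
  # Prediction error for the current thetas
  error <- (x %*% theta) - y
  # Compute both new values from the same error (simultaneous update)
  theta0 <- theta[1] - alpha * (1/m) * sum(error)
  theta1 <- theta[2] - alpha * (1/m) * sum(error * x[,2])
  theta <- c(theta0, theta1)
}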
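A simple way to check that gradient descent is working is to record the cost at every iteration and confirm that it keeps going down. The sketch below does this on made-up toy data; gradient_descent and cost_history are hypothetical names introduced here, and the update is written in matrix form, which is equivalent to updating both thetas at the same time.

```r
# Hypothetical helper: batch gradient descent that also records the cost
gradient_descent <- function(x, y, theta, alpha, iterations) {
  m <- nrow(x)
  cost_history <- numeric(iterations)
  for (i in seq_len(iterations)) {
    error <- (x %*% theta) - y                   # m x 1 prediction errors
    gradient <- as.vector(t(x) %*% error) / m    # one partial derivative per theta
    theta <- theta - alpha * gradient            # simultaneous update of all thetas
    cost_history[i] <- sum(((x %*% theta) - y)^2) / (2 * m)
  }
  list(theta = theta, cost_history = cost_history)
}

# Made-up toy data on the line y = 1 + 2 * x
x_toy <- cbind(1, c(1, 2, 3, 4))
y_toy <- c(3, 5, 7, 9)

fit <- gradient_descent(x_toy, y_toy, theta = c(0, 0),
                        alpha = 0.01, iterations = 1500)
fit$theta                  # should be close to c(1, 2)
head(fit$cost_history, 1)  # cost after the first iteration
tail(fit$cost_history, 1)  # cost after the last iteration
```

If the cost goes up instead of down, the learning parameter alpha is probably too large for the data.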

After 1500 iterations, we will have a lower cost than the initial one and new theta values. Now let us try to predict profits for areas of 35,000 and 70,000 people using the new values of theta. The code below passes 3.5 and 7, so population is expressed in units of 10,000 people.

#Predict for areas of the 35,000 and 70,000 people
predict1 <- c(1,3.5) %*% theta
predict2 <- c(1,7) %*% theta
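The same predictions can be computed in one step by stacking the new populations into a matrix with a leading column of ones. This is only a sketch: theta_example holds hypothetical values chosen for illustration, not the thetas fitted above, and the division by 10,000 assumes population is stored in units of 10,000 people, as the values 3.5 and 7 suggest.

```r
# Hypothetical thetas, for illustration only (not the fitted values)
theta_example <- c(-4, 1.2)

# Populations in people, converted to units of 10,000 (assumed unit of the data)
populations <- c(35000, 70000) / 10000

# Add the column of ones, then multiply by theta to get one prediction per row
x_new <- cbind(1, populations)
predictions <- as.vector(x_new %*% theta_example)
```

With the hypothetical thetas this gives -4 + 1.2 * 3.5 = 0.2 and -4 + 1.2 * 7 = 4.4, in whatever unit the profit column uses.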

So far we have covered the implementation of linear regression with one variable and prediction for new data. In the next post I will discuss another way of minimizing the cost, using the optim() function, and compare the result with the lm() function.