
Welcome to the first part of my blog post series. In this post, I will discuss how to implement linear regression step by step in R while explaining the underlying concepts. I will try to explain linear regression briefly and convert the mathematical formulas into code (I hope you like this!). I was really inspired by Andrew Ng and his Machine Learning course on Coursera, which motivated me to write this blog post.

So, let us start by understanding linear regression. You can also watch the video lectures on linear regression from the Machine Learning class. We will keep the explanation brief.

Regression

Regression is widely used for prediction. Its focus is the relationship between a dependent variable and one or more independent variables. Regression analysis helps us understand how the value of the dependent variable changes when one of the independent variables is varied while the other independent variables are held fixed. In regression, the dependent variable is estimated as a function of the independent variables; this function is called the regression function. A regression model uses the following parameters.

1. Independent variables X.
2. Dependent variable Y.
3. Unknown parameters θ.

In the regression model, Y is a function of (X, θ). There are many techniques for regression analysis.
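As a concrete illustration of a regression model (on made-up data, using R's built-in lm() function; variable names here are my own, not from this post):

```r
# A regression model on made-up data: Y is a function of (X, theta).
# Here the unknown parameters theta are estimated by R's built-in lm().
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 5 + 2 * x1 - 1.5 * x2 + rnorm(100, sd = 0.1)  # true parameters: 5, 2, -1.5

fit <- lm(y ~ x1 + x2)  # estimate the regression function y = f(x1, x2, theta)
coef(fit)               # estimated parameters, close to 5, 2 and -1.5
```

Because the simulated noise is small, the estimated coefficients land very close to the true values used to generate the data.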

Linear regression

In linear regression, the dependent variable Y is a linear combination of the independent variables. Here the regression function is known as the hypothesis, which is defined as below:

hθ(x) = f(x, θ)
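As a tiny R sketch (the function and variable names are mine, for illustration), the hypothesis is just an intercept plus a weighted sum of the inputs:

```r
# h_theta(x): intercept theta[1] plus a weighted sum of the inputs.
# A 1 is prepended to x so that theta[1] plays the role of theta0.
hypothesis <- function(theta, x) {
  sum(theta * c(1, x))
}

theta <- c(1, 2, 3)          # theta0 = 1, theta1 = 2, theta2 = 3
hypothesis(theta, c(4, 5))   # 1 + 2*4 + 3*5 = 24
```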

Suppose the dependent variable is Y and the independent variables are x1, x2, x3. The hypothesis is then defined as below:

hθ(x) = θ0 + θ1x1 + θ2x2 + θ3x3

The goal is to find the values of the θ's (known as coefficients) so that we can fit the model to the data. In linear regression the fitted model is a straight line, and we can predict the value of the dependent variable from the independent variables. Starting with all θ values at zero, we find that the difference between the actual and predicted values is big. The cost function is used to measure how well the linear regression model fits; with m training examples, it is defined as below:

J(θ) = (1/2m) * Σ ( hθ(x(i)) - y(i) )²,  summed over i = 1, …, m

We have to change the values of the θ's to minimize the cost, and gradient descent is used to do this. On every iteration the values of the θ's are updated until, hopefully, the cost is minimized. The update is as below:

θj := θj - α * ∂J(θ)/∂θj

In gradient descent the partial derivative of the cost function with respect to each θj is taken, and α is the learning rate. Working out the derivative, we get the following update equation:

θj := θj - (α/m) * Σ ( hθ(x(i)) - y(i) ) * xj(i)

We iterate the above update (for all θj simultaneously) until we reach values of the θ's that minimize the cost and give better predictions.

So this is all about linear regression. We have covered the hypothesis, the coefficients, the cost function, and the cost minimization process that derives the best coefficients for fitting a linear model. In the next part, we will start implementing linear regression on a sample data set.