Regression

In this post I explain how linear regression works. Let us start with what regression is. Regression is widely used for prediction and forecasting in machine learning. It focuses on the relationship between a dependent variable and one or more independent variables. The “dependent variable” represents the output or effect, or is tested to see whether it is the effect. The “independent variables” represent the inputs or causes, or are tested to see whether they are the cause. Regression analysis helps us understand how the value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. In regression, the dependent variable is estimated as a function of the independent variables; this function is called the regression function. A regression model involves the following variables:
- Independent variables X
- Dependent variable Y
- Unknown parameters θ

Linear regression

In linear regression, the dependent variable (Y) is modeled as a linear combination of the independent variables (X). Here the regression function is known as the hypothesis, which is defined as below.

hθ(X) = f(X, θ)

Suppose we have only one independent variable (x); then our hypothesis is defined as below.
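
For a single independent variable, a linear hypothesis reduces to an intercept plus a slope term. A sketch in LaTeX, assuming the two coefficients are named θ0 and θ1 (the subscripts are an assumption, chosen to match the θ notation used above):

```latex
h_\theta(x) = \theta_0 + \theta_1 x
```

With this form, setting both coefficients to zero gives a prediction of zero for every x.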

[Image: hypothesis formula, http://www.tatvic.com/blog/wp-content/uploads/2012/09/htheta1.png]

The goal is to find values of θ (known as coefficients) that minimize the difference between the real and predicted values of the dependent variable (y). If all θ are set to zero, the predicted value is zero for every observation. The cost function measures how well a linear regression model fits: it calculates the average squared error over m observations. The cost function is denoted by J(θ) and defined as below.
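
Written out, one common form of an average-squared-error cost is the following. The factor 1/2 is a convention assumed here (it simplifies derivatives and does not change which θ minimizes the cost), and x^(i), y^(i) are assumed notation for the i-th observation:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right)^2
```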

[Image: cost function formula, http://www.tatvic.com/blog/wp-content/uploads/2012/09/Costfunction.png]

As the formula shows, a large cost means the predicted values are far from the real values, and a small cost means the predicted values are close to the real values. Therefore, we minimize the cost to get more accurate predictions.
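
To make the link between cost and prediction quality concrete, here is a minimal R sketch. The function name cost, the 1/(2m) scaling, and the toy data are all assumptions made for illustration; they are not taken from the original figures.

```r
# Hypothetical helper: average squared error for a single-variable hypothesis
# theta[1] + theta[2] * x, scaled by 1/(2m) (an assumed convention).
cost <- function(theta, x, y) {
  m <- length(y)
  predicted <- theta[1] + theta[2] * x
  sum((predicted - y)^2) / (2 * m)
}

# Toy data generated from y = 1 + 2x, for illustration only
x <- c(1, 2, 3, 4)
y <- c(3, 5, 7, 9)

cost_zero  <- cost(c(0, 0), x, y)   # all coefficients zero: every prediction is 0
cost_exact <- cost(c(1, 2), x, y)   # coefficients that reproduce y exactly

print(cost_zero)
print(cost_exact)
```

With all coefficients at zero the cost is large, and with coefficients that reproduce the data it drops to zero, which is the behaviour described above.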

Linear regression in R

R (http://www.r-project.org/) is a language and environment for statistical computing, with powerful and comprehensive features for fitting regression models. This section shows how linear regression works in R. The basic function for fitting a linear model is lm() (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html). The format is

fit <- lm(formula, data)

where formula describes the model (in our case a linear model) and data is the data frame used to fit it. The resulting object (fit in this case) is a list that contains information about the fitted model. The formula is typically written as

Y ~ x1 + x2 + … + xk

where ~ separates the dependent variable (Y) on the left from the independent variables (x1, x2, …, xk) on the right, and the independent variables are separated by + signs.

Let’s look at a simple regression example (from the book R in Action). The dataset women contains the height and weight of 15 women aged 30 to 39, and we want to predict weight from height. The R code to fit this model is as below.

> fit <- lm(weight ~ height, data = women)
> summary(fit)

The output of the summary function gives information about the object fit. The output is as below.

Call:
lm(formula = weight ~ height, data = women)

Residuals:
    Min      1Q  Median      3Q     Max
-1.7333 -1.1333 -0.3833  0.7417  3.1167

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
height        3.45000    0.09114   37.85 1.09e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.525 on 13 degrees of freedom
Multiple R-squared:  0.991, Adjusted R-squared:  0.9903
F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

Let’s understand the output. The values of the coefficients (θs) are -87.51667 and 3.45000, so the prediction equation for the model is

Weight = -87.52 + 3.45 * height

In the output, the residual standard error (1.525) plays a role similar to the cost: it is the square root of the residual sum of squares divided by the 13 residual degrees of freedom, so it measures the typical size of the prediction error in the units of weight. Now we will look at the actual weights of the 15 women and then at the predicted values. The actual weights of the 15 women are as below.

Output
115 117 120 123 126 129 132 135 139 142 146 150 154 159 164

The predicted values for the 15 women are as below.

Output
       1        2        3        4        5        6        7        8        9
112.5833 116.0333 119.4833 122.9333 126.3833 129.8333 133.2833 136.7333 140.1833
      10       11       12       13       14       15
143.6333 147.0833 150.5333 153.9833 157.4333 160.8833

We can see that the predicted values are close to the actual values. We have now covered what regression is, how it works, and how to fit a linear regression in R.
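
The steps above can be reproduced in a few lines. This is a sketch, assuming the built-in women dataset that ships with R; the variable names theta, manual, predicted, rse and comparison are chosen here for illustration. It uses the standard accessors coef(), fitted(), residuals() and df.residual() to pull the pieces out of the fitted object.

```r
# 'women' is part of R's built-in datasets package
fit <- lm(weight ~ height, data = women)

theta <- coef(fit)                             # intercept and slope
manual <- theta[1] + theta[2] * women$height   # prediction equation applied by hand
predicted <- fitted(fit)                       # the same predictions, as stored by lm()

# Residual standard error: sqrt(residual sum of squares / residual degrees of freedom)
rse <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

# Actual and predicted weights side by side
comparison <- data.frame(height    = women$height,
                         actual    = women$weight,
                         predicted = round(predicted, 4))
print(comparison)
print(rse)
```

Applying the prediction equation by hand gives the same values as fitted(fit), and the hand-computed rse matches the residual standard error reported by summary(fit).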