Logistic Regression

In my first blog post (http://www.tatvic.com/blog/linear-regression-using-r/), I explained what regression is and how a linear regression model is generated in R. In this post, I will explain what logistic regression is and how a logistic regression model is generated in R.

Let's first understand logistic regression. Logistic regression is a type of regression used to predict the outcome of a categorical dependent variable (i.e., a variable that takes a limited number of categorical values) based on one or more independent variables. For example, predicting who will win the next T20 World Cup based on the players' strengths and other details is a prediction of a categorical variable. Logistic regression can be binomial or multinomial.

In binomial (or binary) logistic regression, the outcome can have only two possible values (e.g., "Yes" or "No", "Success" or "Failure"). Multinomial logistic regression refers to cases where the outcome can have three or more possible values (e.g., "good" vs. "very good" vs. "best"). In binary logistic regression, the outcome is generally coded as "0" and "1". We will use binary logistic regression in the rest of this post. Now, let's look at how a logistic regression model is generated in R.
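Before turning to model fitting, it may help to see the function that gives logistic regression its name. Binary logistic regression models the probability of the outcome coded "1" as the logistic function of a linear combination of the independent variables. The following is a minimal sketch of that function; the helper name logistic and the input values in z are made up for illustration and are not part of the original example.

```r
# Logistic (inverse-logit) function: maps any real number into the interval (0, 1).
# The name `logistic` is a hypothetical helper defined here for illustration.
logistic <- function(z) 1 / (1 + exp(-z))

# Hypothetical linear-predictor values, chosen only to show the shape of the curve.
z <- c(-4, -1, 0, 1, 4)
p <- logistic(z)
print(round(p, 3))

# Base R provides the same function as plogis(); qlogis() is its inverse (the logit).
stopifnot(isTRUE(all.equal(p, plogis(z))))
stopifnot(isTRUE(all.equal(qlogis(p), z)))
```

A linear predictor of 0 maps to a probability of exactly 0.5, and larger values move the probability toward 1 without ever reaching it.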
Logistic regression in R

To fit a logistic regression model in R (http://www.r-project.org/), the glm() function (http://web.njit.edu/all_topics/Prog_Lang_Docs/html/library/base/html/glm.html) is used. It is similar to lm() (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html), but glm() takes additional parameters. The format is:

glm(Y~X1+X2+X3, family=binomial(link="logit"), data=mydata)

Here, Y is the dependent variable and X1, X2 and X3 are the independent variables. The additional parameter family has the value binomial(link="logit"), which means that the probability distribution of the regression model is binomial and the link function is logit (refer to the book R in Action for more information). Let's generate a simple model. Suppose we want to predict whether a student will be admitted based on two exam scores. For this problem we have historical data from previous applicants, which can be used as the training data set to build a model. The data set contains the following variables.
- exam_1: Exam-1 score
- exam_2: Exam-2 score
- admitted: 1 if admitted, 0 if not admitted
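
The original data set is distributed separately, so readers without it may want a stand-in with the same three columns. The sketch below simulates one. Everything in it (the number of rows, the score range, and the rule linking scores to admission) is an assumption made for illustration and does not come from the author's data.

```r
set.seed(42)  # fixed seed so the sketch is repeatable
n <- 100      # hypothetical number of past applicants

# Simulated scores between 30 and 100 (not the author's real data).
exam_1 <- round(runif(n, min = 30, max = 100), 1)
exam_2 <- round(runif(n, min = 30, max = 100), 1)

# Assumed rule for illustration: higher combined scores make admission more likely.
p_admit  <- plogis(-12 + 0.1 * exam_1 + 0.1 * exam_2)
admitted <- rbinom(n, size = 1, prob = p_admit)

# Same column names as the data set described above.
data <- data.frame(exam_1, exam_2, admitted)

str(data)             # structure: two numeric scores and a 0/1 outcome
table(data$admitted)  # how many simulated applicants fall in each class
```

Because these values are simulated, a model fitted to them will not reproduce the probabilities reported in this post.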
>Model_1<-glm(admitted ~ exam_1 +exam_2, family = binomial("logit"), data=data)

After generating the model, let's use it to make a prediction. Suppose a student scored 60 on exam_1 and 86 on exam_2. Will the student be admitted? The following R code predicts the probability that the student is admitted.
>in_frame<-data.frame(exam_1=60,exam_2=86)
>predict(Model_1,in_frame, type="response")
Output 0.9894302

Here, the output is a probability score in the range 0 to 1. If the probability score is greater than 0.5, it is considered TRUE; if it is less than or equal to 0.5, it is considered FALSE. In our case, we need an output of 1 or 0 to decide whether the student will be admitted: 1 means the student will be admitted, 0 means not. So I have used the round() function (http://stat.ethz.ch/R-manual/R-devel/library/base/html/Round.html) to convert the probability score to 0 or 1, as shown below.
>round(predict(Model_1, in_frame, type="response"))
Output 1

An output of 1 means the student will be admitted. We can predict for other observations in the same manner. We have now seen what logistic regression is and how it works in R. If you want to try the same exercise, the R code and sample data set for the above example are available at http://www.tatvic.com/blog/downloads/LogisticRegression-1.rar. In the next blog post (http://www.tatvic.com/blog/predict-users-return-visit-within-a-day-part-1/), we will discuss a specific problem with Google Analytics data and see how logistic regression can be applied to it.
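
The remark above about predicting for other observations can be sketched as follows: pass a data frame with one row per student and apply the 0.5 cutoff to every row. This is a self-contained illustration rather than the code from the download. The training data is simulated, and the names train, model, new_students and cutoff are hypothetical.

```r
set.seed(1)  # fixed seed so the sketch is repeatable

# Simulated stand-in for the training data (not the author's real data set).
n <- 200
train <- data.frame(
  exam_1 = runif(n, min = 30, max = 100),
  exam_2 = runif(n, min = 30, max = 100)
)
# Assumed admission rule, used only to generate plausible 0/1 labels.
train$admitted <- rbinom(
  n, size = 1,
  prob = plogis(-12 + 0.1 * train$exam_1 + 0.1 * train$exam_2)
)

model <- glm(admitted ~ exam_1 + exam_2, family = binomial("logit"), data = train)

# Hypothetical new applicants, one row per student.
new_students <- data.frame(
  exam_1 = c(60, 45, 80),
  exam_2 = c(86, 50, 40)
)

# predict() returns one probability per row of newdata.
prob <- predict(model, newdata = new_students, type = "response")

# Explicit cutoff: 1 if the probability exceeds 0.5, otherwise 0.
cutoff <- 0.5
new_students$probability <- prob
new_students$admitted    <- as.integer(prob > cutoff)

print(new_students)
```

Writing the cutoff out explicitly gives the same 0/1 labels as round() here, and it makes it easier to try a threshold other than 0.5.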
About the author: Amar Gondaliya