(This article was first published on **Tatvic Blog » R**, and kindly contributed to R-bloggers)

## Logistic Regression

In my [first blog post](http://www.tatvic.com/blog/linear-regression-using-r/), I explained what regression is and how a linear regression model is generated in R. In this post, I will explain what logistic regression is and how a logistic regression model is generated in R.
Let’s first understand logistic regression. Logistic regression is a type of regression used to predict the outcome of a categorical dependent variable (that is, a variable that takes a limited number of categorical values) based on one or more independent variables. For example, predicting who will win the next T20 World Cup based on the players’ strengths and other details is a prediction about a categorical variable. Logistic regression can be binomial or multinomial.
In binomial (binary) logistic regression, the outcome can take only two possible values (e.g. “Yes” or “No”, “Success” or “Failure”). Multinomial logistic regression refers to cases where the outcome can take three or more possible values (e.g. “good” vs. “very good” vs. “best”). In binary logistic regression, the outcome is generally coded as “0” and “1”. We will use binary logistic regression in the rest of this post. Now, let’s look at how a logistic regression model is generated in R.
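Before moving on to model fitting, here is a minimal sketch of the 0/1 coding described above and of how a binary logistic model turns a real-valued score into a probability. The `outcome` vector and the `scores` values are invented for illustration and are not part of the post's data set; `plogis()` is the logistic function from base R's stats package.

```r
# Hypothetical outcomes, invented for illustration only
outcome <- c("Yes", "No", "Yes", "Yes", "No")

# Code the two categories as 1 ("Yes") and 0 ("No")
coded <- as.integer(outcome == "Yes")

# The logistic function maps any real-valued score to a probability in (0, 1)
logistic <- function(z) 1 / (1 + exp(-z))

scores <- c(-4, 0, 4)   # arbitrary example scores (assumed values)
probs  <- logistic(scores)

# Base R's plogis() computes the same function
same_as_plogis <- isTRUE(all.equal(probs, plogis(scores)))
```

A score of 0 maps to a probability of exactly 0.5, which is why 0.5 is a natural cut-off when the probability is later converted back to a 0 or 1.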
## Logistic regression in R

To fit a logistic regression model in [R](http://www.r-project.org/), the [glm()](http://web.njit.edu/all_topics/Prog_Lang_Docs/html/library/base/html/glm.html) function is used. It is similar to [lm()](http://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html), but glm() includes additional parameters. The format is

```r
glm(Y ~ X1 + X2 + X3, family = binomial(link = "logit"), data = mydata)
```

Here, Y is the dependent variable and X1, X2, and X3 are the independent variables. The function includes the additional parameter *family*, which has the value *binomial(link = "logit")*. This means that the probability distribution of the regression model is *binomial* and the link function is *logit* (refer to the book *R in Action* for more information).

Let’s generate a simple model. Suppose we want to predict whether a student will get admission based on the student's two exam scores. For this problem we have historical data from previous applicants, which can be used as the training data set to build a model. The data set contains the following parameters: *exam_1*, *exam_2*, and *admitted*.

Among these parameters, *admitted* has the value 1 or 0 for each observation. Now we will generate a model that can predict whether a student will get admission based on the two exam scores. For this problem, *admitted* is the dependent variable, and *exam_1* and *exam_2* are the independent variables. The R code for the model is given below.

```r
# Fit a binary logistic regression of admission on the two exam scores
Model_1 <- glm(admitted ~ exam_1 + exam_2, family = binomial("logit"), data = data)
```

After generating the model, let’s try to predict using it. Suppose a student has two exam marks: 60 in exam_1 and 86 in exam_2. Will the student get admission? The following R code predicts the probability that the student gets admission.

```r
# New observation to score
in_frame <- data.frame(exam_1 = 60, exam_2 = 86)
# type = "response" returns the predicted probability instead of the log-odds
predict(Model_1, in_frame, type = "response")
```

Output: `0.9894302`

Here, the output is a probability score in the range 0 to 1. If the probability score is greater than 0.5, it is considered TRUE; if it is less than or equal to 0.5, it is considered FALSE. In our case, an output of 1 or 0 decides whether the student will get admission: if it is 1, the student will get admission; otherwise not. So I have used the [round()](http://stat.ethz.ch/R-manual/R-devel/library/base/html/Round.html) function to convert the probability score to 0 or 1, as shown below.

```r
# Convert the predicted probability to a 0/1 decision
round(predict(Model_1, in_frame, type = "response"))
```

Output: `1`

An output of 1 means the student will get admission. We can predict for other observations in the same manner. We have now seen what logistic regression is and how it works in [R](http://www.r-project.org/). If you want to do the same exercise, [click here](http://www.tatvic.com/blog/downloads/LogisticRegression-1.rar) for the R code and sample data set of the above example. In the [next blog](http://www.tatvic.com/blog/predict-users-return-visit-within-a-day-part-1/), we will discuss a specific problem for Google Analytics data and see how to use logistic regression for it.
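For readers who do not have the sample data set at hand, the same fit-then-predict workflow can be tried on simulated data. The following is only a sketch under stated assumptions: the data frame `sim`, the coefficients used to generate the labels, and the names `sim_model`, `new_applicant`, `p_hat`, and `decision` are all invented for illustration. Its predicted probability will therefore differ from the value obtained with the post's real data.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-in for the admissions data (all values are invented)
n <- 200
sim <- data.frame(
  exam_1 = runif(n, 30, 100),
  exam_2 = runif(n, 30, 100)
)

# Assumed relationship, used only to generate 0/1 labels for the sketch
true_prob <- plogis(-12 + 0.1 * sim$exam_1 + 0.1 * sim$exam_2)
sim$admitted <- rbinom(n, size = 1, prob = true_prob)

# Fit the same kind of model: binomial family with a logit link
sim_model <- glm(admitted ~ exam_1 + exam_2,
                 family = binomial(link = "logit"), data = sim)

# Predict the admission probability for a new applicant
new_applicant <- data.frame(exam_1 = 60, exam_2 = 86)
p_hat <- predict(sim_model, newdata = new_applicant, type = "response")

# Apply the 0.5 cut-off explicitly instead of using round()
decision <- as.integer(p_hat > 0.5)
```

Writing the cut-off as `p_hat > 0.5` states the decision rule directly, which also makes it straightforward to try a different threshold later.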
