# Plotting a Logistic Regression In Base R

This article was first published on **Steve's Data Tips and Tricks**, and kindly contributed to R-bloggers.

# Introduction

Logistic regression is a statistical method used for predicting the probability of a binary outcome. It’s a fundamental tool in machine learning and statistics, often employed in various fields such as healthcare, finance, and marketing. We use logistic regression when we want to understand the relationship between one or more independent variables and a binary outcome, which can be “yes/no,” “1/0,” or any two-class distinction.
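For a single predictor, the model behind this description can be written compactly. The symbols $\beta_0$ (intercept) and $\beta_1$ (slope) are notation introduced here for illustration:

$$
p(x) = \Pr(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\quad\Longleftrightarrow\quad
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x
$$

The left-hand form is the S-shaped probability curve; the right-hand form shows that the log-odds are linear in $x$.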

# Getting Started

Before we dive into plotting the logistic regression curve, let’s start with the basics. First, you’ll need some data. For this blog post, I’ll assume you have your dataset ready. If you don’t, you can easily find sample datasets online to practice with.

# Load the Data

In R, we use the `read.csv` function to load a CSV file into a data frame. For example, if you have a dataset called “mydata.csv,” you can load it like this:

```r
# Load the data into a data frame
data <- read.csv("mydata.csv")
```

For this post, we will instead simulate a data set in which the probability that `y` equals 1 follows a known logistic curve in `x`:

```r
library(dplyr)

set.seed(123)
df <- tibble(
  x = runif(100, 0, 10),
  y = rbinom(100, 1, 1 / (1 + exp(-1 * (0.5 * x - 2.5))))
)

head(df)
```

```
# A tibble: 6 × 2
      x     y
  <dbl> <int>
1 2.88      0
2 7.88      1
3 4.09      0
4 8.83      0
5 9.40      1
6 0.456     0
```
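The simulation draws each `y` from a Bernoulli distribution whose success probability is a logistic function of `x`. As a minimal sketch, the same probability can be written with base R's `plogis()`, the logistic distribution function. The helper name `true_prob` and the vector `x_check` are introduced here only for illustration:

```r
# Hypothetical helper: the success probability used in the simulation above
true_prob <- function(x) 1 / (1 + exp(-(0.5 * x - 2.5)))

# plogis() is base R's logistic function, so both forms should agree
x_check <- c(0, 2.5, 5, 7.5, 10)
all.equal(true_prob(x_check), plogis(0.5 * x_check - 2.5))

# The probability is 0.5 where the linear predictor 0.5 * x - 2.5 is zero
true_prob(5)
```

Knowing the true curve is useful later: the fitted coefficients can be compared against the simulation's intercept of -2.5 and slope of 0.5.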

# Fit a Logistic Regression Model

Next, we need to fit a logistic regression model to our data. We’ll use the `glm` (Generalized Linear Model) function to do this. Suppose we want to predict the probability of a “success” (1) based on a single predictor variable “x.”

```r
# Fit a logistic regression model
model <- glm(y ~ x, data = df, family = binomial)

broom::glance(model)
```

```
# A tibble: 1 × 8
  null.deviance df.null logLik   AIC   BIC deviance df.residual  nobs
          <dbl>   <int>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1          138.      99  -51.5  107.  112.     103.          98   100
```

```r
broom::tidy(model)
```

```
# A tibble: 2 × 5
  term        estimate std.error statistic     p.value
  <chr>          <dbl>     <dbl>     <dbl>       <dbl>
1 (Intercept)   -2.63      0.571     -4.60 0.00000422
2 x              0.505     0.102      4.96 0.000000699
```

```r
head(broom::augment(model), 1) |> dplyr::glimpse()
```

```
Rows: 1
Columns: 8
$ y          <int> 0
$ x          <dbl> 2.875775
$ .fitted    <dbl> -1.175925
$ .resid     <dbl> -0.7333581
$ .hat       <dbl> 0.01969748
$ .sigma     <dbl> 1.028093
$ .cooksd    <dbl> 0.003162007
$ .std.resid <dbl> -0.7406892
```
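Note that the `.fitted` value above is negative, so it cannot be a probability: by default, `broom::augment()` reports fitted values for a `glm` on the link (log-odds) scale. The following sketch, which rebuilds the same simulated data with base R only so that it runs on its own, shows one way to convert log-odds to probabilities. The names `log_odds` and `prob_from_link` are illustrative:

```r
# Recreate the simulated data and model from above using base R only
set.seed(123)
x <- runif(100, 0, 10)
y <- rbinom(100, 1, 1 / (1 + exp(-1 * (0.5 * x - 2.5))))
model <- glm(y ~ x, family = binomial)

# Fitted values on the link (log-odds) scale
log_odds <- predict(model, type = "link")

# plogis() maps log-odds to probabilities between 0 and 1
prob_from_link <- plogis(log_odds)

# These should match the fitted probabilities stored in the model object
all.equal(unname(prob_from_link), unname(fitted(model)))
head(prob_from_link, 3)
```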

# Predict Probabilities

Now that we have our model, we can use it to predict probabilities. We’ll create a sequence of values for our predictor variable, and for each value, we’ll predict the probability of success, in this case the probability that `y` equals 1.

```r
# Create a sequence of predictor values
x_seq <- seq(0, 10, 0.01)

# Predict probabilities
probabilities <- predict(
  model,
  newdata = data.frame(x = x_seq),
  type = "response"
)

head(x_seq)
```

```
[1] 0.00 0.01 0.02 0.03 0.04 0.05
```

```r
head(probabilities)
```

```
         1          2          3          4          5          6
0.06732923 0.06764710 0.06796636 0.06828702 0.06860908 0.06893255
```

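The `predict` function here calculates the probabilities using our logistic regression model. Setting `type = "response"` returns them on the probability scale rather than as log-odds, which is the default for a `glm`.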
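To see what `predict` is doing on the grid of new values, the probabilities can also be computed by hand from the fitted coefficients. This is a sketch under the same simulated data, rebuilt here with base R so the block is self-contained; `b`, `manual`, and `predicted` are illustrative names:

```r
# Recreate the simulated data and model using base R only
set.seed(123)
df <- data.frame(x = runif(100, 0, 10))
df$y <- rbinom(100, 1, 1 / (1 + exp(-1 * (0.5 * df$x - 2.5))))
model <- glm(y ~ x, data = df, family = binomial)

x_seq <- seq(0, 10, 0.01)

# Manual calculation: logistic function of intercept + slope * x
b <- coef(model)
manual <- plogis(b[["(Intercept)"]] + b[["x"]] * x_seq)

# The same quantity via predict() on the response scale
predicted <- predict(model, newdata = data.frame(x = x_seq), type = "response")

# The two approaches should agree up to numerical tolerance
all.equal(manual, unname(predicted))
```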

# Plot the Logistic Regression Curve

Finally, let’s plot the logistic regression curve. We’ll use the `plot` function to create a scatter plot of the data points, and then we’ll overlay the logistic curve using the `lines` function.

```r
# Plot the data points
plot(
  df$x,
  df$y,
  pch = 16,
  col = "blue",
  xlab = "Predictor Variable",
  ylab = "Probability of Success"
)

# Add the logistic regression curve
lines(x_seq, probabilities, col = "red", lwd = 2)
```

And there you have it! You’ve successfully plotted a logistic regression curve in base R. The blue dots represent your data points, and the red curve is the logistic regression curve, showing how the probability of success changes with the predictor variable.
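As an optional variation, base R's `curve()` can draw the fitted line directly from the coefficients, without building a prediction grid first. The sketch below also marks the point where the fitted probability crosses 0.5; this reference line is an addition for illustration, not part of the plot above, and `x_half` is a hypothetical name:

```r
# Recreate the simulated data and model using base R only
set.seed(123)
df <- data.frame(x = runif(100, 0, 10))
df$y <- rbinom(100, 1, 1 / (1 + exp(-1 * (0.5 * df$x - 2.5))))
model <- glm(y ~ x, data = df, family = binomial)
b <- coef(model)

# x value where the fitted probability equals 0.5 (linear predictor = 0)
x_half <- -b[["(Intercept)"]] / b[["x"]]

# Scatter plot of the observed 0/1 outcomes
plot(
  df$x, df$y,
  pch = 16, col = "blue",
  xlab = "Predictor Variable", ylab = "Probability of Success"
)

# curve() evaluates the expression in x over the given range
curve(
  plogis(b[["(Intercept)"]] + b[["x"]] * x),
  from = 0, to = 10, add = TRUE, col = "red", lwd = 2
)

# Dashed reference lines at probability 0.5 and the matching x value
abline(h = 0.5, v = x_half, lty = 2, col = "grey40")
```

With the simulated data, the crossover should land near `x = 5`, where the true curve used in the simulation equals 0.5.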

# Conclusion

I encourage you to try this out with your own dataset. Logistic regression is a powerful tool for modeling binary outcomes, and visualizing the curve helps you understand the relationship between your predictor variable and the probability of success. Experiment with different datasets and predictor variables to gain a deeper understanding of this essential statistical technique.

Remember, practice makes perfect, and the more you work with logistic regression in R, the more proficient you’ll become. Happy coding!
