
# Introduction

Data visualization is a powerful tool in a data scientist’s toolkit. It helps us understand our data and present it in a way that is easy to comprehend. In this blog post, we will explore how to plot predicted values in R using the `mtcars` dataset. We will train a linear regression model to predict the miles per gallon (mpg) of cars from their other attributes, and then plot the predictions against the actual values. By the end of this tutorial, you’ll know how to plot predicted values and be able to apply this to your own data analysis projects.

Step 1: Load the Required Libraries

Before we dive into the code, let’s make sure the necessary libraries are installed. We’ll use `ggplot2` for plotting and `caret` for splitting the data into training and testing sets. If you haven’t installed them already, run:

```r
install.packages("ggplot2")
install.packages("caret")
```

```r
library(ggplot2)
library(caret)
```

Step 2: Load and Explore the Data

We’ll use the classic `mtcars` dataset, which contains various attributes of different car models. Our goal is to predict the fuel efficiency (mpg) of these cars. Let’s load and explore the dataset:

```r
head(mtcars)
```

```
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

This will display the first few rows of the dataset, giving you an idea of what it looks like.
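Beyond `head()`, a few base R calls give a quick overview of the dataset’s size, column types, and the spread of the target variable. A minimal sketch:

```r
# mtcars ships with base R's datasets package, so no extra packages are needed

# Number of rows (car models) and columns (attributes)
dim(mtcars)  # [1] 32 11

# Compact overview of every column: type and first few values
str(mtcars)

# Min, quartiles, median, and mean of the variable we want to predict
summary(mtcars$mpg)
```

With only 32 cars, keep in mind that any test set carved out of this data will be small.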

Step 3: Split the Data into Training and Testing Sets

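Before we proceed with modeling and prediction, we need to split our data into training and testing sets, so that the model is judged on cars it never saw during fitting. We’ll use roughly 80% of the data for training and the rest for testing. caret’s `createDataPartition()` handles the split; for a numeric outcome like `mpg`, it samples within percentile-based groups, so both sets cover a similar range of values: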

```r
set.seed(123)  # for reproducibility
splitIndex <- createDataPartition(mtcars$mpg, p = 0.8, list = FALSE)
training_data <- mtcars[splitIndex, ]
testing_data <- mtcars[-splitIndex, ]
```
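If you would rather not depend on caret for this step, a comparable 80/20 split can be sketched with base R’s `sample()`. This is a plain random split, not the percentile-stratified sampling `createDataPartition()` performs, so it will not pick the same rows. The names `train_idx`, `train_base`, and `test_base` are illustrative.

```r
set.seed(123)  # for reproducibility

n <- nrow(mtcars)
# Randomly choose 80% of the row indices (rounded down) for training
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

train_base <- mtcars[train_idx, ]
test_base  <- mtcars[-train_idx, ]

# Sanity checks: no car is in both sets, and together they cover every car
stopifnot(length(intersect(rownames(train_base), rownames(test_base))) == 0)
stopifnot(nrow(train_base) + nrow(test_base) == n)

# Set sizes: 25 training rows and 7 testing rows for mtcars' 32 cars
c(train = nrow(train_base), test = nrow(test_base))
```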

Step 4: Build a Simple Linear Regression Model

Now, let’s build a linear regression model to predict `mpg` from all the other attributes. We’ll use the `lm()` function; the `.` in the formula stands for every column except the response:

```r
model <- lm(mpg ~ ., data = training_data)
```

This line of code fits the linear regression model using the training data.
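Before predicting, it can help to look at what the model learned. `summary()` reports the coefficient estimates along with R-squared, and `coef()` returns the estimates as a named vector. So that this sketch runs without caret, it assumes a base R random split (`train_idx` is an illustrative name) in place of `createDataPartition()`:

```r
set.seed(123)
train_idx <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
training_data <- mtcars[train_idx, ]

# mpg ~ . : regress mpg on every other column
model <- lm(mpg ~ ., data = training_data)

# Coefficient table, residual standard error, R-squared, F-statistic
summary(model)

# Just the estimates: an intercept plus one slope per predictor
coef(model)
```

With ten predictors and only about two dozen training rows, expect the individual slope estimates to be noisy.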

Step 5: Make Predictions

With our model trained, we can now make predictions on the testing data:

```r
predictions <- predict(model, newdata = testing_data)
```
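For `lm` models, `predict()` can also return an interval around each prediction through its `interval` argument. With `interval = "prediction"` it returns a matrix with columns `fit`, `lwr`, and `upr`. A sketch, assuming a base R random split instead of caret’s (`train_idx` and `pred_int` are illustrative names):

```r
set.seed(123)
train_idx <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
training_data <- mtcars[train_idx, ]
testing_data  <- mtcars[-train_idx, ]

model <- lm(mpg ~ ., data = training_data)

# 95% prediction intervals: the range expected to contain a new car's mpg
pred_int <- predict(model, newdata = testing_data,
                    interval = "prediction", level = 0.95)

head(pred_int)  # columns: fit (point prediction), lwr, upr
```

The `fit` column holds the same values as a plain `predict()` call, so the extra columns can be added to a plot as error bars without changing the points.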

Step 6: Create a Scatter Plot of Predicted vs. Actual Values

The most exciting part is visualizing the predicted values. We can do this using a scatter plot. Let’s create one:

```r
# Combine actual and predicted values
plot_data <- data.frame(Actual = testing_data$mpg, Predicted = predictions)

# Create a scatter plot
ggplot(plot_data, aes(x = Actual, y = Predicted)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  labs(
    x = "Actual MPG",
    y = "Predicted MPG",
    title = "Actual vs. Predicted MPG"
  ) +
  theme_minimal()
```

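This code generates a scatter plot with the actual MPG values on the x-axis and the predicted MPG values on the y-axis. The red line is a linear trend fitted to the points by `geom_smooth()`, with a shaded confidence band around it. If the predictions were perfect, every point would lie on the diagonal where predicted equals actual, so the closer the points cluster around that diagonal, the better the model.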
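Because a perfect model puts every point on the line y = x, drawing that diagonal directly is often easier to read than a fitted trend line. A sketch using `geom_abline()`, assuming a base R random split in place of `createDataPartition()` so it runs without caret (`train_idx` and `p` are illustrative names):

```r
library(ggplot2)

set.seed(123)
train_idx <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
training_data <- mtcars[train_idx, ]
testing_data  <- mtcars[-train_idx, ]

model <- lm(mpg ~ ., data = training_data)
plot_data <- data.frame(Actual = testing_data$mpg,
                        Predicted = predict(model, newdata = testing_data))

p <- ggplot(plot_data, aes(x = Actual, y = Predicted)) +
  geom_point() +
  # Points on this line are predicted exactly; points above it are over-predictions
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "grey40") +
  # Same limits and scale on both axes so the diagonal sits at 45 degrees
  coord_equal(xlim = range(plot_data), ylim = range(plot_data)) +
  labs(x = "Actual MPG", y = "Predicted MPG",
       title = "Actual vs. Predicted MPG with y = x reference") +
  theme_minimal()

print(p)
```

Using equal axis limits matters here: if the x and y ranges differed, the y = x line would not appear at 45 degrees and distances from it would be harder to judge.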

Here is the same plot in base R:

```r
# Combine actual and predicted values
plot_data <- data.frame(Actual = testing_data$mpg, Predicted = predictions)

# Create a scatter plot
plot(plot_data$Actual, plot_data$Predicted,
     xlab = "Actual MPG", ylab = "Predicted MPG",
     main = "Actual vs. Predicted MPG",
     pch = 19, col = "blue")

# Add a red least-squares line of Predicted on Actual
abline(lm(Predicted ~ Actual, data = plot_data), col = "red")
```
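The plots show the fit visually; a couple of summary numbers make it easier to compare models. Below is a base R sketch of root mean squared error (RMSE) and mean absolute error (MAE) on the test set. It assumes a base R random split in place of caret’s, and the names `errors`, `rmse`, and `mae` are illustrative. caret also ships helpers such as `RMSE()`, but the formulas are short enough to write out:

```r
set.seed(123)
train_idx <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
training_data <- mtcars[train_idx, ]
testing_data  <- mtcars[-train_idx, ]

model <- lm(mpg ~ ., data = training_data)
predictions <- predict(model, newdata = testing_data)

# Signed error for each test car (actual minus predicted, in mpg)
errors <- testing_data$mpg - predictions

rmse <- sqrt(mean(errors^2))  # penalizes large misses more heavily
mae  <- mean(abs(errors))     # average size of a miss, in mpg

c(RMSE = rmse, MAE = mae)
```

RMSE is always at least as large as MAE; a big gap between them suggests a few test cars are predicted much worse than the rest, which should also show up as outlying points in the scatter plot.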

# Conclusion

Congratulations! You’ve successfully learned how to plot predicted values in R using the mtcars dataset. Visualization is a vital part of data analysis, and it can provide valuable insights into the performance of your predictive models.

I encourage you to try this on your own datasets and explore more advanced visualization techniques. Experiment with different models and datasets to gain a deeper understanding of data visualization in R. Happy coding!