Exploring TidyAML: Simplifying Regression Analysis in R

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.]

Introduction

If you’re a data enthusiast diving into the world of regression analysis in R, you’ve likely encountered the challenges of managing code complexity and juggling different modeling engines. The good news is that there’s a powerful tool to streamline your regression workflow – the tidyAML R package.

Getting Started with TidyAML

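Before we dive into the script, let’s make sure you have the necessary packages installed. Fire up your R console and install tidyAML. The script also loads several modeling and engine packages (multilevelmod, earth, randomForest, rpart, lightgbm, baguette, bonsai, and gee), which you may need to install separately if they are not already on your system: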

install.packages("tidyAML")

Now, let’s explore a script that leverages tidyAML for quick and efficient regression analysis. Here’s a breakdown of the key components:

Initial Setup

library(tidyAML)
library(tidyverse)
library(tidymodels)
library(multilevelmod)
library(earth)
library(randomForest)
library(rpart)
library(lightgbm)
library(baguette)
library(bonsai)
library(gee)

tidymodels_prefer()
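
The setup above loads quite a few packages beyond tidyAML itself. As a minimal sketch, the following reports which of them are missing before you run the script. The pkgs vector simply mirrors the library() calls above, and the variable names are my own, not part of tidyAML.

```r
# Packages loaded in the setup above (names copied from the library() calls)
pkgs <- c("tidyAML", "tidyverse", "tidymodels", "multilevelmod", "earth",
          "randomForest", "rpart", "lightgbm", "baguette", "bonsai", "gee")

# TRUE for each package that is already installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Character vector of packages that still need installing
missing_pkgs <- pkgs[!is_installed]
print(missing_pkgs)

# Uncomment to install whatever is missing:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

The install line is left commented out so that nothing is installed without you choosing to do so.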

Setting Up the Recipe

df <- mtcars
recipe <- recipe(mpg ~ ., data = df)

In this snippet, we’re creating a recipe for our regression analysis. The response variable (mpg) is modeled against all other variables in the mtcars dataset.
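
If you want to confirm which variables the recipe treats as the outcome and which as predictors, summary() on a recipe returns one row per variable with its role. The sketch below rebuilds the same recipe under the hypothetical name rec_obj so that it runs on its own.

```r
library(recipes)

# Same formula as in the post: mpg against every other column of mtcars
rec_obj <- recipe(mpg ~ ., data = mtcars)

# One row per variable, with its assigned role
role_tbl <- summary(rec_obj)
print(role_tbl)

# Count how many variables play each role
table(role_tbl$role)
```

With mtcars, mpg should appear as the outcome and the remaining ten columns as predictors.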

Fast Regression with TidyAML

fr_tbl <- fast_regression(
  .data = df,
  .rec_obj = recipe,
  .parsnip_fns = c("linear_reg", "mars", "bag_mars", "rand_forest",
                   "boost_tree", "bag_tree"),
  .parsnip_eng = c("lm", "gee", "glm", "gls", "earth", "rpart", "lightgbm")
)

This is where the magic happens. The fast_regression function fits regression models for the listed parsnip functions (linear_reg, mars, etc.) paired with the listed engines (lm, gee, etc.) and returns a table with one row per fitted model. It’s a quick way to compare several model types side by side.
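
To see which function and engine pairs were actually fit, you can look at the .parsnip_fns and .parsnip_engine columns of the result. My understanding is that fast_regression only keeps pairings that parsnip supports, so a function with no matching engine in the list produces no row, but treat that as an assumption and check the output. The sketch below uses a smaller, hypothetical call (two functions, two engines) so it runs quickly; small_tbl and rec_obj are names I made up.

```r
library(tidyAML)
library(tidymodels)
library(earth)

rec_obj <- recipe(mpg ~ ., data = mtcars)

# A reduced version of the call in the post: two functions, two engines
small_tbl <- fast_regression(
  .data = mtcars,
  .rec_obj = rec_obj,
  .parsnip_fns = c("linear_reg", "mars"),
  .parsnip_eng = c("lm", "earth")
)

# List the function/engine pairs that were actually fit
small_tbl |>
  select(.parsnip_fns, .parsnip_engine) |>
  print()
```

Running the same select() on fr_tbl from the post shows which of the six functions and seven engines ended up paired.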

Visualizing Residuals

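# Score every fitted workflow on the full data set and plot the residuals
fr_tbl |>
  # augment() adds a .pred column to df for each fitted workflow
  mutate(res = map(fitted_wflw, \(x) x |>
                     broom::augment(new_data = df))) |>
  unnest(cols = res) |>
  # Label each model as "engine - function" for the x-axis
  mutate(pfe = paste0(.parsnip_engine, " - ", .parsnip_fns)) |>
  # Residual = observed mpg minus predicted mpg
  mutate(.res = mpg - .pred) |>
  ggplot(aes(x = pfe, y = .res, fill = pfe)) +
  geom_boxplot() +
  theme_minimal() +
  labs(
    title = "Residuals by Fitted Model",
    subtitle = "Residuals are mpg - .pred",
    x = "Model",
    y = "Residuals",
    fill = "Engine + Function"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))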

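This block of code generates one boxplot of residuals per model. Residuals are the differences between observed and predicted values, so boxes centered near zero with a narrow spread indicate a closer fit. Keep in mind that the predictions here are made on the full df, which includes the rows used to fit the models, so the plot is not a pure out-of-sample check.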
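
A boxplot is easier to read alongside a number or two. As a sketch of the same residual calculation for a single model, the code below fits one linear regression workflow with tidymodels directly (not through tidyAML), augments the data, and summarizes the residuals as RMSE and MAE. The names lm_wflw, res_tbl, and res_summary are my own.

```r
library(tidymodels)

# One workflow: the post's recipe plus a plain linear regression
lm_wflw <- workflow() |>
  add_recipe(recipe(mpg ~ ., data = mtcars)) |>
  add_model(linear_reg() |> set_engine("lm")) |>
  fit(data = mtcars)

# augment() adds .pred; the residual is observed minus predicted
res_tbl <- augment(lm_wflw, new_data = mtcars) |>
  mutate(.res = mpg - .pred)

# Summarize the residuals for this model
res_summary <- res_tbl |>
  summarise(
    rmse = sqrt(mean(.res^2)),
    mae  = mean(abs(.res))
  )
print(res_summary)
```

The same summarise() step could be applied per model to the unnested fr_tbl by grouping on the engine and function columns first.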

Try It Yourself!

Now that you’ve seen tidyAML in action, it’s time to try it yourself. Install the package, load your data, and adapt the script to your specific use case. The tidyAML package provides a clean and efficient way to explore different regression models, making your analysis more manageable and insightful.

install.packages("tidyAML")
library(tidyAML)
# Your data loading and analysis code here

Happy coding, and may your regression analyses be tidy and insightful!
