Demystifying bootstrap_stat_plot(): Your Ticket to Insightful Data Exploration

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

Ever feel like your data is hiding secrets? Like it’s whispering truths but you just can’t quite grasp them? Well, fear not, fellow data sleuths! Today, we’ll crack the code of an R function that’s like a magnifying glass for your statistical investigations: bootstrap_stat_plot() from the TidyDensity package.

Imagine this: You have a dataset, say, car mileage (MPG) from the classic mtcars dataset. You want to understand the average MPG, but what if that average is just a mirage? What if it’s skewed by a few outliers or doesn’t capture the full story?

Enter bootstrapping, a statistical technique that’s like taking your data on a wild ride. It creates many copies of your data by resampling it with replacement, so each copy comes with a slight twist, and then calculates the statistic you’re interested in (e.g., average MPG) for each copy. This gives you a distribution of possible averages, revealing the variability and potential biases lurking beneath the surface.
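To make the idea concrete, here is a minimal base R sketch of bootstrapping the average MPG. It is not how TidyDensity does it internally; the seed, the number of resamples, and the names n_reps and boot_means are illustrative assumptions.

```r
# Base R sketch of the bootstrap idea (not TidyDensity's implementation)
set.seed(123)        # arbitrary seed so the run is reproducible
x <- mtcars$mpg      # the observed MPG values
n_reps <- 2000       # number of resamples; an illustrative choice

# Each replicate resamples x with replacement, then takes the mean
boot_means <- replicate(
  n_reps,
  mean(sample(x, size = length(x), replace = TRUE))
)

mean(x)                                # observed average MPG
sd(boot_means)                         # bootstrap standard error of the mean
quantile(boot_means, c(0.025, 0.975))  # 95% percentile interval
```

The spread of boot_means is what tells you how much the average could move if the data had come out a little differently.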

bootstrap_stat_plot() takes this magic a step further. It not only calculates the statistic across the resampled data but also visualizes it, giving you a clear picture of how the statistic fluctuates across different versions of your data.

Function

Syntax

Let’s take a look at the function:

bootstrap_stat_plot(
  .data,
  .value,
  .stat = "cmean",
  .show_groups = FALSE,
  .show_ci_labels = TRUE,
  .interactive = FALSE
)

Arguments

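1. The Data: .data is the bootstrapped data, such as the output of tidy_bootstrap() in the examples below.

2. The Value: .value is the column the statistic is calculated on (y in the examples below).

3. The Statistic: .stat is the quoted name of the cumulative statistic to apply. The default is "cmean"; the examples below also use "cmin", "cmax" and "csd".

4. Show Groups: .show_groups defaults to FALSE. Set it to TRUE to show the individual replications, as in Example 1.

5. Show Confidence Interval Labels: .show_ci_labels defaults to TRUE. Set it to FALSE to turn the labels off, as in Example 2.

6. Interactive: .interactive defaults to FALSE. Set it to TRUE to switch on the interactive mode.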
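The .stat values all start with "c" for cumulative: the statistic is recomputed as each additional value is included. As a rough picture of what that means, here is a base R sketch of running versions of the four statistics used in this post. These are stand-ins written for illustration, not the package's own functions, and the running_* names are hypothetical.

```r
x <- mtcars$mpg

running_mean <- cumsum(x) / seq_along(x)  # mean of the first k values
running_min  <- cummin(x)                 # smallest value seen so far
running_max  <- cummax(x)                 # largest value seen so far

# Running standard deviation; the first entry is NA because
# sd() needs at least two values
running_sd <- vapply(
  seq_along(x),
  function(k) sd(x[seq_len(k)]),
  numeric(1)
)

head(data.frame(x, running_mean, running_min, running_max, running_sd))
```

By the last row each running statistic equals the ordinary statistic of the full vector, which is why the lines in the plot settle down as you move to the right.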

Examples

Example 1 – Show replications

library(TidyDensity)
library(patchwork)

x <- mtcars$mpg   # the values to bootstrap
ns <- 50          # number of bootstrap replications

# Cumulative mean, one line per replication
p1 <- tidy_bootstrap(x, .num_sims = ns) |>
  bootstrap_stat_plot(y,
                      .stat = "cmean",
                      .show_groups = TRUE,
                      .show_ci_labels = TRUE
  )

# Cumulative minimum
p2 <- tidy_bootstrap(x, .num_sims = ns) |>
  bootstrap_stat_plot(y,
                      .stat = "cmin",
                      .show_groups = TRUE,
                      .show_ci_labels = TRUE
  )

# Cumulative maximum
p3 <- tidy_bootstrap(x, .num_sims = ns) |>
  bootstrap_stat_plot(y,
                      .stat = "cmax",
                      .show_groups = TRUE,
                      .show_ci_labels = TRUE
  )

# Cumulative standard deviation
p4 <- tidy_bootstrap(x, .num_sims = ns) |>
  bootstrap_stat_plot(y,
                      .stat = "csd",
                      .show_groups = TRUE,
                      .show_ci_labels = TRUE
  )

# Arrange the four plots in a 2 x 2 grid with patchwork
wrap_plots(
  p1, p2, p4, p3,
  ncol = 2, nrow = 2,
  widths = c(1, 1), heights = c(1, 1)
)

Let’s dissect the code to see how it works:

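1. The Data: x holds the MPG values from mtcars, and tidy_bootstrap(x, .num_sims = ns) draws 50 bootstrap replications of it.

2. The Statistic: each plot passes a different .stat, namely "cmean", "cmin", "cmax" and "csd".

3. Visualization Options: .show_groups = TRUE shows every replication, and the confidence interval labels are switched on.

4. Interactive Mode: .interactive is left at its default of FALSE, so each call returns a static plot that wrap_plots() from patchwork can arrange in a 2 x 2 grid.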

Example 2 – Hide replications

# x and the loaded packages carry over from Example 1

# Cumulative mean, replications hidden
p1 <- tidy_bootstrap(x) |>
  bootstrap_stat_plot(y,
                      .stat = "cmean",
                      .show_groups = FALSE,
                      .show_ci_labels = FALSE
  )

# Cumulative minimum
p2 <- tidy_bootstrap(x) |>
  bootstrap_stat_plot(y,
                      .stat = "cmin",
                      .show_groups = FALSE,
                      .show_ci_labels = FALSE
  )

# Cumulative maximum
p3 <- tidy_bootstrap(x) |>
  bootstrap_stat_plot(y,
                      .stat = "cmax",
                      .show_groups = FALSE,
                      .show_ci_labels = FALSE
  )

# Cumulative standard deviation
p4 <- tidy_bootstrap(x) |>
  bootstrap_stat_plot(y,
                      .stat = "csd",
                      .show_groups = FALSE,
                      .show_ci_labels = FALSE
  )

# Arrange the four plots in a 2 x 2 grid with patchwork
wrap_plots(
  p1, p2, p4, p3,
  ncol = 2, nrow = 2,
  widths = c(1, 1), heights = c(1, 1)
)

In this example we changed three things: we hid the replications, left the number of simulations at the default of 2000, and turned the labels off. This is useful when you want to show a summary of the data.
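If you are curious how many replications can be boiled down to a single summary, here is a base R sketch of one way to do it: compute a running mean for each resample, then take quantiles across resamples at each position. This is an assumption about the general approach, not a description of what bootstrap_stat_plot() does internally; the 95% band, the seed, and the names paths and band are illustrative.

```r
set.seed(42)       # arbitrary seed so the run is reproducible
x <- mtcars$mpg
n_reps <- 200      # illustrative; kept small so the sketch runs quickly

# One column per replication: the running mean of a resample
# drawn with replacement
paths <- replicate(n_reps, {
  resample <- sample(x, size = length(x), replace = TRUE)
  cumsum(resample) / seq_along(resample)
})

# Collapse the replications at each position into a centre line
# and a 95% band
band <- data.frame(
  position = seq_len(nrow(paths)),
  lower    = apply(paths, 1, quantile, probs = 0.025),
  centre   = rowMeans(paths),
  upper    = apply(paths, 1, quantile, probs = 0.975)
)

tail(band, 3)
```

The band is wide at the first positions, where each running mean rests on only a few values, and narrows as more values are included.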

Your Turn to Explore

Don’t just take our word for it! Try bootstrap_stat_plot() on your own data. Experiment with different statistics, explore the interactive mode, and see how it unlocks new insights you might have missed before. Remember, the more you play, the more you discover!
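As a starting point for the interactive mode, here is a small sketch that switches on .interactive. It assumes TidyDensity is installed and is skipped otherwise. My understanding is that the call then returns an interactive plotly object rather than a static plot, but treat that as an assumption and check class(p) yourself.

```r
# Assumes the TidyDensity package is installed; skipped otherwise
if (requireNamespace("TidyDensity", quietly = TRUE)) {
  library(TidyDensity)

  p <- tidy_bootstrap(mtcars$mpg, .num_sims = 50) |>
    bootstrap_stat_plot(y,
                        .stat = "cmean",
                        .show_groups = TRUE,
                        .interactive = TRUE
    )

  class(p)   # inspect what kind of object came back
}
```

Swap mtcars$mpg for a numeric vector of your own and change .stat to see how the picture shifts.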

So, unleash your inner data detective and let bootstrap_stat_plot() guide you to a deeper understanding of your data. Happy exploring!
