
TidyDensity Powers Up with Data.table: Speedier Distributions for Your Data Exploration

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.]

Calling all R enthusiasts who love tidy data and crave efficiency!

I’m thrilled to announce a major upgrade to the TidyDensity package that’s sure to accelerate your data analysis workflows. We’ve integrated the lightning-fast data.table package for generating tidy distribution data, resulting in a jaw-dropping 30% speed boost.

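Here is one of the tests run during development, where v1 was the then-current version and v2 was the version using data.table. In this run, v2 finished 100 replications in 2.50 seconds versus 4.67 seconds for v1: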

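library(rbenchmark) # benchmark()
library(dplyr)      # arrange()

n <- 10000 # draws per simulation
benchmark(
  # v2: development version using data.table
  "tidy_bernoulli_v2" = {
    tidy_bernoulli_v2(n, .5, 1, FALSE)
  },
  # v1: the TidyDensity version current at the time of the test
  "tidy_bernoulli_v1" = {
    TidyDensity::tidy_bernoulli(n, .5, 1)
  },
  replications = 100,
  columns = c("test","replications","elapsed","relative","user.self","sys.self")
) |>
  arrange(relative) # fastest test first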
               test replications elapsed relative user.self sys.self
1 tidy_bernoulli_v2          100    2.50    1.000      2.22     0.26
2 tidy_bernoulli_v1          100    4.67    1.868      4.34     0.31
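
To read the table: as far as I understand rbenchmark's defaults, the `relative` column is each test's elapsed time divided by the fastest elapsed time. Here is a quick check of that arithmetic using only the numbers printed above (the variable names are just for illustration):

```r
# Elapsed seconds copied from the benchmark output above
elapsed <- c(tidy_bernoulli_v2 = 2.50, tidy_bernoulli_v1 = 4.67)

# Elapsed time divided by the fastest elapsed time, rounded to 3 digits
relative <- round(elapsed / min(elapsed), 3)

# Share of v1's elapsed time that v2 saved in this particular run
time_saved <- 1 - elapsed[["tidy_bernoulli_v2"]] / elapsed[["tidy_bernoulli_v1"]]

print(relative)
print(round(100 * time_saved, 1))
```

So in this single run, v2 needed a bit more than half of v1's elapsed time. Other distributions, sample sizes, and machines will give different numbers.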

Here’s what this means for you


How to experience this boost

  1. Update TidyDensity: Ensure you have the latest version installed: install.packages("TidyDensity")

  2. Choose Your Output Format: Indicate your preference with the .return_tibble parameter:

    # For a tibble:
    tidy_data <- tidy_normal(.return_tibble = TRUE)
    
    # For a data.table:
    tidy_data <- tidy_normal(.return_tibble = FALSE)

    Whichever output you choose, you still get the speedup: data.table creates the data, and the conversion to a tibble happens afterwards only if that is the output you want.
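
The generate-then-convert idea can be sketched roughly as follows. This is not TidyDensity's source code: `sketch_normal()` and its argument names are hypothetical, and the sketch only assumes that the data.table and tibble packages are installed.

```r
library(data.table)
library(tibble)

# Hypothetical helper for illustration only; not part of TidyDensity
sketch_normal <- function(n = 50, mu = 0, sigma = 1, num_sims = 1,
                          return_tibble = TRUE) {
  # Build one row per draw per simulation with data.table
  dt <- data.table(
    sim_number = factor(rep(seq_len(num_sims), each = n)),
    x = rep(seq_len(n), times = num_sims),
    y = rnorm(n * num_sims, mean = mu, sd = sigma)
  )

  # Add kernel density estimates per simulation, by reference
  dt[, c("dx", "dy") := {
    d <- density(y, n = .N)
    list(d$x, d$y)
  }, by = sim_number]

  # Add the distribution function and quantile columns, by reference
  dt[, p := pnorm(y, mean = mu, sd = sigma)]
  dt[, q := qnorm(p, mean = mu, sd = sigma)]

  # Convert only at the very end, and only if a tibble was requested
  if (return_tibble) as_tibble(dt) else dt[]
}

set.seed(42)
as_tbl <- sketch_normal(return_tibble = TRUE)
as_dt <- sketch_normal(return_tibble = FALSE)
class(as_tbl)
class(as_dt)
```

Because the conversion is a single step at the end, all of the column-building work runs through data.table whichever return type is requested.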


Let’s see the output

library(TidyDensity)

# Generate data
normal_tibble <- tidy_normal(.return_tibble = TRUE)
head(normal_tibble)
# A tibble: 6 × 7
  sim_number     x       y    dx       dy      p       q
  <fct>      <int>   <dbl> <dbl>    <dbl>  <dbl>   <dbl>
1 1              1  1.05   -2.97 0.000398 0.854   1.05  
2 1              2  0.0168 -2.84 0.00104  0.507   0.0168
3 1              3  1.77   -2.72 0.00244  0.961   1.77  
4 1              4 -1.81   -2.59 0.00518  0.0353 -1.81  
5 1              5  0.447  -2.46 0.00997  0.673   0.447 
6 1              6  1.05   -2.33 0.0174   0.854   1.05  
class(normal_tibble)
[1] "tbl_df"     "tbl"        "data.frame"
normal_dt <- tidy_normal(.return_tibble = FALSE)
head(normal_dt)
   sim_number x           y        dx           dy         p           q
1:          1 1  2.24103518 -3.424949 0.0002787401 0.9874881  2.24103518
2:          1 2 -0.12769603 -3.286892 0.0008586864 0.4491948 -0.12769603
3:          1 3 -0.39666069 -3.148835 0.0022824304 0.3458088 -0.39666069
4:          1 4  0.89626001 -3.010778 0.0052656793 0.8149430  0.89626001
5:          1 5  0.04267757 -2.872721 0.0105661984 0.5170207  0.04267757
6:          1 6  0.53424808 -2.734664 0.0185083421 0.7034150  0.53424808
class(normal_dt)
[1] "data.table" "data.frame"
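
Either return type can be carried into whichever workflow you prefer. Below is a small sketch using a stand-in object shaped like the output above. The values are random draws made here, not package output, and the sketch assumes data.table and tibble are installed.

```r
library(data.table)
library(tibble)

set.seed(123)
# Stand-in for a multi-simulation result: 3 simulations of 50 draws each
sims_dt <- data.table(
  sim_number = factor(rep(1:3, each = 50)),
  x = rep(1:50, times = 3),
  y = rnorm(150)
)

# data.table syntax: one summary row per simulation
summary_dt <- sims_dt[, .(mean_y = mean(y), sd_y = sd(y)), by = sim_number]

# Converting in either direction keeps the same rows and columns
sims_tbl <- as_tibble(sims_dt)
back_to_dt <- as.data.table(sims_tbl)

print(summary_dt)
```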

Ready to unleash the power of TidyDensity and data.table?

Dive into your next data exploration project and experience the efficiency firsthand! Share your discoveries and feedback with the community—we’re eager to hear how this upgrade empowers your analysis.

Happy tidy data exploration!
