Exploring the Peaks: A Dive into the Triangular Distribution in TidyDensity

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.]

Introduction

Welcome back, fellow data enthusiasts! Today, we embark on an exciting journey into the world of statistical distributions with a special focus on the latest addition to the TidyDensity package – the triangular distribution. Tightly packed and versatile, this distribution brings a unique flavor to your data simulations and analyses. In this blog post, we’ll delve into the functions provided, understand their arguments, and explore the wonders of the triangular distribution.

What’s So Special About Triangular Distributions?

  • Flexibility in uncertainty: They model situations where you have a minimum, maximum, and most likely value, but the exact distribution between those points is unknown.
  • Common in real-world scenarios: Project cost estimates, task completion times, expert opinions, and even natural phenomena often exhibit triangular patterns.
  • Simple to understand and visualize: Their straightforward shape makes them accessible for interpretation and communication.

The triangular distribution is a continuous probability distribution with lower limit a, upper limit b, and mode c, where a < b and a ≤ c ≤ b. The distribution resembles a tent shape.

The probability density function of the triangular distribution is:

f(x) = 
    (2(x - a)) / ((b - a)(c - a))  for a ≤ x < c
    2 / (b - a)                    for x = c
    (2(b - x)) / ((b - a)(b - c))  for c < x ≤ b
    0                              otherwise
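
If it helps to see the density as code, here is a minimal base R sketch of the piecewise formula above. The helper name dtri is made up for this illustration and is not a TidyDensity function; the sketch simply evaluates the formula and checks numerically that the area under the curve is 1.

```r
# Hypothetical helper (not part of TidyDensity): triangular density on [a, b]
# with mode c, written directly from the piecewise formula.
dtri <- function(x, a, b, c) {
  stopifnot(a < b, a <= c, c <= b)
  dens  <- numeric(length(x))            # stays 0 outside [a, b]
  left  <- x >= a & x < c                # rising side of the triangle
  right <- x > c & x <= b                # falling side of the triangle
  dens[left]  <- 2 * (x[left] - a) / ((b - a) * (c - a))
  dens[right] <- 2 * (b - x[right]) / ((b - a) * (b - c))
  dens[x == c] <- 2 / (b - a)            # peak height; also covers c == a or c == b
  dens
}

# Density at a few points for a = 0, b = 1, c = 0.5
dtri(c(0, 0.25, 0.5, 0.75, 1), a = 0, b = 1, c = 0.5)
# 0 1 2 1 0

# Numerical check that the density integrates to (approximately) 1
area <- integrate(function(x) dtri(x, 0, 1, 0.5), lower = 0, upper = 1)$value
area
```

The peak height 2 / (b - a) follows from the triangle having base b - a and area 1.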

The key parameters of the triangular distribution are:

  • a – the minimum value
  • b – the maximum value
  • c – the mode (the most likely value, where the density peaks)

The triangular distribution is often used as a subjective description of a population for which there is only limited sample data. It is useful when a process has a natural minimum and maximum.

Triangular Functions

Let’s start by introducing TidyDensity’s main functions for the triangular distribution:

  1. tidy_triangular(): This function generates a triangular distribution with a specified number of simulations, minimum, maximum, and mode values.
    • .n: Specifies the number of x values for each simulation.
    • .min: Sets the minimum value of the triangular distribution.
    • .max: Determines the maximum value of the triangular distribution.
    • .mode: Specifies the mode (peak) value of the triangular distribution.
    • .num_sims: Controls the number of simulations to perform.
    • .return_tibble: A logical value indicating whether to return the result as a tibble.
  2. util_triangular_param_estimate(): This function estimates the parameters of a triangular distribution from a numeric vector of data.
    • .x: Requires a numeric vector, with all values satisfying 0 <= x <= 1.
    • .auto_gen_empirical: A boolean value (TRUE/FALSE) with a default set to TRUE. It automatically generates tidy_empirical() output for the .x parameter and utilizes tidy_combine_distributions().
  3. util_triangular_stats_tbl(): This function creates a tidy data frame with statistics for a triangular distribution.
    • .data: The data being passed from a tidy_ distribution function.
  4. triangle_plot(): This function creates a ggplot2 object for a triangular distribution.
    • .data: Tidy data from the tidy_triangular function.
    • .interactive: A logical value indicating whether to return an interactive plot using plotly. Default is FALSE.

Using tidy_triangular for Simulations

Suppose you want to simulate a triangular distribution with 100 x values, a minimum of 0, a maximum of 1, and a mode at 0.5. You’d use the following code:

library(TidyDensity)

triangular_data <- tidy_triangular(
  .n = 100, 
  .min = 0, 
  .max = 1, 
  .mode = 0.5, 
  .num_sims = 1, 
  .return_tibble = TRUE
  )

triangular_data
# A tibble: 100 × 7
   sim_number     x     y      dx      dy     p     q
   <fct>      <int> <dbl>   <dbl>   <dbl> <dbl> <dbl>
 1 1              1 0.853 -0.140  0.00158 0.957 0.853
 2 1              2 0.697 -0.128  0.00282 0.816 0.697
 3 1              3 0.656 -0.116  0.00484 0.764 0.656
 4 1              4 0.518 -0.103  0.00805 0.536 0.518
 5 1              5 0.635 -0.0909 0.0130  0.733 0.635
 6 1              6 0.838 -0.0786 0.0202  0.948 0.838
 7 1              7 0.645 -0.0662 0.0304  0.748 0.645
 8 1              8 0.482 -0.0539 0.0444  0.464 0.482
 9 1              9 0.467 -0.0416 0.0627  0.437 0.467
10 1             10 0.599 -0.0293 0.0859  0.678 0.599
# ℹ 90 more rows

This generates a tidy tibble with simulated data, ready for your analysis.
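
For intuition about where triangular draws can come from, here is a small base R sketch of inverse-CDF sampling. It illustrates the general technique and is not necessarily how tidy_triangular() generates its values internally; the function name rtri_sketch is hypothetical.

```r
# Hypothetical helper (not part of TidyDensity): draw n values from a
# triangular distribution on [a, b] with mode c by inverting the CDF.
rtri_sketch <- function(n, a, b, c) {
  stopifnot(a < b, a <= c, c <= b)
  u  <- runif(n)                         # uniform draws on (0, 1)
  fc <- (c - a) / (b - a)                # CDF value at the mode
  ifelse(
    u < fc,
    a + sqrt(u * (b - a) * (c - a)),       # invert the rising branch
    b - sqrt((1 - u) * (b - a) * (b - c))  # invert the falling branch
  )
}

set.seed(123)
draws <- rtri_sketch(10000, a = 0, b = 1, c = 0.5)
range(draws)  # should stay inside [0, 1]
mean(draws)   # should be close to (a + b + c) / 3 = 0.5
```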

Estimating Parameters and Creating Stats Tables

Utilize the util_triangular_param_estimate function to estimate parameters and create tidy empirical data:

param_estimate <- util_triangular_param_estimate(.x = triangular_data$y)

t(param_estimate$parameter_tbl)
          [,1]        
dist_type "Triangular"
samp_size "100"       
min       "0.0572515" 
max       "0.8822025" 
mode      "0.8822025" 
method    "Basic"     
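
Notice that the "Basic" estimate printed above puts the mode at the sample maximum, even though the data were simulated with a mode of 0.5. If you want a quick second opinion, one simple alternative is a method-of-moments style estimate: because the mean of a triangular distribution is (a + b + c) / 3, you can solve for c once you have plugged in the sample minimum and maximum. The sketch below is an assumption-laden illustration in base R, not the method TidyDensity uses, and estimate_tri_sketch is a made-up name.

```r
# Hypothetical helper (not part of TidyDensity): rough triangular parameter
# estimates from a numeric vector, using sample min/max and the mean identity
# mean = (a + b + c) / 3.
estimate_tri_sketch <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2)
  a_hat <- min(x)                         # sample minimum as the lower limit
  b_hat <- max(x)                         # sample maximum as the upper limit
  c_hat <- 3 * mean(x) - a_hat - b_hat    # solve the mean identity for c
  c_hat <- min(max(c_hat, a_hat), b_hat)  # keep the mode inside [a_hat, b_hat]
  c(min = a_hat, max = b_hat, mode = c_hat)
}

est <- estimate_tri_sketch(c(0.1, 0.4, 0.5, 0.6, 0.9))
est
```

Keep in mind that the sample minimum and maximum tend to sit inside the true limits, so estimates like these are only a rough starting point.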

For statistics table creation:

stats_table <- util_triangular_stats_tbl(.data = triangular_data)
t(stats_table)
                  [,1]                     
tidy_function     "tidy_triangular"        
function_call     "Triangular c(0, 1, 0.5)"
distribution      "Triangular"             
distribution_type "continuous"             
points            "100"                    
simulations       "1"                      
mean              "0.5"                    
median            "0.3535534"              
mode              "1"                      
range_low         "0.0572515"              
range_high        "0.8822025"              
variance          "0.04166667"             
skewness          "0"                      
kurtosis          "-0.6"                   
entropy           "-0.6931472"             
computed_std_skew "-0.1870017"             
computed_std_kurt "2.778385"               
ci_lo             "0.08311609"             
ci_hi             "0.8476985"              
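
As a sanity check on the theoretical rows, here is a base R sketch of the standard closed-form mean, median, mode, and variance of a triangular distribution. The function name tri_stats_sketch is hypothetical and not part of TidyDensity. For a = 0, b = 1, c = 0.5 these formulas give a median of 0.5 and a mode of 0.5, so the median and mode rows printed above (0.3535534 and 1) are worth double-checking against the package version you have installed.

```r
# Hypothetical helper (not part of TidyDensity): closed-form summary values
# for a triangular distribution on [a, b] with mode c.
tri_stats_sketch <- function(a, b, c) {
  stopifnot(a < b, a <= c, c <= b)
  midpoint <- (a + b) / 2
  # The median formula depends on which side of the midpoint the mode falls
  med <- if (c >= midpoint) {
    a + sqrt((b - a) * (c - a) / 2)
  } else {
    b - sqrt((b - a) * (b - c) / 2)
  }
  data.frame(
    mean     = (a + b + c) / 3,
    median   = med,
    mode     = c,
    variance = (a^2 + b^2 + c^2 - a * b - a * c - b * c) / 18
  )
}

stats_check <- tri_stats_sketch(a = 0, b = 1, c = 0.5)
stats_check
# mean 0.5, median 0.5, mode 0.5, variance 0.04166667
```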

Visualizing the Triangular Distribution

Now, let’s visualize the triangular distribution using the triangle_plot function:

triangle_plot(.data = triangular_data, .interactive = TRUE)
triangle_plot(.data = triangular_data, .interactive = FALSE)

This will generate an informative plot, and if you set .interactive to TRUE, you can explore the distribution interactively using plotly.

Conclusion

In this blog post, we’ve explored the powerful functionalities of the triangular distribution in TidyDensity. Whether you’re simulating data, estimating parameters, or creating insightful visualizations, these functions provide a robust toolkit for your statistical endeavors. Happy coding, and may your distributions always be tidy!
