Exploring the Peaks: A Dive into the Triangular Distribution in TidyDensity

Steven P. Sanderson II, MPH

13 hours ago

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

Welcome back, fellow data enthusiasts! Today, we embark on an exciting journey into the world of statistical distributions with a special focus on the latest addition to the TidyDensity package – the triangular distribution. Tightly packed and versatile, this distribution brings a unique flavor to your data simulations and analyses. In this blog post, we’ll delve into the functions provided, understand their arguments, and explore the wonders of the triangular distribution.

What’s So Special About Triangular Distributions?

- Flexibility in uncertainty: They model situations where you have a minimum, maximum, and most likely value, but the exact distribution between those points is unknown.
- Common in real-world scenarios: Project cost estimates, task completion times, expert opinions, and even natural phenomena often exhibit triangular patterns.
- Simple to understand and visualize: Their straightforward shape makes them accessible for interpretation and communication.

The triangular distribution is a continuous probability distribution with lower limit a, upper limit b, and mode c, where a < b and a ≤ c ≤ b. The distribution resembles a tent shape.

The probability density function of the triangular distribution is:

f(x) =
    2(x - a) / ((b - a)(c - a))   for a ≤ x ≤ c
    2(b - x) / ((b - a)(b - c))   for c < x ≤ b
    0                             otherwise
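
As a quick sanity check, the density above can be written in a few lines of base R. This is only a sketch: dtri_sketch() is a made-up helper name, not a TidyDensity function, and it assumes a < b and a ≤ c ≤ b as stated earlier.

```r
# Density of a triangular distribution with lower limit a, upper limit b, mode c.
# dtri_sketch() is a hypothetical helper, not part of TidyDensity.
dtri_sketch <- function(x, a, b, c) {
  stopifnot(a < b, a <= c, c <= b)
  out <- numeric(length(x))               # density is 0 outside [a, b]
  left  <- x >= a & x < c                 # rising side of the triangle
  right <- x > c & x <= b                 # falling side of the triangle
  out[left]  <- 2 * (x[left] - a)  / ((b - a) * (c - a))
  out[right] <- 2 * (b - x[right]) / ((b - a) * (b - c))
  out[x == c] <- 2 / (b - a)              # peak height; also covers c == a or c == b
  out
}

# Symmetric case used later in this post: a = 0, b = 1, c = 0.5
dtri_sketch(c(0.25, 0.5, 0.75), a = 0, b = 1, c = 0.5)

# A valid density should integrate to (approximately) 1 over [a, b]
area <- integrate(function(x) dtri_sketch(x, 0, 1, 0.5), lower = 0, upper = 1)$value
area
```

The peak height is 2 / (b - a), which is what makes the area of the triangle equal to 1.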

The key parameters of the triangular distribution are:

- a – the minimum value
- b – the maximum value
- c – the mode (the most likely value, where the density peaks)

The triangular distribution is often used as a subjective description of a population for which there is only limited sample data. It is useful when a process has a natural minimum and maximum.

Triangular Functions

Let’s start by introducing TidyDensity’s main functions for the triangular distribution:

tidy_triangular(): This function generates a triangular distribution with a specified number of simulations, minimum, maximum, and mode values.
- .n: Specifies the number of x values for each simulation.
- .min: Sets the minimum value of the triangular distribution.
- .max: Determines the maximum value of the triangular distribution.
- .mode: Specifies the mode (peak) value of the triangular distribution.
- .num_sims: Controls the number of simulations to perform.
- .return_tibble: A logical value indicating whether to return the result as a tibble.
util_triangular_param_estimate(): This function estimates the parameters of a triangular distribution from a numeric vector of data.
- .x: Requires a numeric vector, with all values satisfying 0 <= x <= 1.
- .auto_gen_empirical: A boolean value (TRUE/FALSE) with a default set to TRUE. It automatically generates tidy_empirical() output for the .x parameter and utilizes tidy_combine_distributions().
util_triangular_stats_tbl(): This function creates a tidy data frame with statistics for a triangular distribution.
- .data: The data being passed from a tidy_ distribution function.
triangle_plot(): This function creates a ggplot2 object for a triangular distribution.
- .data: Tidy data from the tidy_triangular function.
- .interactive: A logical value indicating whether to return an interactive plot using plotly. Default is FALSE.

Using tidy_triangular for Simulations

Suppose you want to simulate a triangular distribution with 100 x values, a minimum of 0, a maximum of 1, and a mode at 0.5. You’d use the following code:

library(TidyDensity)

triangular_data <- tidy_triangular(
  .n = 100, 
  .min = 0, 
  .max = 1, 
  .mode = 0.5, 
  .num_sims = 1, 
  .return_tibble = TRUE
  )

triangular_data

# A tibble: 100 × 7
   sim_number     x     y      dx      dy     p     q
   <fct>      <int> <dbl>   <dbl>   <dbl> <dbl> <dbl>
 1 1              1 0.853 -0.140  0.00158 0.957 0.853
 2 1              2 0.697 -0.128  0.00282 0.816 0.697
 3 1              3 0.656 -0.116  0.00484 0.764 0.656
 4 1              4 0.518 -0.103  0.00805 0.536 0.518
 5 1              5 0.635 -0.0909 0.0130  0.733 0.635
 6 1              6 0.838 -0.0786 0.0202  0.948 0.838
 7 1              7 0.645 -0.0662 0.0304  0.748 0.645
 8 1              8 0.482 -0.0539 0.0444  0.464 0.482
 9 1              9 0.467 -0.0416 0.0627  0.437 0.467
10 1             10 0.599 -0.0293 0.0859  0.678 0.599
# ℹ 90 more rows

This generates a tidy tibble with simulated data, ready for your analysis.
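
To get a feel for where such draws and the p column can come from, here is a base R sketch of the triangular CDF, its inverse, and inverse transform sampling. The helper names (ptri_sketch, qtri_sketch, rtri_sketch) are hypothetical, and this is not necessarily how TidyDensity generates its values internally. Judging from the printed rows, the p column appears to be the CDF evaluated at y.

```r
# Hypothetical helpers; none of these are TidyDensity functions.

# CDF of the triangular distribution on [a, b] with mode c.
ptri_sketch <- function(x, a, b, c) {
  stopifnot(a < b, a <= c, c <= b)
  out <- numeric(length(x))                        # 0 for x <= a
  left  <- x > a & x <= c
  right <- x > c & x < b
  out[left]  <- (x[left] - a)^2 / ((b - a) * (c - a))
  out[right] <- 1 - (b - x[right])^2 / ((b - a) * (b - c))
  out[x >= b] <- 1
  out
}

# Quantile function (inverse CDF).
qtri_sketch <- function(p, a, b, c) {
  stopifnot(a < b, a <= c, c <= b, all(p >= 0 & p <= 1))
  split <- (c - a) / (b - a)                       # CDF value at the mode
  ifelse(
    p < split,
    a + sqrt(p * (b - a) * (c - a)),
    b - sqrt((1 - p) * (b - a) * (b - c))
  )
}

# Inverse transform sampling: push uniform draws through the quantile function.
rtri_sketch <- function(n, a, b, c) qtri_sketch(runif(n), a, b, c)

set.seed(123)
draws <- rtri_sketch(1000, a = 0, b = 1, c = 0.5)
range(draws)   # stays inside [0, 1]
mean(draws)    # should be close to (a + b + c) / 3 = 0.5

# First two y values from the printed tibble (rounded to three decimals);
# the results are about 0.957 and 0.816, in line with the p column.
ptri_sketch(c(0.853, 0.697), a = 0, b = 1, c = 0.5)
```

Because the printed y values are rounded, small differences in the third decimal of p are expected when you repeat this check for other rows.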

Estimating Parameters and Creating Stats Tables

Use the util_triangular_param_estimate function to estimate the parameters from the simulated y values and, by default, create the tidy empirical data:

param_estimate <- util_triangular_param_estimate(.x = triangular_data$y)

t(param_estimate$parameter_tbl)

          [,1]        
dist_type "Triangular"
samp_size "100"       
min       "0.0572515" 
max       "0.8822025" 
mode      "0.8822025" 
method    "Basic"
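
Note that in the output above the mode estimate is identical to the maximum (0.8822025), even though the data were simulated with a mode of 0.5. One simple alternative, sketched below in base R, backs the mode out of the sample mean using mean = (a + b + c) / 3. This is an illustration under stated assumptions, not the package's "Basic" method: estimate_tri_moments() is a hypothetical helper, and it assumes the sample extremes are reasonable stand-ins for the true minimum and maximum.

```r
# Hypothetical helper, not the estimator TidyDensity uses.
# Assumes min and max are approximated by the sample extremes, then solves
# mean = (a + b + c) / 3 for the mode c.
estimate_tri_moments <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2)
  a_hat <- min(x)
  b_hat <- max(x)
  c_hat <- 3 * mean(x) - a_hat - b_hat
  c_hat <- min(max(c_hat, a_hat), b_hat)   # clamp the mode into [a_hat, b_hat]
  data.frame(min = a_hat, max = b_hat, mode = c_hat)
}

# Self-contained demo data: the average of two independent uniforms on [0, 1]
# follows a symmetric triangular distribution on [0, 1] with mode 0.5.
set.seed(42)
x <- (runif(500) + runif(500)) / 2
est <- estimate_tri_moments(x)
est
```

With real data you would pass your own numeric vector, for example the y column of the simulated tibble. The sample extremes always sit inside the true range, so this estimator is crude, but the mode estimate should land near the middle for symmetric data like this.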

To create the statistics table, pass the simulated tibble to util_triangular_stats_tbl():

stats_table <- util_triangular_stats_tbl(.data = triangular_data)
t(stats_table)

                  [,1]                     
tidy_function     "tidy_triangular"        
function_call     "Triangular c(0, 1, 0.5)"
distribution      "Triangular"             
distribution_type "continuous"             
points            "100"                    
simulations       "1"                      
mean              "0.5"                    
median            "0.3535534"              
mode              "1"                      
range_low         "0.0572515"              
range_high        "0.8822025"              
variance          "0.04166667"             
skewness          "0"                      
kurtosis          "-0.6"                   
entropy           "-0.6931472"             
computed_std_skew "-0.1870017"             
computed_std_kurt "2.778385"               
ci_lo             "0.08311609"             
ci_hi             "0.8476985"
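
As a cross-check, the closed-form summaries of a triangular distribution are easy to compute by hand. The sketch below uses the standard textbook formulas in base R; tri_theory() is a hypothetical helper, not a TidyDensity function. For a = 0, b = 1, c = 0.5, the printed mean, variance, and kurtosis agree with these values, while the printed median (0.3535534), mode (1), and entropy (-0.6931472) do not, so it may be worth re-running the table against your installed TidyDensity version before relying on those rows.

```r
# Hypothetical helper, not part of TidyDensity: closed-form summaries of a
# triangular distribution with lower limit a, upper limit b, and mode c.
tri_theory <- function(a, b, c) {
  stopifnot(a < b, a <= c, c <= b)
  midpoint <- (a + b) / 2
  # The median formula depends on which side of the midpoint the mode falls.
  med <- if (c >= midpoint) {
    a + sqrt((b - a) * (c - a) / 2)
  } else {
    b - sqrt((b - a) * (b - c) / 2)
  }
  data.frame(
    mean            = (a + b + c) / 3,
    median          = med,
    mode            = c,
    variance        = (a^2 + b^2 + c^2 - a * b - a * c - b * c) / 18,
    excess_kurtosis = -3 / 5,                    # the same for every triangular distribution
    entropy         = 0.5 + log((b - a) / 2)     # differential entropy, in nats
  )
}

tri_theory(a = 0, b = 1, c = 0.5)
```

For this symmetric case the mean, median, and mode all equal 0.5, the variance is 1/24 (about 0.0417), and the entropy is 0.5 - log(2) (about -0.193).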

Visualizing the Triangular Distribution: Now, let’s visualize the triangular distribution using the triangle_plot function:

triangle_plot(.data = triangular_data, .interactive = TRUE)

triangle_plot(.data = triangular_data, .interactive = FALSE)

This will generate an informative plot, and if you set .interactive to TRUE, you can explore the distribution interactively using plotly.

Conclusion

In this blog post, we’ve explored the powerful functionalities of the triangular distribution in TidyDensity. Whether you’re simulating data, estimating parameters, or creating insightful visualizations, these functions provide a robust toolkit for your statistical endeavors. Happy coding, and may your distributions always be tidy!
