Generating Synthetic Data with R-vine Copulas using esgtoolkit in R

Posted on September 20, 2025 by T. Moudiki in R bloggers | 0 Comments

[This article was first published on T. Moudiki's Webpage - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-vine copulas are powerful tools for modeling complex dependencies among multiple variables. The esgtoolkit package in R provides a user-friendly interface to fit R-vine copula models and generate synthetic data that preserves the statistical properties of the original dataset.

See also:

devtools::install_github("Techtonique/esgtoolkit")


library(esgtoolkit)

y <- esgtoolkit::calculatereturns(ts(EuStockMarkets[1:250, ], start=start(EuStockMarkets),
                                     frequency=frequency(EuStockMarkets)), type = "log")

# Run simulation
result <- simulate_rvine(y, n = 500, verbose = TRUE, n_trials = 5)

# Print summary
print(result)

# Create different types of plots
plot(result, type = "distribution")  # Default

plot(result, type = "correlation")

#plot(result, type = "both")

# Access detailed diagnostics
str(result$diagnostics)

# Access simulated data
sim_data <- result$simulated_data

head(sim_data)


    Transforming data to uniform margins with improved boundary handling...
    
    Fitting R-vine copula model...
    
    V1 + V3 --> V1,V3 ; V2
    
    V2 + V4 --> V2,V4 ; V3
    
    V1 + V4 --> V1,V4 ; V3,V2
    
    R-vine copula model fitted successfully
    


    tree     edge | family   cop   par  par2 |  tau   utd   ltd 
    ----------------------------------------------------------- 
       1      2,1 |     19  SBB7  2.03  0.69 | 0.47  0.36  0.59
              3,2 |     19  SBB7  1.74  0.79 | 0.44  0.41  0.51
              4,3 |      1     N  0.63  0.00 | 0.43     -     -
       2    3,1;2 |      1     N  0.33  0.00 | 0.21     -     -
            4,2;3 |      1     N  0.30  0.00 | 0.19     -     -
       3  4,1;3,2 |     14    SG  1.07  0.00 | 0.06     -  0.09
    ---
    type: D-vine    logLik: 249.86    AIC: -483.71    BIC: -455.57    
    ---
    1 <-> V1,   2 <-> V2,   3 <-> V3,   4 <-> V4  tree    edge family  cop       par      par2        tau       utd        ltd
    1    1     4,3      1    N 0.6297866 0.0000000 0.43371534 0.0000000 0.00000000
    2    1     3,2     19 SBB7 1.7362736 0.7864981 0.43642959 0.4142407 0.50934532
    3    1     2,1     19 SBB7 2.0325295 0.6856104 0.46867955 0.3638576 0.59360896
    4    2   4,2;3      1    N 0.2984383 0.0000000 0.19293140 0.0000000 0.00000000
    5    2   3,1;2      1    N 0.3282026 0.0000000 0.21288575 0.0000000 0.00000000
    6    3 4,1;3,2     14   SG 1.0683573 0.0000000 0.06398351 0.0000000 0.08676182


    Running 5 simulation trials...
    
    Best simulation achieved quality score: 0.0984
    
    Score weights used: [0.4, 0.2, 0.2, 0.1, 0.1]
    
    Mean absolute correlation error (Kendall): 0.0077
    
    Mean absolute correlation error (Pearson): 0.0428
    


    R-vine Copula Simulation Results
    ================================
    
    Original observations: 249
    Variables: 4
    Simulated observations: 500
    Quality score: 0.0984
    Successful trials: 5/5
    Mean absolute correlation error (Kendall): 0.0077
    Mean absolute correlation error (Pearson): 0.0428
    
    Use plot() to visualize results and $diagnostics for detailed metrics.

    List of 24
     $ original_correlation_tau     : num [1:4, 1:4] 1 0.465 0.413 0.353 0.465 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ simulated_correlation_tau    : num [1:4, 1:4] 1 0.45 0.421 0.355 0.45 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ correlation_error_tau        : num [1:4, 1:4] 0 -0.01499 0.00825 0.00173 -0.01499 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ original_correlation_pearson : num [1:4, 1:4] 1 0.815 0.728 0.507 0.815 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ simulated_correlation_pearson: num [1:4, 1:4] 1 0.766 0.682 0.404 0.766 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ correlation_error_pearson    : num [1:4, 1:4] 0 -0.0483 -0.0454 -0.1027 -0.0483 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
      .. ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ mean_absolute_error_tau      : num 0.0077
     $ max_absolute_error_tau       : num 0.0232
     $ mean_absolute_error_pearson  : num 0.0428
     $ max_absolute_error_pearson   : num 0.103
     $ quality_score                : num 0.0984
     $ score_weights_used           : num [1:5] 0.4 0.2 0.2 0.1 0.1
     $ trial_scores                 : num [1:5] 0.219 0.1222 0.0984 0.1231 0.1884
     $ successful_trials            : int 5
     $ RVM_model                    :List of 20
      ..$ Matrix     : num [1:4, 1:4] 1 4 3 2 0 2 4 3 0 0 ...
      ..$ family     : num [1:4, 1:4] 0 14 1 19 0 0 1 19 0 0 ...
      ..$ par        : num [1:4, 1:4] 0 1.068 0.328 2.033 0 ...
      ..$ par2       : num [1:4, 1:4] 0 0 0 0.686 0 ...
      ..$ names      : chr [1:4] "V1" "V2" "V3" "V4"
      ..$ MaxMat     : num [1:4, 1:4] 1 2 2 2 0 2 3 3 0 0 ...
      ..$ CondDistr  :List of 2
      .. ..$ direct  : logi [1:4, 1:4] FALSE TRUE TRUE TRUE FALSE FALSE ...
      .. ..$ indirect: logi [1:4, 1:4] FALSE FALSE FALSE FALSE FALSE TRUE ...
      ..$ type       : chr "D-vine"
      ..$ tau        : num [1:4, 1:4] 0 0.064 0.213 0.469 0 ...
      ..$ taildep    :List of 2
      .. ..$ upper: num [1:4, 1:4] 0 0 0 0.364 0 ...
      .. ..$ lower: num [1:4, 1:4] 0 0.0868 0 0.5936 0 ...
      ..$ beta       : num [1:4, 1:4] 0 0.062 0.213 0.443 0 ...
      ..$ call       : language VineCopula::RVineStructureSelect(data = U, familyset = valid_families,      type = 0, selectioncrit = "BIC", trun| __truncated__ ...
      ..$ nobs       : int 249
      ..$ logLik     : num 250
      ..$ pair.logLik: num [1:4, 1:4] 0 2.59 14.33 84.82 0 ...
      ..$ AIC        : num -484
      ..$ pair.AIC   : num [1:4, 1:4] 0 -3.18 -26.66 -165.65 0 ...
      ..$ BIC        : num -456
      ..$ pair.BIC   : num [1:4, 1:4] 0 0.341 -23.138 -158.615 0 ...
      ..$ emptau     : num [1:4, 1:4] 0 0.0639 0.2156 0.4653 0 ...
      ..- attr(*, "class")= chr "RVineMatrix"
     $ n_observations               : int 249
     $ n_variables                  : int 4
     $ n_simulations                : num 500
     $ original_means               : Named num [1:4] 0.000372 0.000456 0.000338 0.000255
      ..- attr(*, "names")= chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ simulated_means              : Named num [1:4] -3.17e-05 2.18e-04 -4.52e-05 1.83e-04
      ..- attr(*, "names")= chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ original_sds                 : Named num [1:4] 0.00931 0.00877 0.01049 0.00815
      ..- attr(*, "names")= chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ simulated_sds                : Named num [1:4] 0.00954 0.00886 0.01119 0.00849
      ..- attr(*, "names")= chr [1:4] "DAX" "SMI" "CAC" "FTSE"
     $ ks_test_statistics           : Named num [1:4] 0.0259 0.0354 0.0507 0.04
      ..- attr(*, "names")= chr [1:4] "D" "D" "D" "D"
     $ ks_test_pvalues              : num [1:4] 1 0.985 0.786 0.953

A matrix: 6 × 4 of type dbl
DAX	SMI	CAC	FTSE
-0.001398544	-0.0015309795	0.003170410	0.0008758254
0.004458917	0.0026640098	0.011666435	0.0057322484
-0.001597764	-0.0001979084	-0.004832143	0.0011097778
-0.001501251	-0.0034774275	-0.003218613	0.0020315141
0.000000000	0.0045969506	0.001912011	-0.0044810433
-0.002419004	-0.0004654500	-0.004832588	-0.0055430360

To leave a comment for the author, please follow the link and comment on their blog: T. Moudiki's Webpage - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Generating Synthetic Data with R-vine Copulas using esgtoolkit in R

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)