Articles by rstats on Bryan Shalloway's Blog

Macros in the Shell: Integrating That Spreadsheet From Finance Into a Data Pipeline

May 9, 2021

Macro in the Shell Example Setting-up Guard Rails Closing Appendix Related Alternative Other Resources There is many a data science meme degrading Excel: (Google Sheets seems to have escaped most of the memes here.) While I no longer use ...

Quantile Regression Forests for Prediction Intervals

April 20, 2021

Quantile Regression Example Quantile Regression Forest Review Performance Coverage Interval Width Closing Notes Appendix Residual Plots Other Charts In this post I will build prediction intervals using quantile regression, more specifically, quantile regression forests. This is my third post on prediction intervals. Prior posts: Understanding Prediction Intervals (Part 1) Simulating Prediction ...

Simulating Prediction Intervals

April 4, 2021

Rough Idea Inspiration Procedure Example Simulate Prediction Interval Review Interval Width Coverage Closing Notes Appendix Conformal Inference Other Examples Using Simulation Confusion With Confidence Intervals Adjusting Procedure Alternative Procedure With CV Part 1 of my series of posts on building prediction intervals used data held-out from model training to evaluate the ...

Understanding Prediction Intervals

March 17, 2021

Providing More Than Point Estimates Considering Uncertainty Observation Specific Intervals A Few Things to Know About Prediction Intervals Prediction Intervals and Confidence Intervals Analytic Method of Calculating Prediction Intervals Visual Comparison of Prediction Intervals and Confidence Intervals Inference or Prediction? Cautions With Overfitting Generalizability Review Prediction Intervals Coverage Interval Width ...

Weighting Confusion Matrices by Outcomes and Observations

December 7, 2020

Model Performance Metrics Lending Data Example Starter Code Weighting by Classification Outcomes Metrics Across Decision Thresholds Weighting by Observations Closing note Appendix Weights of Observations During and Prior to Modeling Notes on Cost Sensitive Classification Weighted Classification Metrics Questions on Cost Sensitive Classification Arriving at Weights Weighting in predictive modeling ...

Undersampling Will Change the Base Rates of Your Model’s Predictions

November 22, 2020

Create Data Association of ‘feature’ and ‘target’ Resample Build Models Rescale Predictions to Predicted Probabilities Appendix Density Plots Lift Plot Comparing Scaling Methods TLDR: In classification problems, under- and over-sampling techniques shift the distribution of predicted probabilities towards the minority class. If your problem requires accurate probabilities you will need to adjust ...

Feature Engineering with Sliding Windows and Lagged Inputs

October 11, 2020

Load data Feature Engineering & Data Splits Lag Based Features (Before Split, use dplyr or similar) Data Splits Other Features (After Split, use recipes) Model Specification and Training Model Evaluation Appendix Model Building with Hyperparam...

Short Examples of Best Practices When Writing Functions That Call dplyr Verbs

June 24, 2020

Function expecting one column Functions allowing multiple columns Older approaches Appendix dplyr, the foundational tidyverse package, makes a trade-off: it is easy to code in interactively, at the expense of being more difficult to create...

Use Flipbooks to Explain Your Code and Thought Process

June 23, 2020

Learning R’s %>% Using the pipe operator (%>%) is one of my favorite things about coding in R and the tidyverse. However, when it was first shown to me, I couldn’t understand what the #rstats nut describing it was so enthusiastic about. They t...

Tidy Pairwise Operations

June 2, 2020

Overview I. Nest and pivot II. Expand combinations III. Filter redundancies IV. Map function(s) V. Return to normal dataframe VI. Bind back to data Functionalize Example creating & evaluating features When is this approach inappropriate? Appen...

Riddler Solutions: Pedestrian Puzzles

March 3, 2020

Riddler express Riddler classic Appendix Time to center Transform grid, rotate first Transform city, pretty This post contains solutions to FiveThirtyEight’s two riddles released 2020-02-14, Riddler Express and Riddler Classic. I created a toy ...

animatrixr & Visualizing Matrix Transformations pt. 2

February 23, 2020

This post is a continuation of my post from last week on Visualizing Matrix Transformations with gganimate. Both posts are largely inspired by Grant Sanderson’s beautiful video series The Essence of Linear Algebra and by wanting to continue messing around with Thomas Lin Pedersen’s fantastic gganimate package in R. ...
