PCA and the #TidyTuesday best hip hop songs ever

April 13, 2020

Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m exploring a different part of the tidymodels framework; I’m showing how to implement principal component analysis via recipes with this week’...
#TidyTuesday hotel bookings and recipes

February 10, 2020

Last week I published my first screencast showing how to use the tidymodels framework for machine learning and modeling in R. Today, I’m using this week’s #TidyTuesday dataset on hotel bookings to show how to use one of the tidymodels packages, recipes, with some simple models! Here is ...
#TidyTuesday and tidymodels

February 4, 2020

This week I started my new job as a software engineer at RStudio, working with Max Kuhn and other folks on tidymodels. I am really excited about tidymodels because my own experience as a practicing data scientist has shown me some of the areas for growth that still exist in ...
Practice using lubridate… THEATRICALLY

August 25, 2019

I am so pleased to now be an RStudio-certified tidyverse trainer! I have been teaching technical content for decades, whether in a university classroom, developing online courses, or leading workshops, but I still found this program valuable for my own professional development. I learned a lot that is going to ...
Introducing tidylo

July 7, 2019

Today I am so pleased to introduce a new package for calculating weighted log odds ratios, tidylo. Often in data analysis, we want to measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents. One statistic often used to ...
Reordering and facetting for ggplot2

June 30, 2019

I recently wrote about the release of tidytext 0.2.1, and one of the most useful new features in this release is a couple of helper functions for making plots with ggplot2. These helper functions address a class of challenges that often arises when dealing with text data, so we’ve included ...
Relaunching the qualtRics package

April 29, 2019

Note: cross-posted with the rOpenSci blog. rOpenSci is one of the first organizations in the R community I ever interacted with, when I participated in the 2016 rOpenSci unconf. I have since reviewed several rOpenSci packages and been so happy to be co...

Writing a letter to DataCamp

April 15, 2019

Since 2017 I have been an instructor for DataCamp, the VC-backed online data science education platform. What this means is that I am not an employee, but I have developed content for the company as a contractor. I have two courses there, one on text mining and one on practical supervised ...

Text classification with tidy data principles

December 23, 2018

I am an enthusiastic proponent of using tidy data principles for dealing with text data. This kind of approach offers a fluent and flexible option not just for exploratory data analysis, but also for machine learning for text, including both unsupervised machine learning and supervised machine learning. I haven’t ...
