Articles by David Robinson

Machine learning in a hurry: what I’ve learned from the SLICED ML competition

July 21, 2021 | David Robinson

This summer I’ve been competing in the SLICED machine learning competition, where contestants have two hours to open a new dataset, build a predictive model, and be scored as a Kaggle submission. Contestants are graded primarily on model performance, but also get points for visualization and storytelling, and from ...

The ‘circular random walk’ puzzle: tidy simulation of stochastic processes in R

November 23, 2020 | David Robinson

Previously in this series: The “lost boarding pass” puzzle; The “deadly board game” puzzle; The “knight on an infinite chessboard” puzzle; The “largest stock profit or loss” puzzle; The “birthday paradox” puzzle; The “Spelling Bee honeycomb” pu...

The ‘prisoner coin flipping’ puzzle: tidy simulation in R

May 4, 2020 | David Robinson

Previously in this series: The “lost boarding pass” puzzle; The “deadly board game” puzzle; The “knight on an infinite chessboard” puzzle; The “largest stock profit or loss” puzzle; The “birthday paradox” puzzle; The “Spelling Bee honeycomb” puzzle; Feller’s “coin-tossing” puzzle; The “spam comments” puzzle. I love 538’s Riddler column, ...

The ‘spam comments’ puzzle: tidy simulation of stochastic processes in R

April 13, 2020 | David Robinson

Previously in this series: The “lost boarding pass” puzzle; The “deadly board game” puzzle; The “knight on an infinite chessboard” puzzle; The “largest stock profit or loss” puzzle; The “birthday paradox” puzzle; The “Spelling Bee honeycomb” puzzle; Feller’s “coin-tossing” puzzle. I love 538’s Riddler column, and the April 10 puzzle ...

Feller’s coin-tossing puzzle: tidy simulation in R

January 17, 2020 | David Robinson

Previously in this series: The “lost boarding pass” puzzle; The “deadly board game” puzzle; The “knight on an infinite chessboard” puzzle; The “largest stock profit or loss” puzzle; The “birthday paradox” puzzle. I have an interest in probability puzzles and riddles, and especially in simulating them in R. I recently ...

The ‘Spelling Bee Honeycomb’ puzzle: efficient computation in R

January 6, 2020 | David Robinson

Previously in this series: The “lost boarding pass” puzzle; The “deadly board game” puzzle; The “knight on an infinite chessboard” puzzle; The “largest stock profit or loss” puzzle; The “birthday paradox” puzzle. I love 538’s Riddler column, and the January 3 puzzle is a fun one. I’ll quote: The New ...

The birthday paradox puzzle: tidy simulation in R

January 3, 2020 | David Robinson

Previously in this series: The “lost boarding pass” puzzle; The “deadly board game” puzzle; The “knight on an infinite chessboard” puzzle; The “largest stock profit or loss” puzzle. The birthday problem is a classic probability ...

The ‘largest stock profit or loss’ puzzle: efficient computation in R

December 24, 2019 | David Robinson

Previously in this series: The “knight on an infinite chessboard” puzzle; The “lost boarding pass” puzzle; The “deadly board game” puzzle. I recently came across an interview problem from A Cool SQL Problem: Avoiding For-Loops. Avoiding loops is a topic I always enjoy reading about, and the blog post didn’...

The ‘knight on an infinite chessboard’ puzzle: efficient simulation in R

December 10, 2018 | David Robinson

Previously in this series: The “lost boarding pass” puzzle; The “deadly board game” puzzle. I’ve recently been enjoying The Riddler: Fantastic Puzzles from FiveThirtyEight, a wonderful book from 538’s Oliver Roeder. Many of the probab...

Exploring college major and income: a live data analysis in R

October 16, 2018 | David Robinson

I recently came up with the idea for a series of screencasts: I've thought about recording a screencast of an example data analysis in #rstats. I'd do it on a dataset I'm unfamiliar with so that I can show and narrate my live thought process. Any suggestions for interesting datasets ...

Who wrote the anti-Trump New York Times op-ed? Using tidytext to find document similarity

September 6, 2018 | David Robinson

Like a lot of people, I was intrigued by “I Am Part of the Resistance Inside the Trump Administration”, an anonymous New York Times op-ed written by a “senior official in the Trump administration”. And like many data scientists, I was curious about what role text mining could play. Ok ...

Scientific debt

May 10, 2018 | David Robinson

A very useful concept in software engineering is technical debt. Technical debt occurs when engineers choose a quick but suboptimal solution to a problem, or don’t spend time to build sustainable infrastructure. Maybe they’re using an approach that doesn’t scale well as the team and codebase expand (...

Data science at DataCamp

April 10, 2018 | David Robinson

In January, I was excited to make an announcement about a shift in my career: “I have some exciting news: today I'm joining @DataCamp as their Chief Data Scientist 🎉📊📈 pic.twitter.com/wiN9J4qSjx” (David Robinson, @drob, January 29, 2018). When I first discussed the role with the DataCamp CEO, I ...

What digits should you bet on in Super Bowl squares?

February 4, 2018 | David Robinson

My new office introduced me to a betting game I wasn’t previously familiar with: Super Bowl squares. It’s played with a ten-by-ten grid, like this one from printyourbrackets.com: Each row and column gets an assortment of digits from 0-9 represen...

Exploring handwritten digit classification: a tidy analysis of the MNIST dataset

January 22, 2018 | David Robinson

In a recent post, I offered a definition of the distinction between data science and machine learning: that data science is focused on extracting insights, while machine learning is interested in making predictions. I also noted that the two fields greatly overlap: I use both machine learning and data science ...

What’s the difference between data science, machine learning, and artificial intelligence?

January 9, 2018 | David Robinson

When I introduce myself as a data scientist, I often get questions like “What’s the difference between that and machine learning?” or “Does that mean you work on artificial intelligence?” I’ve responded enough times that my answer easily qualifies for my “rule of three”: When you’ve written ...

Advice to aspiring data scientists: start a blog

November 14, 2017 | David Robinson

Last week I shared a thought on Twitter: “When you’ve written the same code 3 times, write a function. When you’ve given the same in-person advice 3 times, write a blog post” (David Robinson, @drob, November 9, 2017). Ironically, this tweet hints at a piece of advice I’ve given at least 3 dozen ...

Announcing “Introduction to the Tidyverse”, my new DataCamp course

November 9, 2017 | David Robinson

For the last few years I’ve been encouraging a particular approach to R education, particularly teaching the dplyr and ggplot2 packages first and introducing real datasets early on. This week I’m excited to announce the next step: the release of Introduction to the Tidyverse, my new interactive course ...

Don’t teach students the hard way first

September 21, 2017 | David Robinson

Imagine you were going to a party in an unfamiliar area, and asked the host for directions to their house. It takes you thirty minutes to get there, on a path that takes you on a long winding road with slow traffic. As the party ends, the host tells you “...

Trump’s Android and iPhone tweets, one year later

August 9, 2017 | David Robinson

A year ago today, I wrote up a blog post, “Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half”. My analysis, shown below, concludes that the Android and iPhone tweets are clearly from different people, posting during different times of day and using hashtags, links, ...
