Blog Archives

Scientific debt

May 10, 2018

A very useful concept in software engineering is technical debt. Technical debt occurs when engineers choose a quick but suboptimal solution to a problem, or don’t spend time to build sustainable infrastructure. Maybe they’re using an approach that doesn’t scale well as the team and codebase expand (such as hardcoding “magic numbers”), or using a tool for reasons of convenience...


Data science at DataCamp

April 10, 2018

In January, I was excited to make an announcement about a shift in my career:

“I have some exciting news: today I'm joining @DataCamp as their Chief Data Scientist 🎉📊📈 pic.twitter.com/wiN9J4qSjx”
David Robinson (@drob), January 29, 2018

When I first discussed the role with the DataCamp CEO, I described my goal as to “Make DataCamp as good at doing data science...


What digits should you bet on in Super Bowl squares?

February 4, 2018

My new office introduced me to a betting game I wasn’t previously familiar with: Super Bowl squares. It’s played with a ten-by-ten grid, like this one from printyourbrackets.com: Each row and column gets an assortment of digits from 0-9 represen...


Exploring handwritten digit classification: a tidy analysis of the MNIST dataset

January 22, 2018

In a recent post, I offered a definition of the distinction between data science and machine learning: that data science is focused on extracting insights, while machine learning is interested in making predictions. I also noted that the two fields greatly overlap:

I use both machine learning and data science in my work: I might fit a model...


What’s the difference between data science, machine learning, and artificial intelligence?

January 9, 2018

When I introduce myself as a data scientist, I often get questions like “What’s the difference between that and machine learning?” or “Does that mean you work on artificial intelligence?” I’ve responded enough times that my answer easily qualifies for my “rule of three”:

When you’ve written the same code 3 times, write a function
When you’ve given the same in-person...


Advice to aspiring data scientists: start a blog

November 14, 2017

Last week I shared a thought on Twitter:

“When you’ve written the same code 3 times, write a function
When you’ve given the same in-person advice 3 times, write a blog post”
David Robinson (@drob), November 9, 2017

Ironically, this tweet hints at a piece of advice I’ve given at least 3 dozen times, but haven’t yet written a post about. I’ve...


Announcing “Introduction to the Tidyverse”, my new DataCamp course

November 9, 2017

For the last few years I’ve been encouraging a particular approach to R education, especially teaching the dplyr and ggplot2 packages first and introducing real datasets early on. This week I’m excited to announce the next step: the release of “Introduction to the Tidyverse”, my new interactive course on the DataCamp platform. The course is an introduction to the dplyr...


Don’t teach students the hard way first

September 21, 2017

Imagine you were going to a party in an unfamiliar area, and asked the host for directions to their house. It takes you thirty minutes to get there, on a path that takes you on a long winding road with slow traffic. As the party ends, the host tells you “You can take the highway on your way back,...


Trump’s Android and iPhone tweets, one year later

August 9, 2017

A year ago today, I wrote up a blog post, “Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half”. My analysis, shown below, concludes that the Android and iPhone tweets are clearly from different people, posting during different times of day and using hashtags, links, and retweets in distinct ways. What’s more, we can...


Teach the tidyverse to beginners

July 5, 2017

A few years ago, I wrote a post, “Don’t teach built-in plotting to beginners (teach ggplot2)”. I argued that ggplot2 was not an advanced approach meant for experts, but rather a suitable introduction to data visualization. Many teachers suggest I’m overestimating their students: “No, see, my students are beginners…”. If I push the point, they might insist I’m...

