This was my second RStudio Conference following last year’s edition
in San Diego! In addition, at Tidyverse Developer Day I got a really
cool chance to work on issues and contribute to making the Tidyverse
better. This post won’t be a complete overview of the talks at the
conference (others have already released some good blog posts on that
note: Julia Silge,
etc.) and will be more of a reflection on how I contributed to the
Tidyverse at #TidyverseDevDay and how I felt being at the conference.
As usual I collected a bunch of hex stickers at this conference, many of
which I already own… I seem to have a weird thing about collecting
them but never using them (from Twitter I can see this isn’t
exclusive to me, however). Speaking of hex stickers… for
Tidyverse Developer Day each participant got a shiny Tidyverse hex
(Images: front side | back side)
Too bad it’s not like the Dumbledore’s Army galleon, or Hadley could
just send a covert message to all the participants, like “TidyDevs,
Assemble!” Maybe next year, I suppose.
Anyways, let’s get started!
Tidyverse Developers’ Day
The day following RStudio::Conf, those lucky people who got a ticket
gathered at the “Sunset Room” to contribute to the Tidyverse. After we
grabbed a quick coffee and breakfast taco, Hadley gave a short
introduction outlining exactly how the day was going to go, and
then we all got to work. There was a large list of tagged issues ready
for us, but we could also choose our own and ask an RStudio member to tag
it as “tidy-dev-day” for us.
After finding an issue I wanted to work on, here was my basic workflow:
- Fork the repo of the package I needed to work on.
- Go into RStudio: File > New Project > Version Control > Git, then
paste the .git link from your forked repository on GitHub
(click on the big green “Clone or download” button to copy it)
- Once you’ve opened up the project, make sure to create a new branch
through the Git tab (click on the icon with two purple boxes next to
the gear icon)
- You’re all set to start coding!
There is another way to do most of this through the Git Bash terminal,
which you can learn from Tony’s blog post.
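For the terminal route, here is a minimal sketch of the same clone-then-branch steps. It assumes git is installed and that you have already forked the repo on GitHub. The helper name start_issue_branch and the placeholder URL in the usage comment are made up for illustration.

```shell
# Hypothetical helper (not part of git or RStudio): clone your fork and
# create a new branch for the issue you picked.
# Usage: start_issue_branch <fork-url> <branch-name> [target-dir]
start_issue_branch() {
  fork_url="$1"
  branch="$2"
  # Default to the repository name, e.g. "dplyr" from ".../dplyr.git".
  target_dir="${3:-$(basename "$fork_url" .git)}"

  # Same as pasting the .git link into RStudio's New Project dialog.
  git clone "$fork_url" "$target_dir" || return 1

  # Same as the "new branch" button in RStudio's Git tab.
  git -C "$target_dir" checkout -b "$branch"
}

# Example call; the URL is a placeholder, swap in your own fork:
# start_issue_branch https://github.com/YOUR-USERNAME/PACKAGE.git docs-example
```

Wrapping the two commands in a function is only a convenience; running git clone and git checkout -b by hand does the same thing.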
The main things I focused on were improving documentation and providing
additional examples. For these tasks I found it important to do a lot of
research first. Thankfully I was able to find many Stack Overflow posts
of people explaining the issues that I wanted to write about as well as
#rstats blog posts/tutorials that could provide me with ideas on how to
phrase things and write good small examples!
An important thing that I learned was that it’s good practice to create
a different branch for working on a Pull Request for different issues on
the same package! When you’re changing documentation in a package it’s
important to make sure you use
function to update changes. Don’t forget to run R CMD check as well
(the “Check” icon in the
Build tab). After you’re done with all of
that, it’s time to commit, push, and then create a pull request (PR) to
merge your proposed changes into the master branch!
When you write a commit message you can use a hash sign followed by a
number to refer to issues in the GitHub repo, as well as use a number of
keywords to close these issues automatically (in our case when the PR is merged).
If you check GitHub you can see that it automatically prepended the repo
name and a link to the issue being referenced.
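As a rough sketch of such a commit from the terminal: GitHub treats keywords like "Fixes", "Closes", and "Resolves" followed by a hash sign and an issue number as closing references once the PR is merged into the default branch. The helper name commit_closing_issue, the file path, and the issue number 123 in the usage comment are placeholders, not a real issue.

```shell
# Hypothetical helper: commit staged changes with a message that
# references an issue and asks GitHub to close it when the PR is merged.
# Usage: commit_closing_issue "<summary>" <issue-number>
commit_closing_issue() {
  summary="$1"
  issue="$2"
  # The first -m is the subject line; the second -m becomes the body.
  # "Fixes #<number>" is one of GitHub's closing keywords.
  git commit -m "$summary" -m "Fixes #$issue"
}

# Example call (the path and issue number are made up):
# git add R/some-file.R
# commit_closing_issue "Add example for grouped summaries" 123
```

If you only want to mention an issue without closing it, leave out the keyword and write just the hash sign and number.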
Then I waited to see if those changes were approved or if there were
still a few things that needed changing:
OK! After reading the comments from Hadley (!) and Lionel (!) I went back
into my branch in RStudio and made the requested changes. When I commit and
push to my forked repo again, the new commits are automatically tracked in
the PR. I usually use the commit message “edits to comply with PR review” when pushing.
There we go, I have now officially contributed to the Tidyverse!
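The review round trip can be sketched from the terminal too. This assumes the remote for your fork is called origin (the default after cloning) and reuses the commit message quoted above. push_review_fixes is a hypothetical helper name.

```shell
# Hypothetical helper: commit edits to already-tracked files and push
# them to the same branch on your fork, which updates the open PR.
# Usage: push_review_fixes <branch-name> [commit-message]
push_review_fixes() {
  branch="$1"
  message="${2:-edits to comply with PR review}"
  # Stage modifications to tracked files only (new files need git add).
  git add -u || return 1
  git commit -m "$message" || return 1
  # Pushing to the branch the PR was opened from is enough;
  # no new pull request is needed.
  git push origin "$branch"
}

# Example call, run from inside the cloned repo:
# push_review_fixes docs-example
```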
Other good resources for contributing to open source are Nic
Crane’s step-by-step blog
(she also presented at RStudio::Conf on building a Shiny app for genomic
medicine) and Tony El-Habr’s blog
on #TidyverseDevDay (whom I actually met at the Sports Analytics
Birds-of-a-Feather session).
I felt this was a fantastic opportunity for people of all skill levels
to experience contributing to open source. The RStudio team were very
helpful buzzing around the event space and for those extremely new to
programming or git it was a valuable lesson as you were guided along the
process from start to finish. For myself, I have had previous experience
contributing to open source packages as well as creating, testing,
bug-squishing R packages at my workplace but this was a great way for me
to give back to the R community in what little way that I could. I
actually still have a few more issues from Tidyverse Developer Day
that are a work-in-progress and I hope to continue contributing in the
future.
To get to Austin I had a long flight in from Japan with a 5-hour layover
in Minneapolis. Bored, I decided to do some
#TidyTuesday to pass
the time. It turned out fine in
the end, but jet lag does not make for very interpretable code… While I’m
still on the topic of #TidyTuesday… apparently, Thomas
Mock had some TidyTuesday hex stickers
but unfortunately I couldn’t get my hands on them!
Here were some of my highlights from Day One:
- An awesome #DataForGood type of presentation by Brooke
Watson, who talked about using R
to tidy data on families separated at the US border.
- Tyler Morgan-Wall on
rayshader: I’ve been casually keeping up with developments on
Twitter but I was still wowed by the presentation, especially the 3D
printing. If I had that kind of tech when I was a kid I would’ve won
ALL the science fairs with the most realistic looking baking soda
volcanoes.
- Thomas Pedersen came out with a
gganimate presentation showing all the new features
introduced since his last
gganimate talk at UseR 2018. This is
definitely a talk that you need to watch for all the examples!
All in all Day One was great, but I was still pretty exhausted from
my long trip so I didn’t get to talk to as many people as I would have liked.
Day Two began with a great talk on teaching programming by
Felienne. Her talk was so good that I only
realized she hadn’t said anything about R after she finished! My
biggest take-away from her was “You don’t become an expert by doing
expert things!” which I agreed with as a self-taught R user. For me it
was really about starting with the basics, integrating what I already
knew outside of R into what I did with R (e.g. bringing my love of soccer
into creating World Cup
gganimate), and incrementally building up my skills through
reading blog posts and tutorials.
One of the most informative talks from my perspective was by Jim
Hester on dependencies. He talked about
how “not all dependencies are created equal” due to differences
in install times, package sizes, and system
requirements. He also talked about the “illusory superiority” problem
that affects every package developer: overestimating their own
abilities and underestimating the probability of introducing new bugs
by adding dependencies. To address these concerns Jim introduced the
itdepends package which acts as a toolbox for dependency
decision-making. This package allows one to assess usage, measure
weights, visualize proportions, and assist in the removal of
dependencies through a series of
dep_*() functions. As I help develop
and maintain all the R packages that my NGO
uses for data processing/visualization, this talk and package will be
extremely useful for me to do some code “auditing” and find ways to
reduce technical debt.
Several other highlights from Day Two were:
Jesse Sadler talked about
tackling problems dealing with accounting/inheritance data from 16th
Century Europe using R. Along the way he created the
package to help himself analyze non-decimal currencies!
On Day Two I mustered up the courage and energy to go to two
different Birds-of-a-Feather sessions, Public Sector/Government and
Sports Analytics. At lunch I was able to meet R users from places like
the Federal Reserve and the Federal Aviation Administration. I heard
stories on how hard it was to convince people, especially non-technical
higher-ups, to give them the green light to switch to R as well as more
recent success stories of running workshops and tutorials within their
departments. Even though I work for an NGO I felt comfortable talking to
these people and it was a great way to exchange knowledge with people in
a somewhat similar industry (especially since I was unable to attend the
“Data for Good” Birds-of-a-Feather session). The shadow that hung over a
lot of the people I met was that they were unable to work due to the
government shutdown. I can only hope that the conference provided some
good cheer and that they can get back to work soon.
During the afternoon break there was the Sports Analytics Birds-of-a-Feather
session in the main conference lounge area. While I was there I finally
got to meet Mara in person for the
first time and I had an enjoyable time talking with her and the
surrounding group of baseball and hockey team analysts on the latest
trends and topics like fantasy sports and analytics in the betting
industry. Overall these Birds-of-a-Feather groups were a great way to
mingle with people in industries you’re interested in but I thought it
was a shame how some were longer/shorter depending on which slot the
event happened in. Understandably it is quite hard to schedule so many
different groups equally, but maybe a dedicated “industry” session block
could be worked in next year?
To wrap the conference up David Robinson
gave a great keynote on spending time contributing to open source, or
“public work”. Whether through answering questions on SO or on Twitter,
writing up a blog post, or giving a talk at a conference/meetup, David
talked about the many ways to contribute to the knowledge pool, not
just in R and data science but also in your respective research domain.
His words really resonated with me, as he was the one who, about a
year and a half ago, gave me the confidence to start my own blog and
share my stuff with the #rstats community. Since then I got a job doing
R stuff and even gave a
talk at the
TokyoR meetup last summer! One of my
goals for this year is to try to do a talk in Japanese while a long-term
goal is to present at one of the big R conferences.
Throughout the conference I managed about four to five hours of sleep on
average, which seemed to have been a thing for other people as well:
For me it was mostly jet lag but also I was kept up by looking up all
the cool stuff I learned and how I could apply it at work and for my own
personal projects… well, and looking up taco places to eat at on the
next day too!
This was the conference where I talked to the most people so far, as
I’ve slowly gained confidence in working in R and being a member of this
community. I was even recognized by some people for my soccer-related
blog posts, which is a first! I almost feel stupid for being rather
timid in the past and I want to try to be more outgoing at future
conferences (possibly UseR in Toulouse).
I already grabbed a SuperFan ticket for next year, so I hope to see some
old and new faces in San Francisco. It’s going to be
nice to go back to the Bay!