A Newbie’s Guide to Making A Pull Request (for an R package)

January 19, 2019
(This article was first published on r on Tony ElHabr, and kindly contributed to R-bloggers)

I had the wonderful opportunity to participate in the
{tidyverse} Developer Day
the day after rstudio::conf2019
officially wrapped up. 1
One of the objectives of the event
was to encourage
open-source contributor newbies (like me 😄) to
gain some experience, namely through submitting
pull requests to address issues with {tidyverse} packages.

Having only ever worked with my own packages/repos before,
I found this to be the perfect opportunity to “get my feet wet”!

A “Minimal”, Step-by-Step Breakdown of the Process

To the reader who is interested in nothing else other than the
“minimum” set of “required” steps of making a pull request 2,
I’ve outlined the steps that I took below (with additional commentary
provided afterwards and/or with footnotes). I use double curly braces ({{ and }})
to indicate names/variables that should be filled in by you for your particular
use case. More specifically,

  • {{repo}} refers to the GitHub repository (duh 😛);
  • {{user}} refers to your GitHub username;
  • {{owner}} refers to the owner of the GitHub repo;
  • {{branch-name}} refers to the name that you give your branch for your pull request (double duh 😛 😛); 3 4 and
  • {{message}} refers to the commit message that you make with your pull request changes (duh x3 😛 😛 😛).
  1. Fork the repo on GitHub.

    At https://github.com/{{owner}}/{{repo}}.git (i.e. the repo’s GitHub page),
    select the “Fork” button. Wait for GitHub to generate the fork and for your browser
    to navigate you to https://github.com/{{user}}/{{repo}}.git
    (i.e. your copy of the repo on GitHub). 5

  2. Prepare your local environment to make the pull request.

    Using the Git Bash terminal,

    • git clone https://github.com/{{user}}/{{repo}}.git
    • cd {{repo}}
    • git remote add upstream https://github.com/{{owner}}/{{repo}}.git 6
    • git pull upstream master --ff-only 7
    • git branch {{branch-name}} 8
    • git checkout {{branch-name}}
  3. Make the changes (locally) for the pull request.

  4. Review the changes (locally) and push.

    Using the Git Bash terminal,

    • git status
    • git add .
    • git commit -m "{{message}}"
    • git push -u origin {{branch-name}} 9
  5. Finalize the pull request on GitHub.

    At https://github.com/{{user}}/{{repo}}.git, select the “New pull request” button,
    then “Create pull request” on the subsequent page, and then “Create pull request”
    again (on the page that allows you to add comments). 10
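
The terminal commands from steps 2 and 4 can be bundled into a small shell sketch. The function names (prepare_pr_branch, push_pr_branch) and their arguments are made up for illustration and are not part of git. Here git -C simply runs each command inside the given directory, and git checkout -b stands in for the separate git branch and git checkout calls.

```shell
#!/usr/bin/env bash
# Sketch only: the function names and arguments below are hypothetical.

# Step 2: clone the fork, add the original repo as `upstream`,
# fast-forward the local master, then create and switch to a new branch.
prepare_pr_branch() {
  local fork_url="$1"
  local upstream_url="$2"
  local branch_name="$3"
  local dest="$4"
  git clone "$fork_url" "$dest" || return 1
  git -C "$dest" remote add upstream "$upstream_url" || return 1
  git -C "$dest" pull upstream master --ff-only || return 1
  # `git checkout -b` combines `git branch` and `git checkout`.
  git -C "$dest" checkout -b "$branch_name"
}

# Step 4: review, stage, commit, and push the branch to the fork (origin).
push_pr_branch() {
  local dest="$1"
  local branch_name="$2"
  local message="$3"
  git -C "$dest" status
  git -C "$dest" add . || return 1
  git -C "$dest" commit -m "$message" || return 1
  git -C "$dest" push -u origin "$branch_name"
}
```

For example, prepare_pr_branch https://github.com/{{user}}/{{repo}}.git https://github.com/{{owner}}/{{repo}}.git {{branch-name}} {{repo}} would cover step 2, and push_pr_branch {{repo}} {{branch-name}} "{{message}}" would cover step 4 once the changes are made.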

There are a couple of extra/alternative things you should consider doing
in this process (although these are completely optional).

  • After cd’ing into the repo, a good “sanity check” to perform is a
    git remote -v to verify that origin points to your fork of the repo.
    You should see the following when running this command.
origin https://github.com/{{user}}/{{repo}}.git (fetch)
origin https://github.com/{{user}}/{{repo}}.git (push)

Then, you could run this command again after the
git remote add upstream https://github.com/{{owner}}/{{repo}}.git command
and verify that you see the following.

origin https://github.com/{{user}}/{{repo}}.git (fetch)
origin https://github.com/{{user}}/{{repo}}.git (push)
upstream https://github.com/{{owner}}/{{repo}}.git (fetch)
upstream https://github.com/{{owner}}/{{repo}}.git (push)
  • Regarding the git pull upstream master --ff-only step, it’s actually
    “safer” to break this step up into two: (1) git fetch upstream, followed by
    (2) git merge upstream/master.
    Nonetheless, as long as your local master has no commits of its own,
    the merge is a fast-forward, so (I believe that) the git pull command
    shown here should achieve the same result.

  • Following the git pull upstream master --ff-only step, you might also want
    to do a git push origin master to really ensure that the master branch of
    your fork is “in sync” with the original repo. Nonetheless, this is completely optional,
    especially if you’re working exclusively on a branch.
    This is discussed by Jenny Bryan in the
    “Get upstream changes for a fork” chapter
    in her Happy Git and GitHub for the useR book (which, I should note,
    I used as a guide throughout this process).

  • In the last “Create pull request” page, where you have the option to write a comment,
    it’s useful to write something like “Fixes #{{#}}”, where {{#}} is the number
    of the issue. GitHub has a nice feature that can link the pull request to the
    issue that it attempts to fix (assuming that the pull request is directed towards
    a specific issue).
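
The optional remote check and the fork-syncing steps above can be sketched as two shell helpers, meant to be run from inside the cloned repo. The names check_remotes and sync_master are hypothetical; the sketch assumes that the remotes are called origin and upstream and that the default branch is master, as in the steps earlier. It uses git merge --ff-only (rather than a plain git merge) so that it mirrors the --ff-only pull.

```shell
#!/usr/bin/env bash
# Sketch only: `check_remotes` and `sync_master` are hypothetical helper names.

# Print the remotes and fail if either `origin` or `upstream` is missing.
check_remotes() {
  git remote -v
  git remote get-url origin >/dev/null && git remote get-url upstream >/dev/null
}

# Two-step alternative to `git pull upstream master --ff-only`, followed by
# the optional push that brings the fork's master in line with the original.
sync_master() {
  git checkout master || return 1
  git fetch upstream || return 1
  # --ff-only refuses to create a merge commit if master has diverged.
  git merge --ff-only upstream/master || return 1
  git push origin master
}
```

Running check_remotes right after the clone should fail (there is no upstream yet) and should succeed once the git remote add upstream command has been run.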

My Experience

The RStudio staff sort of gave us a head-start by tagging some issues
beforehand. (Many of these
were supposed to be relatively straightforward to resolve.)
Trying to avoid something too difficult and not wanting to be stuck
in “indecision purgatory”, I quickly jumped on one that seemed simple
enough: {dplyr}’s issue 4071, “Add examples for cumall()”.
Note that I say that it “seemed simple enough”; in reality, it would go on to
take a little bit more effort than I had originally imagined it would
(which seems to be true of anything in software development 😬).

First, it took me a minute or so to find out where I would need to add the documentation
examples. Normally, assuming one is following the
“typical” {devtools}/{roxygen2} workflow for package documentation,
one would add comments in the same file as the functions
that are being documented. In this case, however, the documentation
for cumall() (and also that for
the cumany() and cummean() functions, which were also without examples) is
generated programmatically in {dplyr}’s
RcppExports.R file from
a cpp (i.e. C++) file, window.cpp.
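
When it isn’t obvious which file holds the documentation for a function, one way to look is to search the tracked files of the cloned repo for the function’s name. Here is a minimal sketch, assuming it is run from the root of the repo; find_doc_source is a hypothetical helper name, and the files it lists still need to be inspected by hand.

```shell
#!/usr/bin/env bash
# Sketch only: `find_doc_source` is a hypothetical helper name.

# List tracked files that mention the given name as a whole word,
# e.g. to find where a function's roxygen comments actually live.
find_doc_source() {
  git grep -l -w -F -e "$1"
}
```

Calling find_doc_source cumall from inside a clone would print one file path per line.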

Next was the actual “heart” of the process—creating the example(s). As a strong
advocate of the “copy-and-paste-from-StackOverflow” workflow 😄 (as well
as the “changing-things-and-seeing-what-works” workflow), I searched some
of the existing examples and found that the “ranking” documentation group
(in rank.R) provided a good
starting point. I didn’t take too long to come up with some basic
examples. (See the first commit on my pull request,
or, for all of the
details, the whole history of the request.)
Then, after running a devtools::check() to regenerate the documentation,
rebuild the package (and verify that I didn’t “break” anything), etc., I submitted
the pull request on GitHub.

I assumed (perhaps naively) that my pull request would be accepted without
question. However, that was without considering the fact
that it was going to be reviewed by none other than
Hadley Wickham himself.
Hadley is certainly a meticulous person, so the changes would
not get approved without some level of well-intentioned, critical examination.
And, as it turns out, the request was not approved immediately.

Hadley identified some points of improvement and submitted some comments,
obligating me to address his feedback.
Although failure to gain immediate acceptance for the pull request
might be discouraging, this was certainly not the case for me (and I hope
that it is not for anyone else, especially first-timers). My mindset was to treat
the feedback as an opportunity for improvement, which I’m always up for.
Moreover, Hadley’s supportive statements and tone
provided additional
motivation for me to continue working toward something better. So, even
though I’m mostly writing from my own point of view as a first-time pull request
submitter, there is a lesson to be learned here for those
receiving and reviewing pull requests from others: be supportive! 11

After submitting some updates in a second commit to my branch,
I received another round of feedback from Hadley.
Again, his tone was supportive,
so I was more than happy to address his comments in a third commit shortly afterwards.

As it stands today, a day after I made the initial fork and pull
request, my pull request
has an “Open” status. Nonetheless, I’m optimistic that it will
eventually be merged (perhaps with another round or two of feedback and commits).

Parting Thoughts

To sum up my whole {tidyverse} Developer Day experience,
I’m really glad to have gone through the pull request process
in a setting with a bunch of experienced people who were ready and willing to help,
as well as others who were just as new to the whole thing as me! It really
made everything less intimidating. And now, after having gained the experience,
I won’t hesitate to continue contributing to open-source projects going forward.

Also, I’m really happy for the other first-time pull request
submitters who were successful in their efforts!

Additional Resources

Finally, as a “disclaimer” of sorts,
I should say that I’m certainly not the first person
to write about
how to make a pull request (and I’m by no means an expert on the matter).
For those looking for good reference material authored by people in the R
community, I’d recommend the following resources.



  1. Check out this public Google photos album to see a ton of awesome pictures taken by J.D. Long.
  2. I know I can’t be the only one who loves reading other people’s blogs, but (admittedly) tends to skim through them to find parts that are most relevant to what I’m doing at the time.
  3. Note that these steps assume that the project is on GitHub.
  4. You can technically make your pull request on the master branch of your forked repo (and avoid using branches whatsoever), but it’s not a “best practice”.
  5. I apologize for the lack of screenshot here. (I didn’t want to obfuscate the text.) If you’re a visual kind of person, then a YouTube video such as this one might be helpful.
  6. Even though I’ve read through most of this book before, I only found this part of the book after I went through the pull request process, and I wish I had seen it earlier!
  7. This step is necessary in order to ensure that your fork is truly “in sync” with the original repo, which might not be the case if you forked the repo a while ago and there have been updates to the original repo since then.
  8. The --ff-only part of the command is actually optional here, but is highly recommended by Jenny Bryan.
  9. Note that the -u flag is short for --set-upstream, which makes your local branch track the branch pushed to origin; this is NOT the same as the upstream remote created earlier. Passing -u is recommended here so that later git push and git pull calls on this branch need no arguments, whereas naming the remote upstream in a previous step was only a convention.
  10. There may be some additional steps here if your request is “behind” the master branch of the source repository. Since each kind of case like this is unique, and because I’ve tried to describe the process as succinctly as possible, I don’t discuss this situation.
  11. This reminds me of one of my favorite blog posts: Yihui Xie’s rant on the over-eagerness of the Stack Overflow community to close “bad” questions.
  12. Note that {{branch-name}} can be anything, but something like issue-{{#}} is a good practice.

To leave a comment for the author, please follow the link and comment on their blog: r on Tony ElHabr.
