Cleaning up forked GitHub repositories with {gh}

This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers.

One great thing about using GitHub is the ability to view and contribute to others’ code. Even the code underlying many of our favourite packages is available for us to examine and play around with.

Forking a repository is a great way to create an exact replica of someone else’s project in our own user space. We can then freely make changes to this copy without affecting the original project. If you end up especially proud of your changes, you can then submit a Pull Request to offer them up to the owner of the original repository. However, your fork doesn’t have to end up in a contribution – you can also just keep experimenting with the code forever or use it as a starting point for your own project.

A forking mess

If you are an avid forker of GitHub repos, your original repositories may quickly become crammed in between an endless stream of forks, leaving your user space cluttered with old forks that you haven’t looked at in years. Well, now it’s time for some spring cleaning, and the first task is de-cluttering your repositories by removing forks.




Manually cleaning

You can manually delete repositories using the GitHub interface. Go to the repository you wish to delete, then select Settings at the top of the page.

[Figure: The Settings button on your repository]

Then scroll to the bottom of the page and enter the Danger Zone marked by a red box.

[Figure: Deleting a repo from the Danger Zone]

From there, you can select Delete this repository, which will prompt you to confirm that you are absolutely sure of what you’re doing by typing out the name of the repository. Note that deleting a repository cannot be undone. Also note that deleting a forked repository only removes it (including any changes you have made to it) from your own GitHub account; you won’t accidentally delete the original project (phew).

So it is possible to clean up your GitHub manually, and this might be the most suitable way if you only want to delete one or two repositories. But let’s say you’ve forked over fifty repositories. Manually going into each one, finding the delete button in the settings and typing in the confirmation prompt is not what you want to spend your day doing. As with all manual methods, pointing and clicking does not scale particularly well.

Using the {gh} package

The {gh} package provides an R-user-friendly wrapper around the GitHub API. It lets you interact with GitHub to, for example, create new repositories or delete old ones directly from RStudio. The package is on CRAN and is installed in the usual way:

install.packages("gh")

To use the package, you first need to generate a Personal Access Token (PAT).

Getting a token

Creating a personal access token to be able to use the GitHub API is easy. You can either navigate to the page on GitHub (Settings > Developer Settings > Personal Access tokens > Generate new token), or you can use the handy create_github_token() function from {usethis} which will open the same page in your browser.

usethis::create_github_token()

[Figure: Adding a new Personal access token]

From there, you give your token a useful name as well as select what access should be granted by the token. Note: if you want to use {gh} to delete unwanted forked repositories, you will need to select the delete_repo scope.

[Figure: Setting the delete_repo scope for your token]

However, be aware that this scope allows you to delete any repo, not just forked ones. After deciding on the scopes, you generate your token. As the page tells you, you will have to store your token somewhere, as you won’t be able to access it again after closing the page. We recommend copying it and storing it in a password manager such as LastPass. Once you have saved your token somewhere secure, you can make it available to your R environment using the set_github_pat() function from the {credentials} package. It will prompt you to enter your PAT, which you did save somewhere… right? If you did not follow our advice and no longer have access to your PAT, don’t worry: you can delete the old one on GitHub and generate a new one.

[Figure: Viewing existing tokens]

OK, now that you’ve definitely got your token ready, you can run the code below

credentials::set_github_pat()

which will prompt you to enter your PAT. Now you can finally get to the cleaning!
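
Before moving on, it can be worth confirming that R can see the token and that it carries the delete_repo scope. The sketch below assumes that gh::gh_whoami() reports the token's scopes as a single comma-separated string (for example "delete_repo, repo"). The helper has_scope() is a hypothetical function written for this example, not part of {gh}.

```r
# Hypothetical helper: does a comma-separated scope string contain `scope`?
has_scope = function(scopes, scope) {
  granted = trimws(strsplit(scopes, ",", fixed = TRUE)[[1]])
  scope %in% granted
}

# With a real token (needs network access), something like:
# me = gh::gh_whoami()
# has_scope(me$scopes, "delete_repo")

# Offline illustration with made-up scope strings
has_scope("delete_repo, repo", "delete_repo")
has_scope("repo, gist", "delete_repo")
```

If the check comes back FALSE for your real token, the delete requests later on will most likely be rejected, so it is cheaper to find out now.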

Cleaning

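We will load the {gh} package, as well as the {magrittr} package to get access to pipes. The code below also calls functions from {purrr} and {dplyr} using the :: prefix, so those two packages need to be installed as well.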

library("gh")
library("magrittr")

Step one is to retrieve your repositories:

my_username = "your_username_goes_here"
my_repos = gh("GET /users/:owner/repos", 
              owner = my_username, 
              page = 1, 
              per_page = 100)

The GitHub API is paginated. This means it returns results in pages, with at most 100 results per page. If

length(my_repos)

is less than 100, then you have retrieved everything and don’t need to worry. If it is exactly 100, there may be more repositories on later pages; you can either request a specific page or loop through all pages.
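
Looping through pages could look something like the sketch below. fetch_all_pages() is a hypothetical helper written for this example. It takes any function that returns one page of results and keeps requesting pages until a page comes back short. It also assumes that an out-of-range page comes back as an empty list.

```r
# fetch_page is any function(page, per_page) returning a list of results
fetch_all_pages = function(fetch_page, per_page = 100) {
  results = list()
  page = 1
  repeat {
    batch = fetch_page(page, per_page)
    results = c(results, batch)
    # A short (or empty) page means there is nothing further to fetch
    if (length(batch) < per_page) break
    page = page + 1
  }
  results
}

# With {gh} (needs network access and my_username from above):
# all_repos = fetch_all_pages(function(page, per_page) {
#   gh::gh("GET /users/:owner/repos", owner = my_username,
#          page = page, per_page = per_page)
# })

# Offline illustration: 5 made-up repositories served 2 per page
fake_repos = lapply(1:5, function(i) list(name = paste0("repo-", i)))
fake_fetch = function(page, per_page) {
  start = (page - 1) * per_page + 1
  if (start > length(fake_repos)) return(list())
  fake_repos[start:min(start + per_page - 1, length(fake_repos))]
}
length(fetch_all_pages(fake_fetch, per_page = 2))
```

The gh() function also accepts a .limit argument that asks it to fetch further pages for you, which may be simpler than a hand-written loop.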

The object my_repos is now a list of repositories. Each element of the list is a particular repository. We are interested in two elements of each repository, name and fork:

my_repos[[1]]$name
my_repos[[1]]$fork

These elements tell us the name of the repository and whether it was created as a result of forking. Now we just repeat this process for all of our repositories and filter to return only the repositories which are forked.

forked_repos = purrr::map_dfr(my_repos, ~unlist(.x[c("name", "fork")])) %>%
  dplyr::filter(fork == "TRUE") # Here "TRUE" is a character, not a logical
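
The same filter can also be written without the conversion to character. Here is a minimal sketch using base R on a small made-up list shaped like my_repos. The repository names and the objects mock_repos, is_fork and fork_names are invented for illustration.

```r
mock_repos = list(
  list(name = "my-own-package", fork = FALSE),
  list(name = "a-random-r-package", fork = TRUE),
  list(name = "bob-does-tidytuesday", fork = TRUE)
)

# vapply() keeps fork as a logical, so no comparison with "TRUE" is needed
is_fork = vapply(mock_repos, function(repo) isTRUE(repo$fork), logical(1))
fork_names = vapply(mock_repos[is_fork], function(repo) repo$name, character(1))
fork_names
```

Either approach ends with the names of the forked repositories; which one to use is mostly a matter of taste.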

The next step involves manually, and very carefully, selecting the repositories you want to delete. If you want to delete all forked repositories (!), simply set

# You probably don't want to do this!
to_delete = forked_repos$name

Otherwise, create a vector of the repositories to delete:

to_delete = c("bob-does-tidytuesday", 
              "melindas-cool-project", 
              "a-random-r-package")

Finally, we delete using

purrr::map(to_delete, 
           ~gh("DELETE /repos/:owner/:repo", owner = my_username, repo = .x))

And… they’re gone!

Deleting forked repositories like this is an effective way to clean out your GitHub of repositories that you haven’t looked at or touched in a while. However, unlike doing it manually, there is no confirmation where you have to type out a specific repository’s name to confirm that you actually are deleting what you want to be deleting. So, be extremely careful when deleting repositories using {gh} as you don’t want to lose hours of work by accidentally running the wrong line.
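
One way to add a little of that safety back is to wrap the delete call in a function that refuses to touch anything that is not a known fork and that defaults to a dry run. The sketch below is one possible approach, not part of {gh}. delete_forks(), its arguments and record_delete() are hypothetical names invented for this example.

```r
delete_forks = function(to_delete, fork_names, delete_fun, dry_run = TRUE) {
  # Refuse to continue if anything requested is not a known fork
  not_forks = setdiff(to_delete, fork_names)
  if (length(not_forks) > 0) {
    stop("Not in the list of forks: ", paste(not_forks, collapse = ", "))
  }
  if (dry_run) {
    message("Dry run, would delete: ", paste(to_delete, collapse = ", "))
    return(invisible(character(0)))
  }
  for (repo in to_delete) {
    delete_fun(repo)
  }
  invisible(to_delete)
}

# With {gh} (this really deletes repositories once dry_run = FALSE):
# delete_forks(to_delete, forked_repos$name, dry_run = FALSE,
#              delete_fun = function(repo) {
#                gh::gh("DELETE /repos/:owner/:repo",
#                       owner = my_username, repo = repo)
#              })

# Offline illustration with a stand-in delete function that only records names
deleted = character(0)
record_delete = function(repo) deleted <<- c(deleted, repo)
delete_forks("a-random-r-package",
             fork_names = c("a-random-r-package", "bob-does-tidytuesday"),
             delete_fun = record_delete, dry_run = FALSE)
deleted
```

Running with the default dry_run = TRUE first prints what would be removed, which gives you one last chance to spot a repository that should stay.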

