Work smarter; not harder: COVID-19 processing for the WHO/Europe

Posted on February 16, 2023 by The Jumping Rivers Blog in R bloggers

This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last night, I filled a washing machine with laundry and scheduled it to finish in the morning. And do you know what I had to do next? Nothing. I simply went to bed. In stark contrast to 100 years ago, I didn’t need to fill a bucket with water, I didn’t spend an hour rubbing clothes against a washboard to agitate away the dirt, and I didn’t need to worry about whether the prolonged contact between a cleaning detergent and my hands was damaging to the skin. Instead, a machine followed its pre-programmed routine, and I slept like a log. And what could possibly be better than an extra hour in bed?

That’s just one of many examples of the small automated processes that appear throughout our lives.

But they all have a common purpose: to make our lives easier.

If you’re a regular on our blog, you may have already read about how we streamlined the data processing for an application we maintain for the World Health Organisation Europe (WHO/Europe). Those steps improved the experience for users of the WHO/Europe COVID-19 Vaccine Programme Monitor by slashing loading times and improving responsiveness.

But today, I want to tell you how automation improved the experience for those working behind the scenes of the application. Tasks were completed automatically, leaving fewer opportunities for human error to sneak into our processes. Work was performed autonomously each day, providing early warnings about issues with the latest data. Software was frequently tested in a clean environment, verifying that our work could be reproduced on other systems.

Ultimately, developers and maintainers from both Jumping Rivers and the WHO/Europe spent less time on the trivial and repetitive tasks, and more time making improvements where it really mattered. And by sprinkling a little automation in your work, you might just enhance your productivity too.

Do you require help building a Shiny app? Would you like someone to take over the maintenance burden? If so, check out our Shiny and Dash services.

Where can we delegate the tasks to?

The aim of these automated workflows is to take the menial tasks that are performed frequently and complete them automatically using a continuous integration and continuous delivery (CI/CD) pipeline. Many tools for running CI/CD pipelines already exist, such as Jenkins, GitLab CI/CD, Bitbucket Pipelines and CircleCI, to name just a few, but for the WHO/Europe COVID-19 Vaccine Programme Monitor we used GitHub Actions.

In a typical CI/CD pipeline, we are allocated a blank virtual machine, onto which we install all the software dependencies we need and run our tasks; when the job finishes, the machine is wiped out of existence. It may sound wasteful to install everything from scratch every time a pipeline runs, but there are serious benefits: starting from scratch is the ultimate check of whether our code is portable and can be run by anyone on any machine. And with a few tricks here and a bit of caching there, setup times for CI/CD pipelines can be very reasonable.

The basic concept of an automated workflow

For GitHub Actions, we specify a few things in a YAML file:

When the workflow should run.
What operating system our virtual machine should use.
What environment variables should be defined.
What tasks should be performed.
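To make those four ingredients concrete, here’s a minimal sketch of a workflow file. The file name, job name, the MY_SETTING variable and the echo command are placeholders of our own, not part of the WHO/Europe project.

```yaml
# .github/workflows/example.yaml (hypothetical file name)
name: example

# 1. When the workflow should run: on every push to main
on:
  push:
    branches: [main]

jobs:
  example-job:
    # 2. Which operating system the virtual machine uses
    runs-on: ubuntu-latest
    # 3. Environment variables available to every step (placeholder value)
    env:
      MY_SETTING: some-value
    # 4. The tasks to perform, in order
    steps:
      - uses: actions/checkout@v3
      - name: Say hello
        run: echo "MY_SETTING is $MY_SETTING"
```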

What do we automate?

Tests

There are a number of processes that we automate, but we’ll start with the one that most developers will want to automate: testing. It’s a good idea to run tests whenever the code changes. After all, if the new code has a mistake, you want your tests to find it before you build even more code on top of it. So every time changes are pushed to a pull request or to the main branch of our git repository, a workflow runs the full test suite.
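A test workflow along these lines could look like the sketch below. It reuses the r-lib actions from the deployment example later in this post, and assumes (hypothetically) that the app’s tests live in tests/testthat and that {testthat} is recorded in the renv lockfile. If your app is structured as an R package, you’d swap the last step for your usual package checks.

```yaml
# .github/workflows/tests.yaml (hypothetical file name)
name: tests

# Run on pushes to main and on pull requests targeting main
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  tests:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v3

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      # Restore the package library recorded in renv.lock
      - uses: r-lib/actions/setup-renv@v2

      # Assumes tests live in tests/testthat; a failing test makes
      # Rscript exit with an error, which fails the workflow
      - name: Run tests
        run: testthat::test_dir("tests/testthat", stop_on_failure = TRUE)
        shell: Rscript {0}
```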

Deployments

The WHO/Europe COVID-19 Vaccine Programme Monitor is hosted on shinyapps.io. Originally, whenever the application changed, someone had to publish the latest version online by hand. Not only is it inefficient for a developer to spend time on this, but it also lets human error creep in: what if you’re logged into the wrong account, or you overwrite the wrong application, or you patch a critical bug in your code repository but forget to publish the fixed app altogether? It’s better to have a pipeline watching over us, ready to step in at the right moment.

A nice feature of shinyapps.io is that multiple apps can be hosted from a single account. We took advantage of this by creating automated workflows that deploy the latest versions of the apps to shinyapps.io every time changes are pushed to the default branch, so users always see the newest version of the app.

But to make life easier for ourselves, we also publish a version of the app for every proposed change that we create. Not only does this check that the app deploys correctly, but it also provides a working version of the application that members of the WHO can view, so they can request changes or give their approval before the changes are confirmed. Once those changes are merged into the main version of the app, our automated workflows delete these development apps and publish the public version.
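One way to build these per-change preview apps is sketched below, assuming the same three shinyapps.io secrets used in the deployment example later in this post. The app naming scheme (my-app-pr- followed by the pull request number) is hypothetical, and rsconnect::terminateApp() stops and archives the app on shinyapps.io, which may differ from exactly how the WHO/Europe workflows clean up.

```yaml
# .github/workflows/preview.yaml (hypothetical file name)
name: preview-app

on:
  pull_request:
    types: [opened, synchronize, reopened, closed]

jobs:
  preview:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      # One app per pull request, e.g. my-app-pr-42 (naming scheme is an assumption)
      APPNAME: my-app-pr-${{ github.event.pull_request.number }}
    steps:
      - uses: actions/checkout@v3

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      # App dependencies are only needed when deploying
      - uses: r-lib/actions/setup-renv@v2
        if: github.event.action != 'closed'

      - name: Install rsconnect
        run: install.packages("rsconnect")
        shell: Rscript {0}

      # Pull request opened or updated: (re)deploy the preview app
      - name: Deploy preview
        if: github.event.action != 'closed'
        run: |
          rsconnect::setAccountInfo("${{ secrets.SHINYAPPS_NAME }}",
                                    "${{ secrets.SHINYAPPS_TOKEN }}",
                                    "${{ secrets.SHINYAPPS_SECRET }}")
          rsconnect::deployApp(appName = Sys.getenv("APPNAME"),
                               account = "${{ secrets.SHINYAPPS_NAME }}",
                               server = "shinyapps.io")
        shell: Rscript {0}

      # Pull request merged or closed: take the preview app down
      - name: Remove preview
        if: github.event.action == 'closed'
        run: |
          rsconnect::setAccountInfo("${{ secrets.SHINYAPPS_NAME }}",
                                    "${{ secrets.SHINYAPPS_TOKEN }}",
                                    "${{ secrets.SHINYAPPS_SECRET }}")
          rsconnect::terminateApp(appName = Sys.getenv("APPNAME"),
                                  account = "${{ secrets.SHINYAPPS_NAME }}",
                                  server = "shinyapps.io")
        shell: Rscript {0}
```

Note that GitHub doesn’t pass repository secrets (other than GITHUB_TOKEN) to workflows triggered by pull requests from forks, so this pattern suits branches within the same repository.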

Data processing

Our previous blog post on data processing mentioned that a GitHub Actions workflow now handles the data processing outside of the app on a daily schedule. We don’t need to push code to GitHub to trigger a workflow: a workflow can be scheduled to start at particular times or at regular intervals. The schedule is defined using a cron expression, a sequence of five values denoting the minute, hour, day of month, month and day of week when a job should run, interpreted in UTC.

Let’s suppose we want to run a job at 09:30 BST (that’s UTC+01), on every weekday (Monday to Friday). We would specify this as:

30 8 * * 1-5

Let’s break that down:

30 8 at the start represents the minutes and hours, so the scheduled time is 08:30 UTC. During British Summer Time, that translates to 09:30 in the UK.
* * means every day of the month and every month of the year, respectively.
1-5 represents the day of the week, where 0 is Sunday, 1 is Monday and so on up to 6 for Saturday (many cron implementations also accept 7 for Sunday). So this represents every day from Monday to Friday.
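In a workflow file, the schedule sits under the on: key. The sketch below pairs it with workflow_dispatch, which adds a button for running the job manually from the Actions page; the job name and the data-raw/process.R script are placeholders for your own processing code.

```yaml
# .github/workflows/process-data.yaml (hypothetical file name)
name: process-data

on:
  schedule:
    # 08:30 UTC, Monday to Friday
    - cron: '30 8 * * 1-5'
  # Also allow a manual run from the Actions page
  workflow_dispatch:

jobs:
  process-data:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true
      - uses: r-lib/actions/setup-renv@v2
      # Placeholder for your own processing script
      - name: Process data
        run: Rscript data-raw/process.R
```

GitHub’s documentation quotes cron strings, which matters whenever an expression starts with *, since YAML would otherwise read it as an alias. Scheduled workflows also run against the latest commit on the default branch.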

The Crontab.guru website is useful for testing the meaning of a cron expression, or for checking you have constructed your own cron expression correctly.

GitHub Actions allows multiple cron schedules to be specified, and the workflow runs whenever any of the listed times is reached. That’s a good thing, because the keen-eyed among you will have noticed the issue with the cron specification above: daylight saving time.

Suppose we actually want to run it every weekday at 09:30 Europe/London time, which is BST (UTC+01) between the last Sundays of March and October, and GMT (UTC+00) for the rest of the year. We can specify several cron expressions to cover different parts of the year:

on:
  schedule:
    - cron: '30 9 * 11-12,1-3 1-5'  # 09:30 GMT, 1 Nov to 31 Mar
    - cron: '30 8 25-31 3 1-5'      # 09:30 BST, 25 to 31 Mar
    - cron: '30 8 * 4-10 1-5'       # 09:30 BST, 1 Apr to 31 Oct
    - cron: '30 9 25-31 10 1-5'     # 09:30 GMT, 25 to 31 Oct

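This strategy still isn’t perfect. In the last week of March and the last week of October we essentially run the automated workflow twice, an hour apart, because a cron expression can’t pin down which day the clocks change. The 25-31 ranges work because the last Sunday of a 31-day month always falls on one of those dates.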
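If the double run matters, one alternative (not the approach used for the WHO/Europe app) is to fire at both candidate UTC times every weekday and let a first job check the clock in London before the real work starts. The sketch below assumes the runner’s date command has Europe/London time zone data available, which we’d expect on GitHub’s Ubuntu runners; the job names and the echo step are placeholders.

```yaml
on:
  schedule:
    # 08:30 and 09:30 UTC on weekdays: one of them is 09:30 in London,
    # whether GMT or BST is in force
    - cron: '30 8,9 * * 1-5'

jobs:
  check_time:
    runs-on: ubuntu-latest
    outputs:
      london_hour: ${{ steps.clock.outputs.hour }}
    steps:
      # Record the current hour in London, e.g. "09"
      - id: clock
        run: echo "hour=$(TZ=Europe/London date +%H)" >> "$GITHUB_OUTPUT"

  process_data:
    needs: check_time
    # Only continue for the run that lands in the 09:00 hour in London
    if: needs.check_time.outputs.london_hour == '09'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Placeholder for the real processing steps
        run: echo "Running the 09:30 Europe/London job"
```

Under GMT, the 09:30 UTC run proceeds and the 08:30 UTC run is skipped; under BST the roles swap. A queue delay of more than half an hour can shift a run into a different London hour, so this trades the double run for a small risk of a missed or mistimed one. While checking expressions on Crontab.guru, it’s also worth knowing that standard cron treats a restricted day-of-month and a restricted day-of-week as either/or, so an expression such as 30 8 25-31 3 1-5 may fire on more days than intended.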

To complicate matters further, despite our best efforts to run the job at 09:30 local time, jobs on the shared resources of GitHub Actions may have to wait in a queue for several minutes, or even hours, when their servers are particularly busy. Got a mission-critical workflow that must run exactly on time? Then consider running the job on your own dedicated (self-hosted) runners. Bear in mind that GitHub also notes the schedule event itself can be delayed during periods of high load, so a dedicated runner removes the wait for a free machine but not necessarily every delay.
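Moving a job onto your own machine is mostly a matter of changing runs-on. The sketch below assumes a self-hosted Linux runner has already been registered with the repository (self-hosted and linux are labels GitHub assigns to such runners by default) and that it has whatever R setup your steps expect; the job and step are placeholders.

```yaml
jobs:
  process-data:
    # Use a registered self-hosted Linux runner instead of a GitHub-hosted VM
    runs-on: [self-hosted, linux]
    steps:
      - uses: actions/checkout@v3
      - name: Placeholder for the real processing steps
        run: echo "Running on a self-hosted runner"
```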

How do I set up a workflow?

The method will depend on which CI/CD runner you use. We’ll walk through a very basic workflow for an R user who has a Shiny app they want to deploy automatically to shinyapps.io using GitHub Actions.

We’ll start by creating a new Shiny app project in RStudio, initialised with a git repository and using {renv}. The renv lockfile will then already record the packages needed to run the default “Old Faithful Geyser” app. We’ll also make sure we’ve pushed the repository to GitHub.

Next, we need to generate an access token from shinyapps.io, which allows GitHub Actions to access our account in order to upload the Shiny app.

Having logged into shinyapps.io, go to the Account → Tokens section of the menu. Click the button to “Add token”, and make a note of the Token and Secret values. For security reasons, the Secret will be hidden until you reveal it.

Now in GitHub, go to the repository’s settings and navigate to the Secrets → Actions menu. Create a new repository secret for each of the account name, token and secret values taken from shinyapps.io.

When you’re done, you should have three secrets, named to match the workflow below: SHINYAPPS_NAME (your shinyapps.io account name), SHINYAPPS_TOKEN and SHINYAPPS_SECRET.

We’ll use an example template from the r-lib/actions repository, which provides ready-made GitHub Actions workflows for R. It performs a number of steps: creating an Ubuntu virtual machine, checking out the latest version of your code from the main branch on GitHub, installing and preparing R, installing package dependencies from the renv lockfile, and finally deploying the application to shinyapps.io. We just need to edit the lines specifying APPNAME and SERVER, and save the file in a new .github/workflows/ directory at the root of the repository.

# .github/workflows/shiny-deploy.yaml
on:
  push:
    branches: [main, master]

name: shiny-deploy

jobs:
  shiny-deploy:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v3

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-renv@v2

      - name: Install rsconnect
        run: install.packages("rsconnect")
        shell: Rscript {0}

      - name: Authorize and deploy app
        env: 
          # Provide your app name and deployment server below
          APPNAME: github-deployed-app
          SERVER: shinyapps.io
        run: |
          rsconnect::setAccountInfo("${{ secrets.SHINYAPPS_NAME }}",
                                    "${{ secrets.SHINYAPPS_TOKEN }}",
                                    "${{ secrets.SHINYAPPS_SECRET }}")
          rsconnect::deployApp(appName = "${{ env.APPNAME }}",
                               account = "${{ secrets.SHINYAPPS_NAME }}",
                               server = "${{ env.SERVER }}")
        shell: Rscript {0}

When we commit the new file and push the change to the default branch, GitHub will automatically run the workflow on their servers for us. We can see progress on the “Actions” page of the repository, where it will display whether a pipeline is currently running, or has finished with a pass or fail status. Details for a failing pipeline can be viewed by clicking on the failed pipeline and viewing the output generated during that workflow.

When the pipeline has succeeded, we can view the newly deployed app on shinyapps.io. The app’s deployment address will be of the format https://[USERNAME].shinyapps.io/[APPNAME], where [USERNAME] and [APPNAME] are replaced with the values used in the deployment .yaml file.

What’s the net result?

Creating the automated processes and workflows for the WHO/Europe COVID-19 Vaccine Programme Monitor required an investment of time and money. But those short-term costs have generated long-term savings in the time and effort needed to maintain the data processing and host the dashboard.

It’s important to note that not everything is done automatically for us. As is the way with real-world data, there will always be a few data quality anomalies, so members of WHO/Europe prepare a small amount of the data themselves as part of the overall workflow. This is not necessarily a bad thing; there are many instances of fully automated systems producing ludicrous results when left to operate unsupervised, so keeping a human touch can help keep things in check. But with 95% of the work handled automatically, members of both WHO/Europe and Jumping Rivers are free to focus on more important matters.

For the last few months, the app has mostly looked after itself in a reliable way. And for an automated process, there can be no higher praise.
