Using googlesheets and mailR packages in R to automate reporting

December 17, 2016
By

(This article was first published on Reimagined Invention, and kindly contributed to R-bloggers)

Most companies make extensive use of spreadsheets to store, work and share data internally.

For years, Excel has been the main software that allowed people to share internal data. But using Excel for reporting has its drawbacks:

  1. First manually updating data into spreadsheets is time-consuming and takes a large amount of analysts’ time. Before automating much of the reporting I am responsible for, I was often spending well over 30% of my time pulling data from our database, inputting the updated dataset in a spreadsheet and wait until all computations were done.
  2. Manual reporting can lead to errors. Shifting the inputted data by one cell will yield wrong results and will trigger angry emails from decision-makers!
  3. Excel files are shared via emails as attached files. It is not uncommon for multiple versions of the same files to be shared in the organization. This can lead to confusions, miscommunication and loss in efficiency.

With its Google docs suite, Google has solved point 3) by giving anyone the ability to create online collaborative spreadsheets for teams to work on simultaneously, for colleagues to give comments and communicate with one another in one place without having to email different mark-ups back and forth.

This post is about providing solutions to points 1) and 2) to help you reduce the amount of time you spend on reporting by showing a general procedure with mtcars dataset. This tutorial is based on what Pauline Glikman posted on Medium.com. Her tutorial is very good but I changed some parts here because I had some security concerns with her procedure and I also needed to divide my file (a real life spreadsheet, not mtcars!) in parts before uploading it.

R allows for easy automation of the reporting process. This means that the analyst’s task of pulling data, making some computations, uploading it in a nice spreadsheets and emailing it to relevant business stakeholders can be completely automated in a single script. And because with the same script and the same dataset, you will confidently obtain the same reproducible results you will not have to worry about making errors anymore. So whether you are currently running a daily reports showing sales or service level, this post can help you improve your productivity.

Installing the needed R packages

Before we get started, and assuming you have already installed R and Rstudio on your computer, let’s install everything we need:

  • googlesheets package
  • rJava package
  • mailR package
  • dplyr package
# Define the libraries to use
libraries <- c("googlesheets","rJava", "mailR", "dplyr")

# This is the function to download and or load libraries on the fly
download_and_or_load <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

# Use the function from the last step
download_and_or_load(libraries)

Building your report with googlesheets package

For the purpose of this post, we will use this spreadsheet. Make a copy of it to your drive, we’ll use the url to your spreadsheet copy.

To define the spreadsheet and tab to work with and authorize googlesheets to view and manage your files, you need to type:

# Spreadsheet to work with
url <- "your_spreadsheet_url"
tab <- "Tab 1"

# Google's session authentication 
gs_auth()

You will be directed to a web browser, asked to sign in to your Google account, and to grant googlesheets permission to operate on your behalf with Google Sheets and Google Drive.

Now, let’s assume that you have been asked to create a daily report in a google spreadsheet. The first thing you need to do is to get some data. This may be via the web, via an API or a database. At Airbnb, our internal R package called Rbnb allows anyone in the company to quickly access our database without leaving the friendly environment of Rstudio.

For the purpose of this post we will be using the mtcars dataset and we will be pretending that the report you need to update daily shows the top 10 cars with the highest mpg.

We can select the top 10 cars in the following way:

# Select 10 cars with highest mgp in order and store it in a dataframe called data
data <- head(mtcars[with(mtcars, order(-mpg)), ], 10)

You now have some data to upload to your spreadsheet. To do so you will first load your spreadsheet with either the gs_title() or the gs_url() function. It is probably a good practice to use gs_url(url) in case the name of your spreadsheet changes over time.

You can then use gs_edit_cells() and specify the worksheet you want to update with the ws argument (here’s the part when we need dplyr):

# To edit the cells of your spreadsheet with your newly pulled data use gs_edit_cells and specify the work 
# sheet with the ws argument
top_10_cars <- top_10_cars %>%
  gs_edit_cells(ws = tab, input = data)

Done! You can now run your script every single day and it will update your sheet faster than loading your browser!

mailR: Updating your stakeholder with an automated email

Once your report is updated, you might want to send an email to the people looking at your dashboard. If you are running this script daily, opening your gmail and drafting an email seems like a waste of time.

Don’t worry, mailR has your back!

First follow this Google’s instructions to add a specific app password. This will generate a random password that you can just use for mailR, allowing you to save it in a script without revealing your actual credentials. You can also revoke this password at any time from your Google accounts settings page. Be careful, however — if this password is compromised, it can be used to fully access your account from anywhere! Consequently, make sure you don’t share a script with your password in it with anyone else.

Add this line to your .Rprofile for not exposing your app password:

options(GoogleDocsPassword = c(login = '16 characters password'))

And then add these lines at the end of your script and observe the magic:

# Write the content of your email
msg <- paste("Hi",
             "\r\n\r\nThis is my last report with date", as.character(date()),
             "\r\n\r\nPlease see this link:", url,
             "\r\n\r\nRegards",
             "\r\n\r\nJohn Doe")

# Define who the sender is
sender <- "[email protected]"

# Define who should get your email
recipients <- c("[email protected]",
                "[email protected]",
                "[email protected]")

# Send your email with the send.mail function
send.mail(from = sender,
          to = recipients,
          subject = "My report",
          body = msg,
          smtp = list(host.name = "smtp.gmail.com", port = 587,
                      user.name = "[email protected]",
                      passwd = getOption('GoogleDocsPassword')[[1]], ssl = TRUE),
          authenticate = TRUE,
          send = TRUE)

Appendix : One more thing on googlesheets

There is one major limitation with the googlesheets package is that it relies on a Google API which is apparently slow (see this). As a result, uploading large datasets to your spreadsheet will not work in one chunk.
One fix I have found and been using is to batch my data and load it in the spreadsheet in chunks.
You can use this code to chunk your data:

# Chunk your data in batches of N rows to ease the 
N <- 5 # you can change this according to what you need
data_in_chunks = list()
batches = round(nrow(data)/N)
for(i in 1:batches){
  data_in_chunks[[i]] =  data[(((i-1)*N)+1):(i*N),]
}

# Then upload your batches with a for loop
for(i in 1:(length(data_in_chunks)-2)){
  row = paste("A",as.character(2+(i*N)), sep='')
  top_10_cars <- top_10_cars %>% gs_edit_cells(ws = tab, input = data_in_chunks[[i+1]], anchor=row, col_names = FALSE, verbose = TRUE)
}

To leave a comment for the author, please follow the link and comment on their blog: Reimagined Invention.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)