How to extract Google Analytics data in R using RGoogleAnalytics

November 5, 2014
By

(This article was first published on Tatvic Blog » R, and kindly contributed to R-bloggers)

I am extremely thrilled to announce that RGoogleAnalytics was released recently by CRAN. R is already a swiss army knife for data analysis largely due its 6000 libraries. What this means is that digital analysts can now fully use the analytical capabilities of R to fully explore their Google Analytics Data. In this post, we will go through the basics of RGoogleAnalytics. Let’s begin

Fire up your favorite R IDE and install RGoogleAnalytics. Installation is pretty basic. In case, you are new to RGoogleAnalytics, refer this post to learn how to install it.

Since RGoogleAnalytics uses the Google Analytics Core Reporting API under the hood, every request to the API has to be authorized under the OAuth2.0 protocol. This requires an initial setup in terms of registering an app with the Google Analytics API so that you get a unique set of project credentials (Client ID and Client Secret). Here’s how to do this –

  • Navigate to Google Developers Console
  • Create a New Project and Open it
  • Navigate to APIs and ensure that the Analytics API is turned On for your project
  • Navigate to Credentials and create a New Client ID
  • Select Application Type – Installed Application

credentials

  • Once your Client ID and Client Secret are created, copy them to your R Script.
OAuth
Once the project is configured and the credentials set ready, we need to authenticate your Google Analytics Account with your app. This ensures that your app (R Script) can access your Google Analytics data/List your Google Analytics profiles and so on. Once authenticated you get a pair of tokens (Access Token and Refresh Token). An Access Token is appended with each API request so that Google’s servers know that the requests came from your app and they are authentic. Access Tokens expire after 60 minutes so they need to be regenerated using the Refresh Token. I will show you how to do that but prior to that, lets continue the data extraction flow.
require(RGoogleAnalytics)

# Authorize the Google Analytics account
# This need not be executed in every session once the token object is created 
# and saved
client.id <- "xxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com"
client.secret <- "xxxxxxxxxxxxxxxd_TknUI"
token <- Auth(client.id,client.secret)

# Save the token object for future sessions
save(token,file="./token_file")

The next step is to get the Profile ID/View ID of the Google Analytics profile for which the data extraction is to be carried out. It can be found within the Admin Panel of the Google Analytics UI. This profile ID maps to the table.id argument below.

The code below generates a query with the Standard Query Parameters – Start Date, End Date, Dimensions, Metrics etc. and hits the query to the Google Analytics API. The API response is converted in the form of a R DataFrame.

# Get the Sessions & Transactions for each Source/Medium sorted in 
# descending order by the Transactions

query.list <- Init(start.date = "2014-08-01",
                   end.date = "2014-09-01",
                   dimensions = "ga:sourceMedium",
                   metrics = "ga:sessions,ga:transactions",
                   max.results = 10000,
                   sort = "-ga:transactions",
                   table.id = "ga:123456")

# Create the Query Builder object so that the query parameters are validated
ga.query <- QueryBuilder(query.list)

# Extract the data and store it in a data-frame
ga.data <- GetReportData(ga.query, token)

# Sanity Check for column names
dimnames(ga.data)

# Check the size of the API Response
dim(ga.data)

In future sessions, you need not generate the Access Token every time. Assumming that you have saved it to a file, it can be loaded via the following snippet –

load("./token_file")

# Validate and refresh the token
ValidateToken(token)

Here are a few practices that you might find useful –

  • Before querying for a set of dimensions and metrics, you might want to check whether they are compatible. This can be done using the Dimensions and Metrics Explorer
  • The Query Feed Explorer lets you try out different queries in the browser and you can then copy the query parameters to your R Script. It can be found here. I have found this to be a huge time-saver for debugging failed queries
  • In case if the API returns an error, here’s a guide to understanding the cryptic error responses.

Did you find RGoogleAnalytics useful? Please leave your comments below. In case if you have a feature request or want to file a bug please use this link.

Kushan Shah

Kushan Shah

Kushan is a Web Analyst at Tatvic. His interests lie in getting the maximum insights out of raw data using R and Python. He is also the maintainer of the RGoogleAnalytics library. When not analyzing data, he reads Hacker News.
Google+

More PostsWebsite

Follow Me:
TwitterFacebook

The post How to extract Google Analytics data in R using RGoogleAnalytics appeared first on Tatvic Blog.

To leave a comment for the author, please follow the link and comment on their blog: Tatvic Blog » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)