Google analytics data extraction in R

December 3, 2012
By Vignesh Prajapati

(This article was first published on Tatvic Blog » R, and kindly contributed to R-bloggers)

Unlike other posts on this blog, this one focuses on coding in R, so readers with a developer mindset will probably enjoy it more than pure business analysts.

My goal is to describe an alternative method for extracting data from Google Analytics into R via the API. I have been using R for quite some time, but I think the GA library for R has been broken, and although it did receive an update, it is not really being used right now.

Considering this, I decided to write the extraction code myself and move on, since more and more data-related operations are now being done in R.

Moreover, the available RGoogleAnalytics package is built for Linux only, and my Windows friends, like me, may appreciate having something that works for them as well.

OK, so let's get started; it's going to be very quick and easy.

There are some prerequisites for GA Data extraction in R:

  • At least one domain must be registered with your Google Analytics account (http://www.google.com/analytics/)
  • R installed with the following packages:
    • RCurl (download from http://cran.r-project.org/web/packages/RCurl/index.html)
    • rjson (download from http://cran.r-project.org/web/packages/rjson/index.html)

    Steps to be followed for Google Analytics data extraction in R:

    1. Set the Google Analytics query parameters for preparing the request URI

    To extract the Google Analytics data, you first need to define the query parameters, such as dimensions and metrics (both listed at https://developers.google.com/analytics/devguides/reporting/core/dimsmets), start date, end date, segment, sort and filters, as per your requirement.

    # Defining Google analytics search query parameters
    # Set the dimensions and metrics
    ga_dimensions <- 'ga:visitorType,ga:operatingSystem,ga:country'
    ga_matrics <- 'ga:visits,ga:bounces,ga:avgTimeOnSite'
    # Set the starting and ending date
    startdate <- '2012-01-01'
    enddate <- '2012-11-30'
    # Set the segment, sort and filters
    segment <- 'dynamic::ga:operatingSystem==Android'
    sort <- 'ga:visits'
    filters <- 'ga:visits>2'
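Malformed or reversed dates are an easy mistake in parameters like these. As a minimal sketch in base R, the hypothetical helper `check_date_range()` below (not part of the original script) assumes that both dates should parse as year-month-day values, as in the example above, and that the start date must not fall after the end date.

```r
# Hypothetical helper, not part of the original script: checks that both
# dates parse as year-month-day values and that the range is not reversed.
check_date_range <- function(startdate, enddate) {
  start <- as.Date(startdate, format = '%Y-%m-%d')
  end <- as.Date(enddate, format = '%Y-%m-%d')
  # as.Date() returns NA when the string does not match the format
  if (is.na(start) || is.na(end)) {
    stop('dates must be given as YYYY-MM-DD')
  }
  if (start > end) {
    stop('startdate must not be later than enddate')
  }
  invisible(TRUE)
}

# Passes silently for the dates used in this post
check_date_range('2012-01-01', '2012-11-30')
```

Running the check before building the request keeps a typo in a date from turning into a harder-to-read API error later on.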
    
    2. Get the access token from the OAuth 2.0 Playground

    We will obtain the access token from the OAuth 2.0 Playground. The steps for generating it are as follows.

  • Go to the OAuth 2.0 Playground (https://developers.google.com/oauthplayground/)
  • Select the Analytics API and click the Authorize APIs button, providing your account credentials when asked
  • Generate the access token by clicking Exchange authorization code for tokens, then assign it to the access token variable (accesstoken) in the provided R script

    [Screenshot of the OAuth 2.0 Playground: http://www.tatvic.com/blog/wp-content/uploads/2012/12/oauth2.png]

    3. Retrieve and select the profile

    With the code below you can retrieve the profiles registered under your Google Analytics account, and with them the related GA profile ids. Before retrieving the profiles, ensure that the access token is set.

    We retrieve the profiles by sending a request to the Management API (https://developers.google.com/analytics/devguides/config/mgmt/v3/) with accesstoken as a parameter; it returns a JSON response. The following steps convert the response to a list and store it in the data frame object profiles.

    # Load the packages used for the HTTP requests and for JSON parsing
    library(RCurl)
    library(rjson)
    # Request the GA profiles and store the JSON response in GA.profiles.Json
    GA.profiles.Json <- getURL(paste("https://www.googleapis.com/analytics/v3/management/accounts/~all/webproperties/~all/profiles?access_token=", accesstoken, sep = ""))
    # Convert the response variable GA.profiles.Json to a list
    GA.profiles.List <- fromJSON(GA.profiles.Json, method = 'C')
    # Pad every profile entry to the same length and bind them into a matrix, one row per profile
    GA.profiles.param <- t(sapply(GA.profiles.List$items,
                                  '[', 1:max(sapply(GA.profiles.List$items, length))))
    # Store the profile id (field 1) and name (field 7) in profiles.id and profiles.name.
    # The Management API returns bare numeric ids, so prefix them with 'ga:',
    # the form shown in the output below and expected by the data request.
    profiles.id <- paste('ga:', as.character(GA.profiles.param[, 1]), sep = '')
    profiles.name <- as.character(GA.profiles.param[, 7])
    # Store profiles.id and profiles.name in the profiles data frame
    profiles <- data.frame(id = profiles.id, name = profiles.name)

    The profile information is now stored in the profiles data frame, with the profile id and profile name. We can print the retrieved list with the following code:

    profiles
    OUTPUT::
             id       name
    1 ga:123456    abc.com
    2 ga:234567    xyz.com
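Picking the id and name by column position (1 and 7) depends on the order of the fields in the response. As a rough alternative sketch, the fields can be looked up by name instead. The list `sample_profiles` below is a hand-written stand-in for the parsed response with placeholder values, and `get_field()` is a hypothetical helper; both are assumptions for illustration and not part of the original script.

```r
# Hand-written stand-in for the parsed Management API response; the ids
# and names are placeholders, not real profiles.
sample_profiles <- list(items = list(
  list(id = '123456', name = 'abc.com'),
  list(id = '234567', name = 'xyz.com')
))

# Hypothetical helper: pull one named field out of every item
# and return the values as a character vector
get_field <- function(items, field) {
  vapply(items, function(item) as.character(item[[field]]), character(1))
}

profiles_by_name <- data.frame(
  id = get_field(sample_profiles$items, 'id'),
  name = get_field(sample_profiles$items, 'name'),
  stringsAsFactors = FALSE
)
profiles_by_name
```

`vapply()` stops with an error if an item lacks the requested field, which is usually preferable to silently reading the wrong column.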

    Google Analytics data can be retrieved from only one GA profile at a time, so we need to define the profile id for which we want to retrieve the data. Select the relevant profile id from the output above and store it in the profileid variable, which is used later in the code.

    # Set your google analytics profile id
    profileid <- 'ga:123456'

    4. Retrieve the GA data

    We request the data from the Google Analytics data feed API (https://developers.google.com/analytics/devguides/reporting/core/v3/reference), passing the access token and all of the query parameters defined earlier: dimensions, metrics, start date, end date, segment, sort and filters.

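    # Page counter for the request: 0 fetches rows 1 to 10000, 1 fetches rows 10001 to 20000, and so on.
    # Assumption: start_index is not defined in this excerpt, so the first page is requested here.
    start_index <- 0
    # Request URI for querying the Google Analytics data
    GA.Data <- getURL(paste('https://www.googleapis.com/analytics/v3/data/ga?',
                            'ids=', profileid,
                            '&dimensions=', ga_dimensions,
                            '&metrics=', ga_matrics,
                            '&start-date=', startdate,
                            '&end-date=', enddate,
                            '&segment=', segment,
                            '&sort=', sort,
                            '&filters=', filters,
                            '&max-results=', 10000,
                            '&start-index=', start_index * 10000 + 1,
                            '&access_token=', accesstoken, sep = ''))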
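Values such as `ga:visits>2` contain characters that are not safe in a URL as written. As a minimal sketch, the hypothetical helper `build_query_url()` below (an assumption for illustration, not part of the original script) joins a named list of parameters into a query string and percent-encodes each value with `utils::URLencode()`. The parameter values are sample data, and the access token is left out on purpose.

```r
# Hypothetical helper: joins a named list of parameters into a query string,
# percent-encoding each value (reserved = TRUE also encodes ':', ',' and '>').
build_query_url <- function(base_url, params) {
  encoded <- vapply(params, function(value) {
    utils::URLencode(as.character(value), reserved = TRUE)
  }, character(1))
  paste0(base_url, '?', paste(names(params), encoded, sep = '=', collapse = '&'))
}

request_url <- build_query_url(
  'https://www.googleapis.com/analytics/v3/data/ga',
  list(ids = 'ga:123456',
       metrics = 'ga:visits,ga:bounces',
       filters = 'ga:visits>2',
       'max-results' = '10000')
)
request_url
```

Keeping the parameters in a named list also makes it easy to add or drop one (for example the segment) without editing a long `paste()` call.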

    This request returns a response body in JSON. To interpret the response values, we need to convert it to a list object.

    # Convert the JSON data to the list object GA.list
    GA.list <- fromJSON(GA.Data, method='C')

    Now it's easy to get the response parameters from this list object. For example, the total number of data rows is obtained with the following command:

    # Get the total number of data rows
    totalrow <-  GA.list$totalResults
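Since each request asks for at most 10,000 rows, a result set larger than that has to be fetched in several requests. As a rough sketch of the arithmetic only, the hypothetical helper `page_start_indices()` below (not part of the original script) computes the `start-index` value for every page needed to cover a given row count, using the same `start_index * 10000 + 1` rule as the request above.

```r
# Rows per request, matching the max-results value used in the request above
page_size <- 10000

# Hypothetical helper: the start-index of every page needed to cover total_rows
page_start_indices <- function(total_rows, page_size) {
  n_pages <- ceiling(total_rows / page_size)
  # seq_len() yields an empty vector when there are no rows at all
  (seq_len(n_pages) - 1) * page_size + 1
}

# 25000 rows need three requests, starting at rows 1, 10001 and 20001
page_start_indices(25000, page_size)
```

In practice the first argument would be `totalrow`, and each returned value would be sent as the `start-index` of one request.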

    5. Store the GA data in a data frame

    We store the Google Analytics response data in an R data frame, which is better suited to data visualization and data modeling in R.

    # Bind the response rows into a matrix, one row per result.
    # Assumption: finalres is not defined in this excerpt; here it is built from a single response page.
    finalres <- do.call(rbind, GA.list$rows)
    # Split ga_matrics into a vector of metric names
    metrics_vec <- unlist(strsplit(ga_matrics, split = ','))
    # Split ga_dimensions into a vector of dimension names
    dimension_vec <- unlist(strsplit(ga_dimensions, split = ','))
    # Strip the 'ga:' prefix to get the dimension column names
    ColnamesDimension <- gsub('ga:', '', dimension_vec)
    # Strip the 'ga:' prefix to get the metric column names
    ColnamesMetric <- gsub('ga:', '', metrics_vec)
    # Combine dimension and metric column names into col_names
    col_names <- c(ColnamesDimension, ColnamesMetric)
    colnames(finalres) <- col_names
    # Convert finalres to a data frame
    GA.DF <- as.data.frame(finalres)

    Finally, the retrieved data is stored in the GA.DF data frame. You can check its first rows with the following command:

    head(GA.DF)
    OUTPUT::
            visitorType operatingSystem   country visits bounces      avgTimeOnSite
    1       New Visitor         Android Australia      3       1              106.0
    2       New Visitor         Android   Belgium      3       1 155.33333333333334
    3       New Visitor         Android    Poland      3       0               60.0
    4       New Visitor         Android    Serbia      3       2 40.666666666666664
    5       New Visitor         Android     Spain      3       1               43.0
    6 Returning Visitor         Android (not set)      3       3                0.0
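The row values in the API response are strings, so the metric columns are not numeric until they are converted. A minimal sketch of that conversion follows; `sample_rows` is a hand-written stand-in copied from the first two rows of the output above, and the names `sample_cols`, `metric_cols` and `typed_df` are introduced here for illustration only.

```r
# Hand-written stand-in for the parsed rows, copied from the output above
sample_rows <- list(
  c('New Visitor', 'Android', 'Australia', '3', '1', '106.0'),
  c('New Visitor', 'Android', 'Belgium', '3', '1', '155.33333333333334')
)
sample_cols <- c('visitorType', 'operatingSystem', 'country',
                 'visits', 'bounces', 'avgTimeOnSite')
metric_cols <- c('visits', 'bounces', 'avgTimeOnSite')

# One matrix row per result, then a data frame of character columns
typed_df <- as.data.frame(do.call(rbind, sample_rows), stringsAsFactors = FALSE)
colnames(typed_df) <- sample_cols

# Convert only the metric columns to numeric; dimensions stay as text
typed_df[metric_cols] <- lapply(typed_df[metric_cols], as.numeric)
str(typed_df)
```

Setting `stringsAsFactors = FALSE` matters on older R versions, where the columns would otherwise become factors and `as.numeric()` would return the factor codes rather than the values.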

    You will need the full R script to try this yourself; you can download it from http://www.tatvic.com/blog/downloads/rgoogleanalytics.zip. I am currently working on an R package that will help R users do the same task with less effort. If you are interested, leave your email id in a comment and we'll get in touch.

    About the author: Vignesh Prajapati (http://www.tatvic.com/blog/author/vignesh/)

    Vignesh is a Data Engineer at Tatvic. He loves to play in the open-source playground, building predictive solutions on big data with R, Hadoop and the Google Prediction API.
    Google Plus profile: https://plus.google.com/118111147659915240293/posts
