As data analytics consulting firm, we think we are fortunate that we keep finding problems to find. Recently my team mate found a glaring problem of not having any connector for R with Google. With the inspiration from Michael, Ajay O, it soon become a worth problem to solve.
With title="RGoogleAnalytics" href="https://code.google.com/p/r-google-analytics/" >RGoogleAnalytics package now, we have solved the problem of data extraction into R from Google Analytics a new breed of ideas started emerging primarily around visualization. I have been playing with title="GGplot2" href="http://cran.r-project.org/web/packages/ggplot2/index.html" >GGplot2 has been great package to convert data into visualization. Thanks Dr. Hadley Wickham. Once you have Following this title="Google Analytics data extraction in R" href="http://www.tatvic.com/blog/ga-data-extraction-in-r/" >blogpost, you are with the code there in position to have data required to get these calendar heat map done. Take up below given code and paste into R console and play around to see if you find it easy working thru. If you have trouble, feel free to reach out to title="Contact Us" href="http://www.tatvic.com/contact/?ref=blogmenu" >us.
Here is the code for extracting the Google analytics data using R-google analytics package. Before running the following code, download title="RGoogleAnalytics" href="https://code.google.com/p/r-google-analytics/downloads/list" >RGoogleAnalytics package and install it.
#Load RGoogleAnalytics library library("RGoogleAnalytics") # Create query builder object query <- QueryBuilder() # Authorize your account and paste the accesstoken access_token <- query$authorize() # Create a new Google Analytics API object ga <- RGoogleAnalytics() ga.profiles <- ga$GetProfileData(access_token) # List the GA profiles ga.profiles # select index corresponds to your profile and set it to query string # For example if index is 7 of your GA profile then set ga.profiles$id in # query$Init() method given below # Build the query string query$Init(start.date = "2010-01-01", # Set start date end.date = "2012-12-31", # Set end date dimensions = "ga:date", metrics = "ga:visits,ga:transactions", max.results = 10000, table.id = paste("ga:",ga.profiles$id,sep="",collapse=","), access_token=access_token) # Make a request to get the data from the API ga.data <- ga$GetReportData(query) # data will be stored in this data frame # Set date in format YYYY-MM-DD (to use into heatmap calender) ga.data$date <- as.Date(as.character(ga.data$date),format="%Y%m%d")
For this example of Calender heatmap, I am using data of an e-commerce store with having data for more than 2 years in business. I will be plotting visits as well as transactions on calendar so that I’d get perspective on how they interact viz-a-viz timeline.
Here is the code for plotting the heat map after you get data and have it store in 'data'. This frame is used to reference the source of data for the visualization below.
# Recommended R version - 2.15.1 or higher # install required library by using the command install.packages(“libraryname”) # For example install.packages(“ggplot2”) # Required library library(“quantmod”) library(“ggplot2”) library(“reshape2”) library(“plyr”) library(“scales”) # Set extracted data to this data frame data <- ga.data # Run commands listed below data$year <- as.numeric(as.POSIXlt(data$date)$year+1900) data$month <- as.numeric(as.POSIXlt(data$date)$mon+1) data$monthf <- factor(data$month,levels=as.character(1:12), labels=c("Jan","Feb","Mar","Apr","May","Jun", "Jul","Aug","Sep","Oct","Nov","Dec"), ordered=TRUE) data$weekday <- as.POSIXlt(data$date)$wday data$weekday[data$weekday==0] <- 7 data$weekdayf <- factor(data$weekday,levels=rev(1:7), labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")), ordered=TRUE) data$yearmonth <- as.yearmon(data$date) data$yearmonthf <- factor(data$yearmonth) data$week <- as.numeric(format(as.Date(data$date),"%W")) data <- ddply(data,.(yearmonthf),transform,monthweek=1+week-min(week)) # Plot for visits P_visits <- ggplot(data, aes(monthweek, weekdayf, fill = visits)) + geom_tile(colour = "white") + facet_grid(year~monthf) + scale_fill_gradient(high="#D61818",low="#B5E384") + labs(title = "Time-Series Calendar Heatmap") + xlab("Week of Month") + ylab("") # View plot P_visits #Plot for transactions P_transactions <- ggplot(data, aes(monthweek, weekdayf, fill = transactions)) + geom_tile(colour = "white") + facet_grid(year~monthf) + scale_fill_gradient(high="#D61818",low="#B5E384") + labs(title = "Time-Series Calendar Heatmap") + xlab("Week of Month") + ylab("") # View plot P_transactions
Once you run the code, you will be in position to get output like below:
/> href="http://www.tatvic.com/blog/wp-content/uploads/2013/03/visits3.png"> src="http://www.tatvic.com/blog/wp-content/uploads/2013/03/visits3.png" alt="" title="Visits" width="650" height="275" class="alignnone size-full wp-image-4001" />
/> Now that we have a calendar heat map for visits, let me pull it off for transaction. In the above code for Google Analytics data extraction you have use transaction as well as visits as metrics. Since the data is already available in the ‘data’. we are ready by changing in code of visualization to choose the heat map for transaction now.
/> href="http://www.tatvic.com/blog/wp-content/uploads/2013/03/Transactions.png"> src="http://www.tatvic.com/blog/wp-content/uploads/2013/03/Transactions.png" alt="" title="Transactions" width="650" height="275" class="alignnone size-full wp-image-4003" />
/> Its quite interesting now that you can make super nice inferences like I did below:
- Tuesdays have high visits days but wed has been the day when most transactions occurs
- Visits increases towards the end of year (shopping season) and then slows down towards year start
Visualization is an interactive process. Based on the feedback received from some of our readers, I tried plotting both the KPIs on the same graph. With ggplot2, it was as simple as adding a line of code. Again some minor tweaks in the background colours and we are ready with another heat map.
ggplot(data, aes(monthweek, weekdayf, fill = visits)) + geom_tile(colour="white") + facet_grid(year~monthf) + scale_fill_gradient(high="steelblue",low="white") + theme_bw() + geom_point(aes(monthweek, weekdayf, size=transactions,alpha=transactions),color="firebrick") + theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank()) + labs(title = "Time-Series Calendar Heatmap") + xlab("Week of Month") + ylab("")
src="http://www.tatvic.com/blog/wp-content/uploads/2013/03/visit_trans.png" alt="" title="visits&transactions" width="700" height="500" class="alignnone size-full wp-image-4123" />