Web Analytics Visualization through ggplot2

July 1, 2013

(This article was first published on Tatvic Blog » R, and kindly contributed to R-bloggers)

During our last webinar (http://www.tatvic.com/perform-predictive-analysis-on-your-web-analytics-tool/), we covered some of the basic ideas behind ggplot2 (http://docs.ggplot2.org/), the R visualization package by Dr. Hadley Wickham. In this blog post I will walk through the example that I covered during the webinar.

To carry out the examples yourself, you may download the dummy datasets from this link: http://www.tatvic.com/blog/downloads/dataset.zip

Creating visualizations is an iterative process. You start with a data set, generate some quick graphs that best depict the insights and keep on adding components/data to the graph to finally produce a viz that you can show in your reports. The idea behind ggplot2 is to make this process simpler and more effective at the same time.

Diving into our example, we want to explore how Transactions for a hypothetical ecommerce store have fared for a particular calendar year. The function which we used for plotting is ggplot() and we need to mention the data frame as one of the arguments. Now for a bit of ggplot2 terminology which I am quoting verbatim from the documentation.

  • aes stands for aesthetics and this controls how variables are mapped to visual properties such as the axes. In our example we map month to the x-axis and transactions to the y-axis.
  • geoms, short for geometric objects, describe the type of plot you will produce. In our case, we are plotting the data as a bar graph.
  • stat stands for statistics, which help us transform the data prior to plotting. In our case, it's an identity transform, so the data remains unchanged.
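To make the three terms concrete, here is a minimal sketch that does not need the downloaded files. The data frame toy and its values are hypothetical stand-ins for the webinar dataset, not the real numbers.

```r
library(ggplot2)

# Hypothetical stand-in for the webinar dataset: one row per month.
toy <- data.frame(
  month = 1:12,
  transactions = c(120, 95, 130, 160, 150, 170, 210, 190, 180, 220, 260, 300)
)

# aes:  month is mapped to the x-axis, transactions to the y-axis
# geom: geom_bar() draws the data as bars
# stat: "identity" plots the values as they are, instead of counting rows
p <- ggplot(toy, aes(x = factor(month), y = transactions)) +
  geom_bar(stat = "identity")

print(p)
```

If stat="identity" is left out, geom_bar() falls back to counting rows, which is not what we want when the data frame already holds one total per month.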

Here’s the R code :

# Load ggplot2 and the data frame
library(ggplot2)
mydata <- read.csv("./datasets/dataset1.csv")

# Append a new column that maps month numbers to month names
mydata$monthf <- factor(mydata$month, levels=as.character(1:12),
                        labels=c("Jan","Feb","Mar","Apr","May","Jun",
                                 "Jul","Aug","Sep","Oct","Nov","Dec"),
                        ordered=TRUE)

# Plot Transactions vs Month
ggplot(mydata, aes(monthf, transactions)) + geom_bar(stat="identity")

# Which month shows the highest transactions?

[Figure: Our first bar graph (http://www.tatvic.com/blog/wp-content/uploads/2013/07/Rplot14.png)]
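The question about the best month can be read off the bar graph, but it can also be checked numerically. The sketch below uses a hypothetical data frame toy with made-up values. With the real data you would use mydata in its place, and the answer may differ.

```r
# Hypothetical monthly totals standing in for the first dataset
toy <- data.frame(
  month = 1:12,
  transactions = c(120, 95, 130, 160, 150, 170, 210, 190, 180, 220, 260, 300)
)

# month.abb is base R's built-in vector "Jan", "Feb", ..., "Dec"
toy$monthf <- factor(toy$month, levels = as.character(1:12),
                     labels = month.abb, ordered = TRUE)

# Pick the row with the largest transaction count
best <- toy[which.max(toy$transactions), ]
as.character(best$monthf)   # "Dec" for the toy values above
```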

You can now follow the rest of the R code, which keeps improving our first visualization:

# Load data frame that includes Medium as a dimension
mydata_1 <- read.csv("./data/dataset2.csv")
mydata_1$monthf <- factor(mydata_1$month, levels=as.character(1:12),
                          labels=c("Jan","Feb","Mar","Apr","May","Jun",
                                   "Jul","Aug","Sep","Oct","Nov","Dec"),
                          ordered=TRUE)

# Facet the Transactions by medium
ggplot(mydata_1, aes(monthf, transactions)) +
  geom_bar(stat="identity") +
  facet_wrap(~medium)
# What is the problem with this plot?

# Exclude the mediums having zero transactions
fresh_data <- subset(mydata_1, medium %in% c("cpc","organic","referral","(none)"))

# Re-plot
ggplot(fresh_data, aes(monthf, transactions)) +
  geom_bar(stat="identity") +
  facet_wrap(~medium)

# Stack the plots vertically for easier comparison
ggplot(fresh_data, aes(monthf, transactions)) +
  geom_bar(stat="identity") +
  facet_wrap(~medium, ncol=1)
# Which medium performed best w.r.t. transactions?

# Load the data frame including an additional dimension, Visitor Type
mydata_2 <- read.csv("./data/dataset3.csv")
mydata_2$monthf <- factor(mydata_2$month, levels=as.character(1:12),
                          labels=c("Jan","Feb","Mar","Apr","May","Jun",
                                   "Jul","Aug","Sep","Oct","Nov","Dec"),
                          ordered=TRUE)

# Map a color to the Visitor Type variable
ggplot(mydata_2, aes(monthf, transactions, fill=visitorType)) +
  geom_bar(stat="identity") +
  facet_wrap(~medium, ncol=1)

# Place the bars side by side for easier comparison
ggplot(mydata_2, aes(monthf, transactions, fill=visitorType)) +
  geom_bar(stat="identity", position="dodge") +
  facet_wrap(~medium, ncol=1)

# Strip the grey background and add a plot title
ggplot(mydata_2, aes(monthf, transactions, fill=visitorType)) +
  geom_bar(stat="identity", position="dodge") +
  facet_wrap(~medium, ncol=1) +
  theme_bw() +
  ggtitle("MoM transactions split by Visitor Type")

If you followed the code correctly, you might end up with something like this:

[Figure: Final viz (http://www.tatvic.com/blog/wp-content/uploads/2013/07/Rplot13.png)]
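In the code above, the mediums to keep are typed in by hand. As a possible alternative, the mediums with zero transactions can be found programmatically. This is only a sketch: the data frame toy, its medium names and its numbers are hypothetical, and it assumes the same column names (medium, transactions) as the post's datasets.

```r
# Hypothetical data with the same columns as the second dataset
toy <- data.frame(
  month = rep(1:3, times = 4),
  medium = rep(c("cpc", "organic", "email", "referral"), each = 3),
  transactions = c(10, 12, 9,  30, 28, 35,  0, 0, 0,  5, 7, 6)
)

# Total transactions per medium
totals <- aggregate(transactions ~ medium, data = toy, FUN = sum)

# Keep only the mediums that recorded at least one transaction
keep <- totals$medium[totals$transactions > 0]
fresh <- subset(toy, medium %in% keep)

unique(as.character(fresh$medium))
```

The resulting data frame can be passed to ggplot() in the same way as fresh_data, and the facets for empty mediums no longer appear.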

There is still a lot that can be improved in the viz, but let us stop here and quickly sum up what we just learnt. We understood the basic idea behind ggplot2, picked up some of its terminology and saw how we could generate interesting visualizations in a matter of minutes. Of course, there is a fair bit of programming overhead involved, but once you get the hang of ggplot2, the time spent learning to code is well spent. If you're in for something more advanced, you may want to have a look at our other blog posts on ggplot2: http://www.tatvic.com/blog/category/r/ggplot2/
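The iterative workflow used throughout this post can also be written by storing the plot in a variable and adding one component at a time. The sketch below is one way to do that, with readable axis labels added as a small further improvement. The data frame toy, its values and the visitor type names are hypothetical.

```r
library(ggplot2)

# Hypothetical data: 3 months x 2 mediums x 2 visitor types
toy <- expand.grid(
  monthf = c("Jan", "Feb", "Mar"),
  medium = c("cpc", "organic"),
  visitorType = c("New Visitor", "Returning Visitor"),
  stringsAsFactors = FALSE
)
toy$monthf <- factor(toy$monthf, levels = c("Jan", "Feb", "Mar"))
toy$transactions <- c(10, 12, 9, 30, 28, 35, 6, 8, 7, 20, 22, 25)

# Start with data and aesthetics, then add components step by step
p <- ggplot(toy, aes(monthf, transactions, fill = visitorType))
p <- p + geom_bar(stat = "identity", position = "dodge")
p <- p + facet_wrap(~medium, ncol = 1)
p <- p + theme_bw() + ggtitle("MoM transactions split by Visitor Type")

# One further improvement: readable axis labels
p <- p + xlab("Month") + ylab("Transactions")

print(p)
```

Printing p after each step shows the intermediate graph, which makes it easy to back out a change that does not help.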

Kushan Shah (http://www.tatvic.com/blog/author/kushan/)

Kushan is a Web Analyst at Tatvic. His interests lie in getting the maximum insights out of raw data using R and Python.
