WordPress Stats in R

[This article was first published on OUseful.Info, the blog... » Rstats, and kindly contributed to R-bloggers.]

A trackback from Martin Hawksey’s recent post, Analysing WordPress post velocity and momentum stats with Google Sheets (Spreadsheet), which demonstrates how to pull WordPress stats into a Google Spreadsheet and generate charts and reports there, reminded me of the WordPress stats API.

So here’s a quick function for pulling WordPress reports into R.

#Wordpress Stats
##---------------
#Wordpress Stats API docs (from http://stats.wordpress.com/csv.php)

#You can get a copy of your API key (required) from Akismet:
#Login with your WordPress account: http://akismet.com/account/
#Resend API key: https://akismet.com/resend/

#Required parameters: api_key, blog_id or blog_uri.
#Optional parameters: table, post_id, end, days, limit, summarize.

#Parameters:
#api_key     String    A secret unique to your WordPress.com user account.
#blog_id     Integer   The number that identifies your blog. Find it in other stats URLs.
#blog_uri    String    The full URL to the root directory of your blog. Including the full path.
#table       String    One of views, postviews, referrers, referrers_grouped, searchterms, clicks, videoplays.
#post_id     Integer   For use with postviews table.
#end         String    The last day of the desired time frame. Format is 'Y-m-d' (e.g. 2007-05-01) and default is UTC date.
#days        Integer   The length of the desired time frame. Default is 30. "-1" means unlimited.
#period      String    For use with views table and the 'days' parameter. The desired time period grouping. 'week' or 'month'
#Use 'days' as the number of results to return (e.g. '&period=week&days=12' to return 12 weeks)
#limit       Integer   The maximum number of records to return. Default is 100. "-1" means unlimited. If days is -1, limit is capped at 500.
#summarize   Flag      If present, summarizes all matching records.
#format      String    The format the data is returned in, 'csv', 'xml' or 'json'. Default is 'csv'.
##---------------------------------------------
#NOTE: some of the report calls I tried didn't seem to work properly?
#Need to build up a list of tested calls to the API that actually do what you think they should?
##-----

wordpress.getstats.demo=function(apikey, blogurl, table='postviews', end=Sys.Date(), days='12', period='week', limit='', summarise=''){
  #Default parameters get back the last 12 weeks of postviews aggregated by week
  url=paste('http://stats.wordpress.com/csv.php?',
    'api_key=',apikey,
    '&blog_uri=',blogurl,
    '&table=',table,
    '&end=',end,
    '&days=',days,
    '&period=',period,
    '&limit=',limit,
    '&',summarise, #set this to 'summarize' if required (the API docs above spell the flag 'summarize')
    sep=''
  )
  #Martin's post notes that JSON appears to work better than CSV
  #May be worth doing a JSON parsing version?
  read.csv(url)
}


APIKEY='YOUR-API_KEY_HERE'
#Use the URL of a WordPress blog associated with the same account as the API key
BLOGURL='http://ouseful.wordpress.com'

#Examples
wp.pageviews.last12weeks=wordpress.getstats.demo(APIKEY,BLOGURL)
wp.views.last12weeks.byweek=wordpress.getstats.demo(APIKEY,BLOGURL,'views')
wp.views.last30days.byday=wordpress.getstats.demo(APIKEY,BLOGURL,'views',days=30,period='')
wp.clicks.wpdefault=wordpress.getstats.demo(APIKEY,BLOGURL,'clicks',days='',period='')
wp.clicks.lastday=wordpress.getstats.demo(APIKEY,BLOGURL,'clicks',days='1',period='')
wp.referrers.lastday=wordpress.getstats.demo(APIKEY,BLOGURL,'referrers',days='1',period='')


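library(stringr)
#Capture the host part of an http(s) URL; a trailing slash is not required,
#so 'http://example.com' matches as well as 'http://example.com/page'
getDomain=function(url) str_match(url, "^https?://([^/]+)")[, 2]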

#We can pull out the domains clicks were sent to or referrals came from
wp.clicks.lastday$domain=getDomain(wp.clicks.lastday$click)
wp.referrers.lastday$domain=getDomain(wp.referrers.lastday$referrer)

require(ggplot2)

#Scruffy bar chart - is there a way of doing this sorted chart using geom_bar? How would we reorder x?
c=as.data.frame(table(wp.clicks.lastday$domain))
ggplot(c)+geom_bar(aes(x=reorder(Var1,Freq),y=Freq),stat='identity')+theme( axis.text.x=element_text(angle=-90))

c=as.data.frame(table(wp.referrers.lastday$domain))
ggplot(c)+geom_bar(aes(x=reorder(Var1,Freq),y=Freq),stat='identity')+theme( axis.text.x=element_text(angle=-90))

(Code as a gist.)
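
The paste() call in wordpress.getstats.demo sends every parameter even when it is empty (for example '&limit=' and a trailing '&'), which may be one reason some report calls misbehave. Here is a minimal sketch of an alternative URL builder that drops empty parameters and percent-encodes the values. The helper name wordpress.statsurl is hypothetical and not part of the gist, and I am assuming the stats API decodes an encoded blog_uri in the usual way.

```r
# Hypothetical helper (not part of the original gist): build the stats URL
# from a named list, skipping parameters that are NULL or empty strings.
wordpress.statsurl = function(params, base = 'http://stats.wordpress.com/csv.php') {
  # Keep only parameters that hold a single, non-empty value
  keep = vapply(params,
                function(p) length(p) == 1 && nzchar(as.character(p)),
                logical(1))
  params = params[keep]
  # Percent-encode each value so characters such as ':' and '/' in blog_uri
  # cannot be confused with URL syntax
  values = vapply(params,
                  function(p) utils::URLencode(as.character(p), reserved = TRUE),
                  character(1))
  paste0(base, '?', paste(names(params), values, sep = '=', collapse = '&'))
}

# Example: last day of clicks; the empty period and limit are left out of the URL
url = wordpress.statsurl(list(api_key = 'YOUR-API_KEY_HERE',
                              blog_uri = 'http://ouseful.wordpress.com',
                              table = 'clicks',
                              days = '1',
                              period = '',
                              limit = ''))
print(url)
```

The resulting string can be passed straight to read.csv(), as in the original function.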

I guess there’s scope for coming up with a set of child functions that pull back specific report types? Also, if we pull in the blog XML archive and extract external links from each page, we could maybe start to analyse which pages are sending traffic where? (Of course, you could use Google Analytics to do this more efficiently, but hosted WordPress blogs don’t support Google Analytics, for no very good reason that I can tell…)
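
One possible shape for those child functions is a small factory that fixes the table and a few defaults, then hands everything on to the general function. This is only a sketch: wordpress.report and stub.fetcher are hypothetical names, and the stub stands in for wordpress.getstats.demo so that the example runs without an API key or network access.

```r
# Hypothetical factory (an assumption, not from the original gist): returns a
# function that always requests one table with some preset defaults.
wordpress.report = function(fetcher, table, ...) {
  force(table)
  defaults = list(...)
  function(apikey, blogurl, ...) {
    # Arguments given at call time override the preset defaults
    args = utils::modifyList(defaults, list(...))
    do.call(fetcher, c(list(apikey, blogurl, table = table), args))
  }
}

# Stand-in for wordpress.getstats.demo: it only reports the arguments it
# would have sent, rather than calling the stats API.
stub.fetcher = function(apikey, blogurl, table = 'postviews', days = '12', period = 'week') {
  data.frame(table = table, days = days, period = period, stringsAsFactors = FALSE)
}

# Child functions for two specific report types
clicks.lastday = wordpress.report(stub.fetcher, 'clicks', days = '1', period = '')
referrers.lastday = wordpress.report(stub.fetcher, 'referrers', days = '1', period = '')

res = clicks.lastday('YOUR-API_KEY_HERE', 'http://ouseful.wordpress.com')
# A default can still be overridden for a single call
res.week = clicks.lastday('YOUR-API_KEY_HERE', 'http://ouseful.wordpress.com', days = '7')
print(res)
print(res.week)
```

Swapping stub.fetcher for wordpress.getstats.demo should give working report functions, since that function takes the same first two arguments and the same named parameters.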

PS for more WordPress tinkerings, see e.g. How OUseful.Info Posts Link to Each Other…, which links to a Python script for extracting data from WordPress blog export files that shows how posts in a particular WordPress blog link to each other.

