Roll Your Own Gist Comments Notifier in R

July 25, 2015
By

(This article was first published on rud.is » R, and kindly contributed to R-bloggers)

As I was putting together the coord_proj ggplot2 extension I had posted a gist that I shared on Twitter. Said gist received a comment (several, in fact) and a bunch of us were painfully reminded of the fact that there is no built-in way to receive notifications from said comment activity.

@jennybryan posited that it could be possible to use IFTTT as a broker for these notifications, but after some checking that ended up not being directly doable since there are no “gist comment” triggers to act upon in IFTTT.

There are a few standalone Ruby gems that programmatically retrieve gist comments but I wasn’t interested in managing a Ruby workflow [ugh]. I did find a Heroku-hosted service – https://gh-rss.herokuapp.com/ – that will turn gist comments into an RSS/Atom feed (based on Ruby again). I gave it a shot and hooked it up to IFTTT but my feed is far enough down on the food chain there that it never gets updated. It was possible to deploy that app on my own Heroku instance, but—again—I’m not interested in managing a Ruby workflow.

The Ruby scripts pretty much:

  • grab your main gist RSS/Atom feed
  • visit each gist in the feed
  • extract comments & comment metadata from them (if any)
  • return a composite data structure you can do anything with

That’s super-easy to duplicate in R, so I decided to build a small R script that does all that and generates an RSS/Atom file which I added to my Feedly feeds (I’m pretty much always scanning RSS, so really didn’t need the IFTTT notification setup). I put it into a cron job that runs every hour. When Feedly refreshes the feed, a new entry will appear whenever there’s a new comment.

The script is below and on github (ironically as a gist). Here’s what you’ll grok from the code:

  • one way to deal with the “default namespace” issue in R+XML
  • one way to deal with error checking for scraping
  • how to build an XML file (and, specifically, an RSS/Atom feed) with R
  • how to escape XML entities with R
  • how to get an XML object as a character string in R

You’ll definitely need to tweak this a bit for your own setup, but it should be a fairly complete starting point for you to work from. To see the output, grab the generated feed.

# Roll your own GitHub Gist Comments Feed in R
 
library(xml2)    # github version
library(rvest)   # github version
library(stringr) # for str_trim & str_replace
library(dplyr)   # for data_frame & bind_rows
library(pbapply) # free progress bars for everyone!
library(XML)     # to build the RSS feed
 
who <- "hrbrmstr" # CHANGE ME!
 
# Grab the user's gist feed -----------------------------------------------
 
gist_feed <- sprintf("https://gist.github.com/%s.atom", who)
feed_pg <- read_xml(gist_feed)
ns <- xml_ns_rename(xml_ns(feed_pg), d1 = "feed")
 
# Extract the links & titles of the gists in the feed ---------------------
 
links <-  xml_attr(xml_find_all(feed_pg, "//feed:entry/feed:link", ns), "href")
titles <-  xml_text(xml_find_all(feed_pg, "//feed:entry/feed:title", ns))
 
#' This function does the hard part by iterating over the
#' links/titles and building a tbl_df of all the comments per-gist
get_comments <- function(links, titles) {
 
  bind_rows(pblapply(1:length(links), function(i) {
 
    # get gist
 
    pg <- read_html(links[i])
 
    # look for comments
 
    ref <- tryCatch(html_attr(html_nodes(pg, "div.timeline-comment-wrapper a[href^='#gistcomment']"), "href"),
                    error=function(e) character(0))
 
    # in theory if 'ref' exists then the rest will
 
    if (length(ref) != 0) {
 
      # if there were comments, get all the metadata we care about
 
      author <- html_text(html_nodes(pg, "div.timeline-comment-wrapper a.author"))
      timestamp <- html_attr(html_nodes(pg, "div.timeline-comment-wrapper time"), "datetime")
      contentpg <- str_trim(html_text(html_nodes(pg, "div.timeline-comment-wrapper div.comment-body")))
 
    } else {
      ref <- author <- timestamp <- contentpg <- character(0)
    }
 
    # bind_rows ignores length 0 tbl_df's
    if (sum(lengths(list(ref, author, timestamp, contentpg))==0)) {
      return(data_frame())
    }
 
    return(data_frame(title=titles[i], link=links[i],
                      ref=ref, author=author,
                      timestamp=timestamp, contentpg=contentpg))
 
  }))
 
}
 
comments <- get_comments(links, titles)
 
feed <- xmlTree("feed")
feed$addNode("id", sprintf("user:%s", who))
feed$addNode("title", sprintf("%s's gist comments", who))
feed$addNode("icon", "https://assets-cdn.github.com/favicon.ico")
feed$addNode("link", attrs=list(href=sprintf("https://github.com/%s", who)))
feed$addNode("updated", format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz="GMT"))
 
for (i in 1:nrow(comments)) {
 
  feed$addNode("entry", close=FALSE)
    feed$addNode("id", sprintf("gist:comment:%s:%s", who, comments[i, "timestamp"]))
    feed$addNode("link", attrs=list(href=sprintf("%s%s", comments[i, "link"], comments[i, "ref"])))
    feed$addNode("title", sprintf("Comment by %s", comments[i, "author"]))
    feed$addNode("updated", comments[i, "timestamp"])
    feed$addNode("author", close=FALSE)
      feed$addNode("name", comments[i, "author"])
    feed$closeTag()
    feed$addNode("content", saveXML(xmlTextNode(as.character(comments[i, "contentpg"])), prefix=""), 
                 attrs=list(type="html"))
  feed$closeTag()
 
}
 
rss <- str_replace(saveXML(feed), "", '')
 
writeLines(rss, con="feed.xml")

To leave a comment for the author, please follow the link and comment on their blog: rud.is » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)