feedeR: Reading RSS and Atom Feeds from R

August 8, 2016
(This article was first published on R – Exegetic Analytics, and kindly contributed to R-bloggers)

I’m working on a project in which I need to systematically parse a number of RSS and Atom feeds from within R. I was somewhat surprised to find that no package currently exists on CRAN to handle this task. So this presented the opportunity for a bit of DIY.

You can find the fruits of my morning’s labour here.

Installing and Loading

The package is currently hosted on GitHub.

> devtools::install_github("DataWookie/feedeR")
> library(feedeR)

Reading an RSS Feed

Although Atom is supposed to be a better format from a technical perspective, RSS is far more widely used: the vast majority of blogs provide an RSS feed. We’ll look at the feed exposed by R-bloggers.

> rbloggers <- feed.extract("https://feeds.feedburner.com/RBloggers")
> names(rbloggers)
[1] "title"   "link"    "updated" "items"

There are three metadata elements pertaining to the feed.

> rbloggers[1:3]
$title
[1] "R-bloggers"

$link
[1] "https://www.r-bloggers.com"

$updated
[1] "2016-08-06 09:15:54 UTC"

The actual entries on the feed are captured in the items element. For each entry the title, publication date and link are captured. There are often more fields available for each entry, but these three are generally present.

> nrow(rbloggers$items)
[1] 8
> head(rbloggers$items, 3)
                                                              title                date
1                                                       readr 1.0.0 2016-08-05 20:25:05
2 Map the Life Expectancy in United States with data from Wikipedia 2016-08-05 19:48:53
3 Creating Annotated Data Frames from GEO with the GEOquery package 2016-08-05 19:35:45
                                                                                           link
1                                                       https://www.r-bloggers.com/readr-1-0-0/
2 https://www.r-bloggers.com/map-the-life-expectancy-in-united-states-with-data-from-wikipedia/
3 https://www.r-bloggers.com/creating-annotated-data-frames-from-geo-with-the-geoquery-package/
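
Since the items come back in a tabular form, ordinary subsetting applies. Here is a minimal sketch that does not touch the network: it rebuilds the three rows shown above as a mock data frame and keeps only the entries published after a cutoff. The assumption that date is a POSIXct column in UTC is mine, as are the names cutoff and recent; check the class of the column on a real feed before relying on this.

```r
# Mock of the `items` element, built from the three rows printed above.
# Assumption: `date` is POSIXct in UTC (the printed output does not show the class).
items <- data.frame(
  title = c("readr 1.0.0",
            "Map the Life Expectancy in United States with data from Wikipedia",
            "Creating Annotated Data Frames from GEO with the GEOquery package"),
  date = as.POSIXct(c("2016-08-05 20:25:05",
                      "2016-08-05 19:48:53",
                      "2016-08-05 19:35:45"), tz = "UTC"),
  link = c("https://www.r-bloggers.com/readr-1-0-0/",
           "https://www.r-bloggers.com/map-the-life-expectancy-in-united-states-with-data-from-wikipedia/",
           "https://www.r-bloggers.com/creating-annotated-data-frames-from-geo-with-the-geoquery-package/"),
  stringsAsFactors = FALSE
)

# Keep entries published at or after the (arbitrary) cutoff, newest first.
cutoff <- as.POSIXct("2016-08-05 19:40:00", tz = "UTC")
recent <- items[items$date >= cutoff, ]
recent <- recent[order(recent$date, decreasing = TRUE), ]

print(recent$title)
```

With a real feed, rbloggers$items would take the place of the mock items data frame.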

Reading an Atom Feed

Atom feeds are definitely in the minority, but this format is still used by a number of popular sites. We’ll look at the feed from The R Journal.

> rjournal <- feed.extract("http://journal.r-project.org/rss.atom")

The same three elements of metadata are present.

> rjournal[1:3]
$title
[1] "The R Journal"

$link
[1] "http://journal.r-project.org"

$updated
[1] "2016-07-23 13:16:08 UTC"

Atom feeds do not appear to consistently provide the date on which each of the entries was originally published. The title and link fields are always present though!

> head(rjournal$items, 3)
                                                                                title date
1                         Heteroscedastic Censored and Truncated Regression with crch   NA
2 An Interactive Survey Application for Validating Social Network Analysis Techniques   NA
3            quickpsy: An R Package to Fit Psychometric Functions for Multiple Groups   NA
                                                                     link
1  http://journal.r-project.org/archive/accepted/messner-mayr-zeileis.pdf
2        http://journal.r-project.org/archive/accepted/joblin-mauerer.pdf
3 http://journal.r-project.org/archive/accepted/linares-lopez-moliner.pdf
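
Missing dates need some care before any sorting or filtering by date. The sketch below shows one possible fallback, not something the package does itself: flag the entries with no date and substitute the feed-level updated timestamp. The feed is mocked from the output above so the code runs offline, and the date_imputed column is a hypothetical name introduced for illustration. Whether the feed timestamp is an acceptable stand-in depends on what the dates are used for.

```r
# Mock of the Atom feed result, built from the output shown above.
# Assumption: `updated` and `date` are POSIXct in UTC.
feed <- list(
  updated = as.POSIXct("2016-07-23 13:16:08", tz = "UTC"),
  items = data.frame(
    title = c("Heteroscedastic Censored and Truncated Regression with crch",
              "An Interactive Survey Application for Validating Social Network Analysis Techniques",
              "quickpsy: An R Package to Fit Psychometric Functions for Multiple Groups"),
    date = as.POSIXct(rep(NA, 3), tz = "UTC"),
    stringsAsFactors = FALSE
  )
)

# Record which entries had no date of their own (hypothetical column name).
feed$items$date_imputed <- is.na(feed$items$date)

# Fall back to the feed-level timestamp for those entries only.
feed$items$date[feed$items$date_imputed] <- feed$updated

print(feed$items[, c("date", "date_imputed")])
```

Keeping the flag column makes it easy to exclude the imputed dates later if they turn out to be misleading.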

Outlook

I’m still testing this across a selection of feeds. If you find a feed that breaks the package, please let me know and I’ll debug as necessary.
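
In the meantime, when reading many feeds in one go, it may be worth trapping errors so that a single misbehaving feed does not stop the whole run. The following is a rough sketch under stated assumptions: read_feeds and fake_extract are hypothetical helpers written for this example, the example.com URLs are placeholders, and fake_extract merely stands in for feed.extract so that the code runs without a network connection.

```r
# Hypothetical helper: apply `extract` to each URL, returning NULL for any
# feed that raises an error instead of aborting the loop.
read_feeds <- function(urls, extract) {
  results <- lapply(urls, function(url) {
    tryCatch(
      extract(url),
      error = function(e) {
        message("Failed to read ", url, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- urls
  results
}

# Stand-in for feed.extract: fails for any URL containing "broken".
fake_extract <- function(url) {
  if (grepl("broken", url)) stop("could not parse feed")
  list(title = url, items = data.frame())
}

urls <- c("https://example.com/good.xml", "https://example.com/broken.xml")
feeds <- read_feeds(urls, fake_extract)

# URLs whose feeds could not be read.
failed <- names(feeds)[vapply(feeds, is.null, logical(1))]
print(failed)
```

In real use, feed.extract would be passed as the extract argument, and the URLs in failed are the ones worth reporting.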

