Blog Archives

Better handling of JSON data in R?

March 13, 2014

What is the best way to read data in JSON format into R? Though really common for almost all modern online applications, JSON is not every R user's best friend. After seeing the slides for my Web Scraping course, in which I somewhat arbitrarily veered between using the packages rjson and RJSONIO, the creator of a...

Web Scraping: working with APIs

March 12, 2014

APIs present researchers with a diverse set of data sources through a standardised access mechanism: send a pasted-together HTTP request, receive JSON or XML in return. Today we tap into a range of APIs to get comfortable sending queries and processing...

Web Scraping: Scaling up Digital Data Collection

March 5, 2014

The latest slides from web scraping through R: Web scraping for the humanities and social sciences. Slides from the first session here. Slides from the second session here. This week we look in greater detail at scaling up digital data collection: coercing s...

Web Scraping part2: Digging deeper

February 25, 2014

Slides from the second web scraping through R session: Web scraping for the humanities and social sciences. In which we make sure we are comfortable with functions, before looking at XPath queries to download data from newspaper articles. Examples includ...

Web-Scraping: the Basics

February 19, 2014

Slides from the first session of my course about web scraping through R: Web scraping for the humanities and social sciences. Includes an introduction to the paste function, working with URLs, functions and loops. Putting it all together, we fetch data in...

Plugging hierarchical data from R into d3

November 20, 2013

Here I show how to convert tabulated data into a JSON format that can be used in d3 graphics. The motivation for this was an attempt at getting an overview of topic models (link). Illustrations like the one to the right are very attractive; my motivati...

Visualising Structure in Topic Models

November 11, 2013

How exactly should we visualise topic models to get an overview of how topics relate to each other? This post is a brief lit review of that debate - I realise the subject matter is sooo last year. I also present my chosen solution to the dilemma: I use dendrograms to position topics, and add a...

Databases for text analysis: archive and access texts using SQL

November 7, 2013

This post is a collection of scripts I've found useful for integrating a SQL database into more complex applications. SQL databases allow quickish access to largish repositories of text (I wrote about this at some length here), and are a good starting point for...

Scaling up text processing and Shutting up R: Topic modelling and MALLET

October 29, 2013

In this post I show how a combination of MALLET, Python, and data.table means we can analyse quite Big data in R, even though R itself buckles when confronted by textual data. Topic modelling is great fun. Using topic modelling I have been able to separate articles about the 'Kremlin' as a) a building, b) an international actor, c) the...

Fun simulating Wimbledon in R and Python

July 4, 2013

R and Python have different strengths. There's little you can do in R that you absolutely can't do in Python and vice versa, but there's a lot of stuff that's really annoying in one and nice and simple in the other. I'm sure simulations can be run in R, but it seems frightfully tricky. Recently I wrote a simple tennis simulator...
