Blog Archives

Splitting a Large CSV File into Separate Smaller Files Based on Values Within a Specific Column

April 3, 2013

One of the problems with working with data files containing tens of thousands (or more) rows is that they can become unwieldy, if not impossible, to use with “everyday” desktop tools. When I was Revisiting MPs’ Expenses, the expenses data I downloaded from IPSA (the Independent Parliamentary Standards Authority) came in one large CSV file

Read more »

Revisiting MPs’ Expenses

April 2, 2013

I couldn’t help but notice the chatter about Iain Duncan Smith claiming he’d have no problem “living on 53 pounds a week”, which made me wonder not only how many meal catered events he attends each week (and how many of his scheduled meetings also have complimentary tea and biscuits (a bellwether of the extent of

Read more »

Publishing Stats for Analytic Reuse – FAOStat Website and R Package

March 8, 2013

How can stats and data publishers, from NGOs and (inter)national statistics agencies to scientific researchers, publish their data in a way that supports its analysis directly, as well as in combination with other datasets? Here’s one approach I learned about from Michael Kao of the UN Food and Agriculture Organisation statistics division, FAOStat. At first

Read more »

What Happened Then? Using Approximated Twitter Follower Accession to Identify Political Events

March 4, 2013

Following a chat with @andypryke, I thought I’d try out a simple bit of feature detection around approximated follower acquisition charts (e.g. Estimated Follower Accession Charts for Twitter) to see if I could detect dates around which there were spikes in follower acquisition. So for example, here’s the follower acquisition chart for Seema Malhotra: We

Read more »

Sketches Around Twitter Followers

February 19, 2013

I’ve been doodling… Following a query about the possible purchase of Twitter followers for various public figure accounts (I need to get my head round what the problem is with that exactly?!), I thought I’d have a quick look at some stats around follower groupings… I started off with a data grab, pulling down the

Read more »

Reshaping Horse Import/Export Data to Fit a Sankey Diagram

February 18, 2013

As the food labeling and substituted horsemeat saga rolls on, I’ve been surprised at how little use has been made of “data” to put the structure of the food chain into some sort of context* (or maybe I’ve just missed those stories?). One place that can almost always be guaranteed to post a few related

Read more »

F1Stats – Correlations Between Qualifying, Grid and Race Classification

February 9, 2013

Following directly on from F1Stats – Visually Comparing Qualifying and Grid Positions with Race Classification, and continuing in my attempt to replicate some of the methodology and results used in A Tale of Two Motorsports: A Graphical-Statistical Analysis of How Practice, Qualifying, and Past Success Relate to Finish Position in NASCAR and Formula One Racing, here’s

Read more »

Using SPARQL Query Libraries to Generate Simple Linked Data API Wrappers

January 31, 2013

A handful of open Linked Data posts have appeared through my feeds in the last couple of days, including (via RBloggers) SPARQL with R in less than 5 minutes, which shows how to query US data.gov Linked Data, and then Leigh Dodds’ Brief Review of the Land Registry Linked Data. I was going to post a

Read more »

F1Stats – Visually Comparing Qualifying and Grid Positions with Race Classification

January 30, 2013

Following the roundabout tour of F1Stats – A Prequel to Getting Started With Rank Correlations, here’s a walkthrough of my attempt to replicate the first part of A Tale of Two

Read more »