Monthly Archives: December 2012

R and the SGeMS blockdata format

December 7, 2012

The popular geostatistical software SGeMS has some options for working with non-point support (block) data through the BGeost set of algorithms by Yongshe Liu (see his PhD thesis), which were published in Liu and Journel (2009). A specific but ...

R analysis shows how UK health system could save £200m

December 7, 2012

According to an analysis by Prescribing Analytics (a joint venture of technologists and doctors in the UK), Britain's cash-strapped National Health Service (NHS) is overspending on prescription drugs. While cheaper (but equally effective) generic drugs are widely available for many treatments, some doctors continue to prescribe patented drugs which can cost 10 times as much — and often much...

Mapping Primary Care Trust (PCT) Data, Part 1

December 7, 2012

The launch or official opening or whatever it was of the Open Data Institute this week provided another chance to grab a snapshot of notable folk in the community, as for example demonstrated by people commonly followed by users of the #ODIlaunch hashtag on Twitter. The PR campaign also resulted in the appearance of some

UEFA Champions League Knockout Phase Draws: Monte Carlo Simulation with R

December 7, 2012

Draws for the knockout phase of the 2012–13 UEFA Champions League will be held in Nyon on the 20th December 2012. The rules of the draw are simple and are as follows: 8 Group winner teams will be seeded. 8 Group runner-up teams will be unseeded. Teams coming from the same group and from same association...

Dot-density maps with spsample()

December 7, 2012

Today’s example is a little odd, in that the code isn’t pretty and the example isn’t really something you’d actually produce in real life — but if you’ll overlook those oddities, you’ll find that the spsample(...

Visualizing Baltimore with R and ggplot2: Crime Data

December 7, 2012

The advent of municipal open data initiatives has been both a blessing and a curse for my particular brand of data nerd. On one hand, it has opened up the possibility of developing deep and useful knowledge about the places we...

How to spend an inordinate amount of time becoming efficient

December 6, 2012

I’ve spent a good deal of 2012 constructing a data warehouse to manage all the various data elements that my company has. Although we’re a small enterprise, the richness and complexity of the information is rather high. Moreover, as a data-driven organization, there’s a strong impetus to construct meaningful analysis with every bit of input

R in the Cloud

December 6, 2012

I've been having some great fun parallelizing R code on Amazon's cloud. Now that things are chugging away nicely, it's time to document my foibles so I can remember not to fall into the same pits of despair again. The goal was to perform lots of trials of a randomized statistical simulation. The jobs were independent and fairly chunky, taking...

Importing Data Into R from Different Sources

December 6, 2012

I have found that I get data from many different sources. These sources range from simple .csv files to more complex relational databases to structured XML or JSON files. I have compiled the different approaches that one can use to easily access these datasets. Local Column Delimited Files: This is probably the most common and

Tibshirani’s original paper on the lasso. Breiman’s…

December 6, 2012

Tibshirani’s original paper on the lasso. Breiman’s Garrote: 1993; Tibshirani lasso paper submitted: 1994; Tibshirani lasso paper revised: 1995; Tibshirani lasso paper accepted: 1996. This is one of those papers that I’m so excited about, I feel like “You should just read the whole thing! It’s all good!” But I realise that’s less than reasonable. Here is a bit of summary,...