Monthly Archives: October 2012

My Goodness. What a Fat Dataset!

October 25, 2012

Recently at work we were sent a data file containing information on donations to a specific charitable organization, ranging all the way back to the 1980s. Usually, when we receive a dataset with a donation history in it, each row …

Read more »

Allstate compares SAS, Hadoop and R for Big-Data Insurance Models

October 25, 2012

At the Strata conference in New York today, Steve Yun (Principal Predictive Modeler at Allstate's Research and Planning Center) described the various ways he tackled the problem of fitting a generalized linear model to 150M records of insurance data. He evaluated several approaches: Proc GENMOD in SAS; installing a Hadoop cluster; using open-source R (both on the full data...

Read more »

Notes on a Scandal – When Jimmy beat Katy

October 25, 2012

No, the title doesn’t refer to how Katy Perry suffered at another of Jimmy Savile’s sexual predilections, although these are two of the participants. I’ll get to the details later. Just over a year ago, I reflected on the relative wiki searches of leading female singing celebrities, including Ms Perry. In the light of the

Read more »

Palettes in R

October 25, 2012

In its simplest form, a palette in R is simply a vector of colors. This vector can include hex triplets or R color names. The default palette can be seen through palette(): > palette("default") # you'll only need this line if you've previ...
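
As a rough illustration of the idea above (not the code from the original post), the sketch below uses only base R's palette() and col2rgb(). The vector name my_cols and the three colors in it are arbitrary choices made for this example.

```r
# Remember whatever palette is active so it can be restored at the end.
old <- palette()

# Reset to the built-in default palette and read it back as a character vector.
palette("default")
default_cols <- palette()

# A palette is just a character vector of colors; R color names and
# hex triplets can be mixed freely. (my_cols is an illustrative name.)
my_cols <- c("steelblue", "#FF7F00", "forestgreen")
palette(my_cols)
active <- palette()

# With this palette active, integer colors index into it, so a call such as
# plot(1:3, col = 1:3, pch = 19) would draw one point in each color.

# Restore the palette that was active before this example ran.
palette(old)
```

Note that palette() may report a color under a different but equivalent spelling than the one supplied, so comparing RGB values with col2rgb() is safer than comparing strings.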

Read more »

NSCB Sexy Statistics (Unemployment)

October 25, 2012

Recently, my friend posted on Facebook about an article published by the National Statistical Coordination Board (NSCB) on poverty and unemployment in the country. Looking at the report I saw a lot of tables, so I thought why not ...

Read more »

Building a JSON webservice in R

October 25, 2012

R is a programming language for mathematics and statistics. There are several R packages available to support web development, including rjson and RJSONIO (note the case: R package names are case sensitive). RJSONIO is based on rjson, but with modifications to improve performance when working with large JSON payloads. The example below returns the data required

Read more »

How fat are your tails?

October 25, 2012

Lately I’ve been thinking about how to measure the fatness of the tails of a distribution. After some searching, I came across the Pareto Tail Index method. This seems to be used mostly in economics. It works by finding the decay rate of the tail. It’s complicated, both in formula and in its R implementation
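
To make "finding the decay rate of the tail" concrete, here is a minimal base-R sketch of the Hill estimator, one common way to estimate a Pareto tail index. It is not necessarily the method or implementation the post goes on to describe. The function name hill_tail_index and the cutoff k (the number of largest observations used) are assumptions made for this illustration.

```r
# Hill estimator of the Pareto tail index alpha from the k largest values.
# hill_tail_index() is a hypothetical helper written for this sketch.
hill_tail_index <- function(x, k) {
  x <- sort(x[x > 0], decreasing = TRUE)   # keep positive values, largest first
  stopifnot(k >= 1, k < length(x))
  # Average log-excess of the top k observations over the (k+1)-th largest.
  gamma_hat <- mean(log(x[1:k]) - log(x[k + 1]))
  1 / gamma_hat                            # smaller alpha means a fatter tail
}

set.seed(1)
n <- 10000

# Exact Pareto sample with tail index 2, drawn by inverting the CDF.
pareto_sample <- runif(n)^(-1 / 2)
alpha_pareto <- hill_tail_index(pareto_sample, k = 500)

# Absolute values of a normal sample: thin tails, so a larger estimate.
normal_sample <- abs(rnorm(n))
alpha_normal <- hill_tail_index(normal_sample, k = 500)
```

The estimate is sensitive to k: too small and it is noisy, too large and observations from outside the tail bias it, so in practice it is usually inspected over a range of k.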

Read more »

Congressional ideology by state

October 25, 2012

In a recent post, I illustrated how to add a background geom to your ggplot. While that code worked, and the plot looked fine, it was pointed out to me that I was missing an important aspect of plot layering with ggplot2. Namely, it is not, as I previ...

Read more »

R function: generate a panel data.table or data.frame to fill with data

October 25, 2012

I have started to work with R and Stata together. I like running regressions in Stata, but I do the graphs and set up the dataset in R. R clearly has a strong comparative advantage over Stata here. I was writing a function that will give me a (balanced) panel-structure in R. It then simply
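
A balanced panel skeleton of this kind can be sketched in base R with expand.grid(). This is only an illustration under assumptions, not the author's function: the name make_panel, its arguments, and the column names id and time are all hypothetical.

```r
# make_panel() is a hypothetical helper: it returns a data.frame with exactly
# one row per (id, time) pair, plus empty numeric columns to fill in later.
make_panel <- function(ids, times, vars = character(0)) {
  panel <- expand.grid(id = ids, time = times,
                       KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  panel <- panel[order(panel$id, panel$time), ]  # sort by unit, then period
  rownames(panel) <- NULL
  for (v in vars) {
    panel[[v]] <- NA_real_                       # placeholder column
  }
  panel
}

# Three units observed over three years, with two variables to be filled.
panel <- make_panel(ids = c("A", "B", "C"), times = 2010:2012,
                    vars = c("gdp", "pop"))
```

Because every unit gets every period, the result is balanced by construction; observed values can then be merged in on id and time, leaving NA where data are missing.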

Read more »

Rcpp modules more flexible

October 25, 2012

Rcpp modules just got more flexible (as of revision 3838 of Rcpp, to become 0.9.16 in the future).

Modules have allowed exposing C++ classes for some time now, but developers had to declare custom wrap and as specializations if they wanted their classes to be used as the return type or argument type of a C++ function or method...

Read more »