## Estimating Missing Data with aregImpute() {R}

April 19, 2010
Soil scientists routinely sample, characterize, and summarize patterns in soil properties in space, with depth, and through time. Invariably, some samples will be lost, or the funds required for complete characterization will run out. In these cases the scientist is left with a data table that contains holes (so to speak) in the rows/columns that are...

## R tip: Maximum screen width

April 19, 2010
R can be annoying in that even if you stretch your terminal or R GUI window to the full screen width, it will still only print 80 characters per line. This can make wide tables really hard to read. Use the width option of the options command, as in options(width=150), to set...
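
A minimal sketch of the tip in action: options() returns the previous settings, so you can widen the console temporarily and restore it afterwards. The value 150 comes from the post; the 15-column matrix is just an illustrative wide object.

```r
# Widen the console; options() returns the old value(s) invisibly
old_opts <- options(width = 150)
getOption("width")  # 150

# A wide object now prints on fewer, longer lines
wide <- matrix(1:30, nrow = 2)
print(wide)

# Put the previous width back when done
options(old_opts)
```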

## Example 7.33: Specifying fonts in graphics

April 19, 2010
For interactive data analysis, the default fonts used by SAS and R are acceptable, if not beautiful. However, for publication, it may be important to manipulate the fonts. For example, it would be desirable for the fonts in legends, axis labels, or o...
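
The post's own SAS and R code is not part of this excerpt. As a hedged sketch of the R side only: base graphics choose a font family with the `family` graphical parameter (device-independent names such as `"serif"`, `"sans"` and `"mono"`) and a face with `font` (1 plain, 2 bold, 3 italic, 4 bold italic). The temporary PDF file and the `mtcars` data are illustrative choices, not taken from the post.

```r
# Write to a temporary PDF so the example runs without a screen device
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)

# Use a serif family for everything drawn on this device
par(family = "serif")
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Serif title and axis labels")

# Override the face for a single annotation (4 = bold italic)
text(4.5, 30, "bold italic note", font = 4)

# Italic legend text via text.font
legend("topright", legend = "cars", pch = 1, text.font = 3)

# Switch family for one element only
mtext("monospaced caption", side = 1, line = 4, family = "mono")

dev.off()
```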

## Getting your web application and R(Apache) to talk to each other

April 19, 2010
Here’s the situation. Web applications built using a framework (e.g. Rails, Django) are great for fetching data from a database and rendering it. They’re not so great for crunching and charting the data. Conversely, R is great for crunching and charting, but doesn’t make for a great web application. The idea, then, is to let

## Thoughts on LSPM from R/Finance 2010

April 18, 2010
I just got back from R/Finance 2010 in Chicago. If you couldn't make it this year, I strongly encourage you to attend next year. I will post a more comprehensive review of the event in the next couple of days, but I wanted to share some of my notes spec...

## Sudokus more random than random!

April 18, 2010
Darren Wraith pointed out this column about sudokus to me. It analyses the paper by Newton and De Salvo published in the Proceedings of the Royal Society A, which I cannot access from home. The discussion contains this absurd sentence, “Sudoku matrices are actually more random than randomly-generated matrices”, which shows how mistreated

## Summarising data using scatter plots

April 18, 2010
A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes are used for the values of the two variables and a symbol on the graph represents the combination for each pair of values in the data set. This type of graph is
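
In R a basic scatter plot is a single plot(x, y) call. The sketch below uses the built-in `cars` data set (speed in mph, stopping distance in ft) as an assumed stand-in for the post's data, writes to a temporary PDF, and adds an optional least-squares line plus a correlation coefficient as a numeric summary.

```r
# Built-in 'cars' data: speed (mph) and stopping distance (ft) of 50 cars
data(cars)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# One symbol per (speed, dist) pair
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     pch = 19, main = "Speed vs stopping distance")

# Optional: overlay a least-squares line to summarise the trend
abline(lm(dist ~ speed, data = cars), col = "grey40")

dev.off()

# A numeric companion to the picture: Pearson correlation
r <- cor(cars$speed, cars$dist)
r
```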

## alphahull: an R Package for Alpha-Convex Hull

April 16, 2010
A new paper on the α-convex hull appeared in the Journal of Statistical Software today (http://www.jstatsoft.org/v34/i05/paper). The α-convex hull is an interesting problem which caught my attention a long time ago, but I didn’t know of a solution then. R has a function chull() which can generate (indices of) the convex hull for a series of points. Now
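
For comparison with the α-convex hull, here is a minimal sketch of the ordinary convex hull with base R's chull(); the points are made up. Computing the α-convex hull itself requires the alphahull package described in the paper and is not shown here.

```r
# A unit square's four corners plus one interior point
x <- c(0, 1, 1, 0, 0.5)
y <- c(0, 0, 1, 1, 0.5)

# chull() returns the indices of the points on the convex hull, in clockwise order
hull <- chull(x, y)
hull

# The interior point (index 5) is not on the hull
5 %in% hull  # FALSE

# Close the polygon by repeating the first vertex, e.g. for polygon() or lines()
hull_closed <- c(hull, hull[1])
cbind(x = x[hull_closed], y = y[hull_closed])
```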

## Significant Figures in R and Rounding

April 16, 2010
This is a follow-on to my previous post about determining significant digits, or sigdigs, in performance and capacity planning calculations. Once we know how to do that, inevitably we will be faced with rounding the result of a calculation to the least...
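
In R the two relevant base functions are signif(), which keeps a number of significant digits, and round(), which keeps a number of decimal places. A short sketch, not the post's own code; the input values are arbitrary.

```r
x <- 123456.789

# signif() keeps a given number of significant digits
signif(x, 3)           # 123000
signif(0.00123456, 2)  # 0.0012

# round() keeps a given number of decimal places (negative = left of the point)
round(x, 1)            # 123456.8
round(x, -2)           # 123500

# Exact halves go to the even digit (IEC 60559), not always up
round(0.5)             # 0
round(2.5)             # 2
```

The last two lines matter when rounding reported results: R does not "round half up" the way many people expect.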

## Simulating Dart Throws in R

April 16, 2010
Back in November 2009 Wired wrote an article about some grad students who decided to try to stochastically model throwing darts. Because I don’t actually read printed material I didn’t see the article until a couple of months ago. My immediate thought was, “hey, I drink beer. I throw darts. I build stochastic models. Why
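
A minimal sketch, not the post's model: assume each throw lands at the aim point plus independent Gaussian errors with the same standard deviation `sigma` in x and y. Under that assumption the miss distance is Rayleigh-distributed, so the simulated hit rate within radius `r_target` can be checked against 1 - exp(-r^2 / (2 sigma^2)). The values of `sigma` and `r_target` are hypothetical.

```r
set.seed(42)

n_throws <- 100000
sigma    <- 20   # hypothetical spread of throws (mm)
r_target <- 15   # hypothetical target radius (mm)

# Simulate landing points around an aim point at the origin
x <- rnorm(n_throws, mean = 0, sd = sigma)
y <- rnorm(n_throws, mean = 0, sd = sigma)
miss_dist <- sqrt(x^2 + y^2)

# Monte Carlo estimate of the probability of landing inside the target
p_sim <- mean(miss_dist <= r_target)

# Closed form for an isotropic bivariate normal (Rayleigh-distributed distance)
p_exact <- 1 - exp(-r_target^2 / (2 * sigma^2))

c(simulated = p_sim, exact = p_exact)
```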

## R Command Line

April 16, 2010
I am an R user! And I see a whole army of R users, here in defiance of tyranny. You’ve come to use R as free men… and free men you are. What will you do with that freedom? Will you use R? Use R and you may use the command line. Use SAS,

## Because it’s Friday: When infographics go bad

April 16, 2010
Phil Gyford laments crappy infographics swamping the good ones. (Image: "Infographic" by Phil Gyford on Flickr, used under a Creative Commons license.)

## R – not the epic fail we thought

April 16, 2010
I usually like AnnMaria's witty insight. I can relate to a lot of what she is saying. After all, SAS and family life are large parts of my life, too. But you can imagine the reaction she provoked in saying the following: I know that R is free and I am ac...

## Rcpp 0.7.12

April 16, 2010
A new bug-fix version, 0.7.12, of Rcpp is awaiting inclusion in CRAN and Debian. It is also available from here. This is another bug-fix version related solely to a build failure on Windows. Trying to protect paths with spaces has the side-effect of ...

## An article attacking R gets responses from the R blogosphere – some reflections

April 16, 2010
In this post I reflect on the current state of the R blogosphere and share my hopes for the future.

## The Next Big Thing: SAS and SPSS!…wait, what?

April 15, 2010
Thanks to the R Bloggers aggregator I came across Yihui Xie’s post on a piece currently making the rounds about statistical analysis platforms. In The Next Big Thing, AnnMaria De Mars makes the argument that R—as a statistical computing platform—is not well suited for what she views as the next big things in data

## Solving optimization problems numerically in R with optim()

April 15, 2010
Often in game theory (and presumably other applied math settings) we are interested in the behavior of equations with no explicit solution. In this talk, Andrew Little demonstrates how to use optim() and other functions in R for such situations in m...
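
The talk's game-theory models are not part of this excerpt, so as a stand-in here is the Rosenbrock function, a standard test problem that also appears in the examples on R's ?optim help page. The sketch compares the default Nelder-Mead method with BFGS using an analytic gradient.

```r
# Rosenbrock "banana" function: minimum value 0 at (1, 1)
fr <- function(x) {
  100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
}

# Its analytic gradient, which lets BFGS converge quickly
grr <- function(x) {
  c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
     200 * (x[2] - x[1]^2))
}

# Default Nelder-Mead (derivative-free)
fit_nm <- optim(c(-1.2, 1), fr)

# Quasi-Newton BFGS using the gradient
fit_bfgs <- optim(c(-1.2, 1), fr, grr, method = "BFGS")

fit_bfgs$par          # close to c(1, 1)
fit_bfgs$convergence  # 0 means success
```

optim() minimizes by default; to maximize a payoff, pass `control = list(fnscale = -1)` or minimize the negated function.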

## Saving the world with R

April 15, 2010
Tuesday's meeting of the Bay Area R UseR Group at the LinkedIn offices was a great event. The headline speaker was Joe Adler, author of the excellent R reference manual, R in a Nutshell. Joe's presentation was an in-depth look at the relative speed of various options in R for looking up values from a key in a key-value...
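
The talk's benchmark is not reproduced in this excerpt. As a rough sketch of the kind of comparison described, assuming a named vector and a hashed environment are among the options (the talk may have covered others), here is a lookup in each, timed with system.time(). Timings depend on the machine and R version, so no numbers are claimed.

```r
n    <- 10000
keys <- paste0("key", seq_len(n))
vals <- seq_len(n)

# Option 1: a named vector, looked up by matching against its names
vec <- setNames(vals, keys)

# Option 2: an environment used as a hash table
env <- new.env(hash = TRUE, size = n)
for (i in seq_len(n)) assign(keys[i], vals[i], envir = env)

# Both give the same answer for a single key...
vec[["key5000"]]             # 5000
get("key5000", envir = env)  # 5000

# ...but repeated lookups can differ in speed
lookup_keys <- sample(keys, 2000)
t_vec <- system.time(for (k in lookup_keys) vec[[k]])
t_env <- system.time(for (k in lookup_keys) env[[k]])
rbind(named_vector = t_vec, environment = t_env)[, "elapsed"]
```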

## R is an Epic Fail?

April 15, 2010
I came across this blog post just now, "The Next Big Thing", and of course these words caught my attention: "However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it." On

## I’d be more than happy with the unlinked data web

April 14, 2010
Visit this URL and you’ll find a perfectly formatted CSV file containing information about recent earthquakes. A nice feature of R is the ability to slurp such a URL straight into a data frame:

```r
quakes <- read.csv("http://neic.usgs.gov/neis/gis/qed.asc", header = TRUE)
colnames(quakes)
# "Date" "TimeUTC" "Latitude" "Longitude" "Magnitude" "Depth"

# number of recent quakes
nrow(quakes)
```

## Zelig and Matching in R with an Application to Conflict and Leader Tenure

April 14, 2010
At the August 2009 NYC R Statistical Programming Meetup, Andrew Little discusses two econometric packages developed by Gary King of Harvard and how he has used them in his research. Zelig - a single, easy-to-use package that can estimate, help inter...

## Lots of new Videos in Rchive

April 14, 2010
I have just uploaded a bunch of new videos to the Rchive (yeah, that’s what I am calling it now). Most of the videos are from the April NYC meetup and include the following talks: Pankaj Chopra, using R and Bioconductor (http://www.bioconductor.org/) for biomarker detection in cancer; Andrew Ilardi, an R project that analyzes a list of stocks while reaching out

## Portfolio Correlation Analysis Tool

April 14, 2010
Andrew Ilardi presents an R project that analyzes a list of stocks to the NYC R Statistical Programming Meetup on April 8, 2010. Andrew's tool reaches out to the web to pull historical stock prices, then plots quarter-over-quarter correlations and a yea...
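
Andrew's code is not included here. A hedged sketch of the core calculation: given a matrix of closing prices (simulated below, since no data comes with the excerpt; the tickers are hypothetical), compute daily log returns and their correlation matrix with base R's cor(). The per-quarter split assumes roughly 63 trading days per quarter, which is an approximation, not the tool's method.

```r
set.seed(1)

# Hypothetical daily closing prices for three made-up tickers (random walks)
n_days <- 250
steps  <- matrix(rnorm(n_days * 3, sd = 0.01), ncol = 3)
prices <- 100 * exp(apply(steps, 2, cumsum))
colnames(prices) <- c("AAA", "BBB", "CCC")

# Daily log returns: differences of log prices (one row shorter)
returns <- diff(log(prices))

# Pairwise correlation of returns across the tickers
round(cor(returns), 2)

# Per-quarter correlations, assuming about 63 trading days per quarter
quarter    <- ceiling(seq_len(nrow(returns)) / 63)
by_quarter <- lapply(split(as.data.frame(returns), quarter), cor)
length(by_quarter)
```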

## Biomarker detection in cancer (gene expression analysis)

April 14, 2010
Pankaj Chopra discusses using R and Bioconductor (http://www.bioconductor.org/) for biomarker detection in cancer to the NYC R Statistical Programming Meetup on April 8, 2010.

## New York Pizza – How to Find the Best

April 14, 2010
Jared Lander discusses the science, and statistics, of finding the best pizza in NYC to the NYC R Statistical Programming Meetup on April 8, 2010.

## Object Oriented Programming with R: My notebook

April 14, 2010
In the following post, I describe how I've used the OOP features of R to create and use the following class hierarchy:
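
The class hierarchy itself is cut off in this excerpt. As a hypothetical stand-in (the `Shape`, `Circle` and `Rectangle` classes and the `area` generic are invented for illustration, not taken from the post), here is a minimal sketch of R's S4 OOP features: a virtual base class, subclasses via `contains`, a generic with one method per subclass, and a `show` method inherited from the base class.

```r
library(methods)

# Hypothetical virtual base class and two subclasses
setClass("Shape", representation("VIRTUAL", name = "character"))
setClass("Circle", contains = "Shape", representation(radius = "numeric"))
setClass("Rectangle", contains = "Shape",
         representation(width = "numeric", height = "numeric"))

# A generic function with one method per subclass
setGeneric("area", function(shape) standardGeneric("area"))
setMethod("area", "Circle", function(shape) pi * shape@radius^2)
setMethod("area", "Rectangle", function(shape) shape@width * shape@height)

# Defined once on the base class, inherited by both subclasses
setMethod("show", "Shape", function(object) {
  cat(class(object), object@name, "with area", area(object), "\n")
})

shapes <- list(new("Circle", name = "c1", radius = 1),
               new("Rectangle", name = "r1", width = 2, height = 3))
for (s in shapes) print(s)

is(shapes[[1]], "Shape")  # TRUE
```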