R on the cloud

July 9, 2011

Just as scientists should never really have to think much about statistics, I feel that, in an ideal world, statisticians would never have to worry about computing. In the real world, though, we have to spend a lot of time building our own tools. It would be great if we could routinely run R with speed and memory limitations...

Read more »

Notes on Engineering Data Analysis (with R and ggplot2)

July 8, 2011

Hadley Wickham gave a Google Tech Talk a couple of weeks back titled Engineering Data Analysis (with R and ggplot2). These are my notes. The data analysis cycle is to iteratively transform, visualize and model. Leading into the cycle is data access an...

Read more »

Blog in motion

July 8, 2011

In the next few days we’ll be changing the format of the blog and moving it to a new server. If you have difficulty posting comments, just wait and post them in a few days when all should be working well. (But if you can post a comment, go for it. All the old entries

Read more »

The Road to Default: Let’s Look at the Damage with a Rant.

July 8, 2011

The following graph shows Real GDP as a percentage of the Gross Federal Debt. FRED is the resource I frequently use for United States financial data and it serves us well here. What is Gross Federal Debt? Well, it's total government debt out...

Read more »

NYT on the importance of reproducible research

July 8, 2011

Yesterday's New York Times includes a great article on the failure of some genetic tests for cancer detection, and the flaws in the research that led to them. The article features quotes from Keith Baggerly of MD Anderson Cancer Center, and includes a photo of him and colleague Kevin Coombes in front of a page of R code: The...

Read more »

The Rebirth

July 8, 2011

Hey guys, I know that I have been gone for a while now. I just came back from a month-long Euro adventure, so have no fear, I have plenty of time to devote to blogging now. From this day forward, X.U. Economics will be known as The Dancing Ec...

Read more »

Censoring on one end, “outliers” on the other, what can we do with the middle?

July 8, 2011

This post was written by Phil. A medical company is testing a cancer drug. They get 16 genetically identical (or nearly identical) rats that all have the same kind of tumor, give 8 of them the drug and leave 8 untreated…or maybe they give them a placebo, I don’t know; is there a placebo

Read more »

installing R 2.13.1 on Amazon EC2’s “Amazon Linux” AMI #rstats

July 8, 2011

Condensed from this post (and comments) on David Chudzicki’s blog, tweaked, and updated for R-2.13.1. Assumes you’re starting with a virgin “Amazon Linux” AMI. I picked “Basic 64-bit Amazon Linux AMI 2011.02.1 Beta” (AMI Id: ami-8e1fece7) because it was marked as free tier eligible on the “Quick Start” tab of AWS’s “Launch Instance” dialog box:

Read more »

The virtues of incoherence?

July 8, 2011

Kent Osband writes:

Read more »

R 2.13.1 released

July 8, 2011

As announced by the R core team, R 2.13.1 has been released on schedule today. For those who build R themselves, updated source code is available now; pre-built binaries will propagate through the CRAN network over the next couple of days. This update includes some minor bug fixes to graphics, fixes to some edge cases in the class systems,...

Read more »

The R Programming Wikibook

July 8, 2011

The R Programming wikibook is an open source community project that "aims to create a cross-disciplinary practical guide to the R programming language." It was launched in June 2011 and is seeking content and contributors. The full call for the R Progr...

Read more »

Analyzing the Failed States Index (with Polity IV)

July 7, 2011

So, I decided to sit down and have a little fun with that Failed States Index data I put together. To start, I expect that the dataset will be pretty linearly correlated with the Polity IV data. This makes sense: true democracies aren’t failed states, and failed states tend not to be democratic. To test this,

Read more »

Scary Derivatives and Scary XML in R

July 7, 2011

I need some new R skills, and there is no better motivation to learn XML in R than one of the scariest financial datasets out there—the US Department of the Treasury Office of the Comptroller of the Currency (OCC) Quarterly Derivatives Report. I’ll...

Read more »

High-quality R graphics on the Web with SVG

July 7, 2011

If you want the graphics you create with R to look their best, in general it's best to go for a vector-based graphics format instead of a raster-based format. Common formats like GIF and JPG are raster-based: the image is composed of pixels, and if you don't choose a high enough resolution, you're likely to lose fine details and/or...

Read more »

GLMM Hell

July 7, 2011

I have been starting to analyze some data I have of repeated counts of salamanders from 5 plots over 4 years. I am trying to develop a predictive model of salamander nighttime surface activity as a function of weather variables. The repeated counting l...

Read more »

A simple ggplot2 scatterplot revisited

July 7, 2011

Rick Wicklin contacted me with a helpful suggestion for improving the data presentation method outlined in my previous post on using ggplot2 to visualize some data. In the previous post I had plotted up a highly correlated set of points, showing the correspondence between maximum daily body temperatures of model snails sitting with the foot

Read more »

Twitter Math Puzzle and Solution

July 7, 2011

Yesterday I posted a very simple math puzzle to Twitter that I found in Jonathan Baron’s book, Thinking and Deciding. The puzzle is the following: Show that every number of the form ABC,ABC is divisible by 13. The puzzle comes up in Baron’s book as an example of an “insight problem” in which one goes

Read more »
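
The excerpt above stops before the solution, so here is one standard argument for the puzzle, offered as a sketch and not necessarily the one given in the post. Writing $n$ for the three-digit block ABC is my own notation.

```latex
% Let n = 100A + 10B + C be the three-digit block ABC.
% Writing the block twice shifts the first copy by a factor of 1000:
\[
  \overline{ABC{,}ABC} \;=\; 1000\,n + n \;=\; 1001\,n .
\]
% Since 1001 factors as 7 * 11 * 13, the result is a multiple of 13
% (and also of 7 and 11) for every choice of digits:
\[
  1001 = 7 \cdot 11 \cdot 13
  \quad\Longrightarrow\quad
  13 \mid \overline{ABC{,}ABC} .
\]
```

For example, $123{,}123 = 1001 \cdot 123 = 13 \cdot 9471$.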

R: calculations involving months

July 7, 2011

Ask anyone how much time has elapsed since September last year and they’ll probably start counting on their fingers: “October, November…” and tell you “just over 9 months.” So, when faced as I was today with a data frame (named dates) like this: How to add a 7th column, with the number of months between

Read more »
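
The excerpt is cut off before the data frame and the author's solution, so the following is only a minimal sketch of one way to count completed months between two dates in base R. The column names `start` and `end`, the sample dates, and the helper `whole_months_between()` are all hypothetical and not taken from the post.

```r
# Hypothetical helper: number of completed calendar months from start to end.
whole_months_between <- function(start, end) {
  s <- as.POSIXlt(start)
  e <- as.POSIXlt(end)
  # Months implied by the year and month fields alone.
  months <- 12 * (e$year - s$year) + (e$mon - s$mon)
  # If the end day-of-month has not reached the start day-of-month,
  # the final month is incomplete, so subtract one.
  months - (e$mday < s$mday)
}

# Made-up example data; the real `dates` data frame is not shown in the excerpt.
dates <- data.frame(
  start = as.Date(c("2010-09-30", "2011-01-15")),
  end   = as.Date(c("2011-07-07", "2011-03-15"))
)

# Add the month count as a new column.
dates$months <- whole_months_between(dates$start, dates$end)
print(dates)
```

Counting only completed months matches the finger-counting intuition in the excerpt (30 September to 7 July is "just over 9 months"); a different convention, such as rounding to the nearest month, would need a different rule.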

Things I would tell a budding bioinformatician to learn.

July 7, 2011

I recently read Ewan Birney's blog post, which I found echoed a lot of my own thoughts about the use of statistics in computational biology. I thought I would compile my own similar list but for bioinformatics / computational biology in general. I have not been in the field as long as Ewan and I certainly still...

Read more »

Descriptive statistics, causal inference, and story time

July 7, 2011

Dave Backus points me to this review by anthropologist Mike McGovern of two books by economist Paul Collier on the politics of economic development in Africa. My first reaction was that this was interesting but non-statistical so I’d have to either post it on the sister blog or wait until the 30 days of statistics

Read more »

Necessity to Explain CDS with A Regime Switching Model

Examining the determinants of credit default swap (CDS) spreads is a hot topic. CDS spreads have displayed significant regime-switching behaviour since the credit crisis broke, which can be seen from the old graph in the post Credit Default Spread a...

Read more »

Call for a Special Topic on Grid and Cloud Computing Methods in Biomedical Research

Today, the AG Statistical Computing released the “Call for a Special Topic on Grid and Cloud Computing” in the journal “Methods of Information in Medicine”. We are inviting submissions for a special topic of Methods of Information in Medicine on “Grid and Cloud Computing Methods in Biomedical Research”. This special topic call originates from a

Read more »

Use R!

July 7, 2011

In short: R is a free intuitive programming language that is used by practitioners in a plethora of academic disciplines. Therefore, it is on the cutting edge, and expanding rapidly. It creates stunning visuals, works seamlessly together with LaTeX, has really good online documentation and the community is unparalleled. A week...

Read more »

Rcpp 0.9.5

A maintenance release, version 0.9.5 of Rcpp, is now on CRAN and in Debian. This release comprises a number of minor fixes, extensions as well as small additions to the documentation and examples which have accumulated since the last release in Apr...

Read more »

Men with Hats

July 6, 2011

Suppose N people (and their hats) attend a party (in the 1950s). For fun, the guests mix their hats in a pile at the center of the room, and each person picks a hat uniformly at random. What is the probability that nobody ends up with their own hat? E...

Read more »
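
The excerpt ends before the answer, so this is a small sketch of the classical result and not necessarily the post's own approach. The probability that no one gets their own hat is the inclusion-exclusion sum of (-1)^k / k! for k from 0 to N, which approaches 1/e as N grows. The function names below are my own.

```r
# Exact probability that none of n guests picks their own hat
# (inclusion-exclusion sum over k = 0..n of (-1)^k / k!).
p_no_match <- function(n) {
  k <- 0:n
  sum((-1)^k / factorial(k))
}

# Monte Carlo check: shuffle the hats and record whether nobody
# received the hat at their own position.
simulate_no_match <- function(n, trials = 10000) {
  hits <- replicate(trials, all(sample(n) != seq_len(n)))
  mean(hits)
}

set.seed(1)
exact     <- p_no_match(10)
simulated <- simulate_no_match(10)
cat("exact:", exact, " simulated:", simulated, " 1/e:", exp(-1), "\n")
```

Even for modest N the exact value is already very close to 1/e, so the answer barely depends on the size of the party.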

rasterVis

The raster package defines classes and methods for spatial raster data access and manipulation. The new rasterVis package complements raster, providing a set of methods for enhanced visualization and interaction. It is now on CRAN. Several examples can ...

Read more »

How Marketo uses Revolution R Enterprise

July 6, 2011

Marketo, a leading marketing automation company, relies on data analysis to implement the features in its hosted application that help companies get the most out of their marketing dollar. We've just published a case study about how Marketo uses Revolution R Enterprise and the R language to analyze the massive data sets generated by their customers: “I use it...

Read more »

Importing google news data to R

July 6, 2011

I've been playing around lately with the stock market data available from Google Finance, through quantmod in R. Here's a function I've written (which depends on the R Data Science Toolkit), to pull news stories related to a stock from Google, parse t...

Read more »