ICERM Reproducibility Workshop: Day 1

December 10, 2012
By

I'm attending a workshop on reproducibility at ICERM (Brown University) this week. I really appreciate this great opportunity offered by ICERM, Randy and Victoria. It is pretty exciting to meet people you previously knew only indirectly. O...

Using (R) Markdown, Jekyll, & GitHub for a Website

December 10, 2012
By

Introduction: Markdown has been growing in popularity for writing documents on the web. With the introduction of R Markdown (see also Jeromy Anglim’s post on getting started with R Markdown) and knitr, publishing R analyses on the web has become simpler. I recently converted my website from WordPress to Jekyll. Jekyll...

A Simple Model for Realized Volatility

December 9, 2012
By

The post has two goals: (1) explain how to forecast volatility using a simple Heterogeneous Auto-Regressive (HAR) model (Corsi, 2002); (2) check whether higher moments such as skewness and kurtosis add forecast value to this model. It will be a high …
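For readers who want to see the shape of a HAR regression before reading on, here is a minimal sketch in base R. It assumes the usual HAR setup, in which next-day realized volatility is regressed on the latest daily value and on trailing weekly and monthly averages. The 5-day and 22-day windows, the simulated series and all variable names are assumptions of this sketch, not taken from the post.

```r
# Minimal HAR sketch on simulated data (the series is synthetic,
# not the data used in the post).
set.seed(1)
n  <- 1000
# A positive, persistent stand-in for a realized-volatility series.
rv <- as.numeric(exp(arima.sim(list(ar = 0.95), n = n, sd = 0.2)))

# Trailing mean over the last k observations, aligned to the current day.
trailing_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

har <- data.frame(
  rv_next = c(rv[-1], NA),          # next-day value (regression target)
  rv_d    = rv,                     # daily component
  rv_w    = trailing_mean(rv, 5),   # weekly component (assumed 5 days)
  rv_m    = trailing_mean(rv, 22)   # monthly component (assumed 22 days)
)
har <- na.omit(har)                 # drop rows without a full window or target

# Ordinary least squares on the three components.
fit <- lm(rv_next ~ rv_d + rv_w + rv_m, data = har)
print(coef(fit))
```

Extra predictors such as skewness or kurtosis could be added as further columns in the same formula, which is one plausible way to run the comparison the post describes.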

Handling missing data with Amelia

December 9, 2012
By

So, what if you have data, but some of the observations are missing? Many statistical techniques assume no missingness, so we might want to “fill in” or rectangularize our data, by replacing missing observations with plausible substitutes....

December 8, 2012
By

A few months ago, I bought a really cool book: Exploring Everyday Things (with R and Ruby). I learned many interesting and mostly useless things from the author, Sau Sheong Chang. Chapter 6 for example explains how to build a …

Rcpp attributes: Even easier integration of GSL code into R

December 8, 2012
By

Following the Rcpp 0.10.0 release, I had written about simulating pi easily by using the wonderful new Rcpp Attributes feature. Now with Rcpp 0.10.1 released a good week ago, it is time to look at how Rcpp Attributes can help with external libraries. As this post aims to show, it is a breeze! One key aspect is the use of...

December 8, 2012
By

I really wanted to put something together for this series on the twitteR package. Unfortunately, at the moment the number of interesting things that can be done with twitteR, as opposed to through API calls and RCurl, is limited. Regardless, I have Ye...

Bridge hand distribution: simulation vs exact calculation

December 8, 2012
By

Recently I played bridge with my friends. Frustrated with several consecutive poor hand distributions, we asked ourselves: what is the probability of having a hand good enough for a small slam? A well known rule of thumb is that you need...
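The simulation-versus-exact idea in the title can be sketched in a few lines of base R. The excerpt cuts off before stating the slam rule of thumb, so this sketch uses a different, purely illustrative event: the chance that a single 13-card hand has a 4-3-3-3 suit shape. The event, the number of simulated hands and the function names are assumptions of this sketch.

```r
set.seed(42)
suits      <- c("S", "H", "D", "C")
deck_suits <- rep(suits, each = 13)   # only suits matter for hand shape

# Deal one random 13-card hand; TRUE if its suit lengths are 4-3-3-3
# in any order.
is_4333 <- function() {
  counts <- table(factor(sample(deck_suits, 13), levels = suits))
  identical(sort(as.integer(counts)), c(3L, 3L, 3L, 4L))
}

# Simulation: share of random hands with the chosen shape.
n_sim     <- 20000
simulated <- mean(replicate(n_sim, is_4333()))

# Exact: pick which suit holds 4 cards, then choose the cards in each suit.
exact <- 4 * choose(13, 4) * choose(13, 3)^3 / choose(52, 13)

print(c(simulated = simulated, exact = exact))
```

The two numbers should agree to about two decimal places; increasing `n_sim` tightens the simulated estimate at the cost of run time.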

Fifty Shades of Grey in R

December 8, 2012
By

My wife went out to her book group tonight and their book of the month was 50 Shades of Grey. Sadly, all I could think of was that plotting 50 shades in R would be a neat exercise. require(ggplot2) grey50 <- data.frame( x = rep(1:10, 5), y = rep(1:5, ...
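The post's ggplot2 code is cut off above, so it is not completed here. As a stand-in, the same idea can be sketched with base graphics: generate 50 grey levels and draw them on a 10 by 5 grid. The use of `grey()` and `image()` and the variable names are choices of this sketch, not the author's code.

```r
# 50 grey levels from black (0) to white (1).
shades <- grey(seq(0, 1, length.out = 50))

# A 10 x 5 grid whose cells hold the values 1 to 50.
cells <- matrix(1:50, nrow = 10, ncol = 5)

# With 50 distinct cell values and 50 colours, each cell gets its own shade.
image(x = 1:10, y = 1:5, z = cells, col = shades,
      axes = FALSE, xlab = "", ylab = "")
```

When run non-interactively, R writes the plot to its default graphics device rather than opening a window.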

XLLoop framework

December 7, 2012
By

Today I want to highlight the XLLoop framework: Excel User-Defined Functions in any language. XLLoop consists of two main components: an Excel add-in implementation (an XLL written in C++), and a server and framework written in R (and/or in many other languages). XLLoop allows you to connect Excel and R in very simple

Mean Value from Grouped Data

December 7, 2012
By

Occasionally, I get requests from clients to calculate the mean. Most of the time it’s a simple request, but from time to time the data were originally grouped. A common approach is to take the midpoint of each of the groups and just assume that all respondents within that group average out to the
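The midpoint approach described above amounts to a weighted mean of group midpoints, with the group counts as weights. A minimal sketch in base R follows; the group boundaries and counts are made up for illustration.

```r
# Hypothetical grouped data: interval bounds and respondent counts.
groups <- data.frame(
  lower = c(0, 10, 20, 30),
  upper = c(10, 20, 30, 40),
  n     = c(5, 15, 20, 10)
)

# Treat every respondent in a group as sitting at the group's midpoint.
groups$midpoint <- (groups$lower + groups$upper) / 2

# Mean of the midpoints, weighted by how many respondents each group holds.
grouped_mean <- weighted.mean(groups$midpoint, groups$n)
print(grouped_mean)  # 22
```

The result is only as good as the midpoint assumption: if respondents cluster near one end of each interval, the estimate will be biased in the opposite direction.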

December 7, 2012
By

Conrad launched the 3.6 series of Armadillo earlier today with a first 3.6.0 release. So RcppArmadillo, our wrapper for R and Armadillo, is now on CRAN with its corresponding version 0.3.6.0. No R level or interface changes were needed, and the upstr...

Please stop using Excel-like formats to exchange data

December 7, 2012
By

I know “officially” data scientists always work in “big data” environments with data in a remote database, streaming store or key-value system. But in day to day work Excel files and Excel export files get used a lot and cause a disproportionate amount of pain. I would like to make a plea to my

d3 and r interacting through shiny

December 7, 2012
By

I was amazed and delighted by Reconstruct Gene Networks Using Shiny. Jeff accomplished what I knew was possible but had absolutely no idea how to implement. With the boost, I went to work combining his d3 force layout with my d3 experim...

R and the SGeMS blockdata format

December 7, 2012
By

The popular geostatistical software SGeMS has some options for working with non-point support (block) data through the BGeost set of algorithms by Yongshe Liu (see his PhD thesis), and published in Liu and Journel (2009). A specific but ...

R analysis shows how UK health system could save £200m

December 7, 2012
By

According to an analysis by Prescribing Analytics (a joint venture of technologists and doctors in the UK), Britain's cash-strapped National Health Service (NHS) is overspending on prescription drugs. While cheaper (but equally effective) generic drugs are widely available for many treatments, some doctors continue to prescribe patented drugs which can cost 10 times as much — and often much...

Mapping Primary Care Trust (PCT) Data, Part 1

December 7, 2012
By

The launch or official opening or whatever it was of the Open Data Institute this week provided another chance to grab a snapshot of notable folk in the community, as for example demonstrated by people commonly followed by users of the #ODIlaunch hashtag on Twitter. The PR campaign also resulted in the appearance of some

UEFA Champions League Knockout Phase Draws: Monte Carlo Simulation with R

December 7, 2012
By

Draws for the knockout phase of the 2012–13 UEFA Champions League will be held in Nyon on 20 December 2012. The rules of the draw are simple and are as follows: 8 group winner teams will be seeded. 8 group runner-up teams will be unseeded. Teams coming from the same group and from same association...
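A Monte Carlo version of such a draw can be sketched with rejection sampling in base R. The excerpt cuts off mid-rule, so this sketch assumes that each seeded winner is paired with an unseeded runner-up and that teams from the same group or the same association cannot meet. The team names and association codes below are placeholders, not the real 2012–13 field.

```r
set.seed(2012)

# Placeholder teams: one winner (W_) and one runner-up (R_) per group A to H,
# each with a hypothetical association code.
winners <- data.frame(
  team  = paste0("W_", LETTERS[1:8]),
  group = LETTERS[1:8],
  assoc = c("a1", "a2", "a3", "a4", "a1", "a2", "a3", "a5"),
  stringsAsFactors = FALSE
)
runners <- data.frame(
  team  = paste0("R_", LETTERS[1:8]),
  group = LETTERS[1:8],
  assoc = c("a2", "a1", "a4", "a3", "a6", "a5", "a1", "a7"),
  stringsAsFactors = FALSE
)

# Shuffle the runners-up until no pairing breaks the assumed rules.
draw_once <- function() {
  repeat {
    idx <- sample(8)  # runner-up idx[i] faces winner i
    ok  <- all(winners$group != runners$group[idx] &
               winners$assoc != runners$assoc[idx])
    if (ok) return(idx)
  }
}

# Tally how often each winner meets each runner-up.
n_sim  <- 5000
counts <- matrix(0, 8, 8, dimnames = list(winners$team, runners$team))
for (s in seq_len(n_sim)) {
  idx  <- draw_once()
  pair <- cbind(1:8, idx)
  counts[pair] <- counts[pair] + 1
}
pair_prob <- counts / n_sim
print(round(pair_prob, 2))
```

Rejection sampling makes every valid set of pairings equally likely. A real draw is carried out sequentially, ball by ball, which can give slightly different pairing probabilities, so treat the matrix as an approximation under that assumption.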

Dot-density maps with spsample()

December 7, 2012
By

Today’s example is a little odd, in that the code isn’t pretty and the example isn’t really something you’d actually produce in real life — but if you’ll overlook those oddities, you’ll find that the spsample(...

Visualizing Baltimore with R and ggplot2: Crime Data

December 7, 2012
By

The advent of municipal open data initiatives has been both a blessing and curse for my particular brand of data nerd. On one hand, it has opened up the possibility of developing deep and useful knowledge about the places we...

How to spend an inordinate amount of time becoming efficient

December 6, 2012
By

I’ve spent a good deal of 2012 constructing a data warehouse to manage all the various data elements that my company has. Although we’re a small enterprise, the richness and complexity of the information is rather high. Moreover, as a data-driven organization, there’s a strong impetus to construct meaningful analysis with every bit of input

R in the Cloud

December 6, 2012
By

I've been having some great fun parallelizing R code on Amazon's cloud. Now that things are chugging away nicely, it's time to document my foibles so I can remember not to fall into the same pits of despair again. The goal was to perform lots of trials of a randomized statistical simulation. The jobs were independent and fairly chunky, taking...
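Independent simulation trials like these are a natural fit for R's bundled parallel package. The sketch below runs on a single machine rather than on a cloud cluster, and it is not the author's setup: the trial function, the number of trials and the core count are all assumptions for illustration.

```r
library(parallel)

# One independent trial: a stand-in simulation (mean of 1000 normal draws).
# Seeding inside the trial keeps results reproducible regardless of which
# worker happens to run it.
run_trial <- function(seed) {
  set.seed(seed)
  mean(rnorm(1000))
}

# mclapply() forks worker processes, which is not available on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1 else 2

results   <- mclapply(1:20, run_trial, mc.cores = n_cores)
estimates <- unlist(results)
print(summary(estimates))
```

Because each trial sets its own seed, the parallel run returns the same numbers as a plain `lapply(1:20, run_trial)`, which makes it easy to check that nothing was lost along the way.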

Importing Data Into R from Different Sources

December 6, 2012
By

I have found that I get data from many different sources. These sources range from simple .csv files to more complex relational databases to structured XML or JSON files. I have compiled the different approaches that one can use to easily access these datasets. Local Column Delimited Files: This is probably the most common and
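For the local delimited-file case, a minimal base R sketch looks like this. The file is written to a temporary path first so the example is self-contained; the column names and values are made up.

```r
# Write a small hypothetical CSV file to a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,alpha,3.5",
             "2,beta,4.0",
             "3,gamma,2.5"), path)

# read.csv() reads comma-delimited files with a header row by default;
# read.delim() is the tab-delimited counterpart.
scores <- read.csv(path, stringsAsFactors = FALSE)

str(scores)                # column types inferred from the file
print(mean(scores$score))
```

Setting `stringsAsFactors = FALSE` keeps the text column as plain character data, which tends to be less surprising when the data are cleaned further.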

Tibshirani’s original paper on the lasso. Breiman’s…

December 6, 2012
By
$\sqrt{\blacksquare^2 + \blacksquare^2 + \blacksquare^2 + \blacksquare^2 + \blacksquare^2 + \blacksquare^2 + \ldots}$

Tibshirani’s original paper on the lasso. Breiman’s Garotte: 1993. Tibshirani lasso paper submitted: 1994; revised: 1995; accepted: 1996. This is one of those papers that I’m so excited about, I feel like “You should just read the whole thing! It’s all good!” But I realise that’s less than reasonable. Here is a bit of summary,...

ggplot2 0.9.3 and plyr 1.8 have been released!

December 6, 2012
By

We’re pleased to announce new versions of ggplot2 (0.9.3) and plyr (1.8). To get up and running with the new versions, start a clean R session without ggplot2 or plyr loaded, and run install.packages(c("ggplot2", "gtable", "scales", "plyr")). Read on to find out what’s new. ggplot2 0.9.3: Most of the changes in version 0.9.3 are bug fixes. Perhaps

Link to Item Response Theory Presentations Using R

December 6, 2012
By

After my post on item response theory, a number of you have asked for links to applications that provide R code. As I noted in that post, a good deal of work is being done in an area of research called patient-related outcome measurement (P...

To reject random walk in climate

December 6, 2012
By

I read the post The surprisingly weak case for global warming and its rejection, Climate: Misspecified. Based on the first, I wanted to write a post, if only to say that I agree with the second. The post features a number of plots like this. For m...

Learn R by trying R

December 6, 2012
By

By Revolution Analytics training manager James Peruvankal. If you are new to R and want an introduction to the R language in the classic “learning by doing” way, Code School and O’Reilly have put together the Try R interactive tutorial. This tutorial is a painless introduction to the R programming language. During the course you'll become familiar...