How to make beautiful bubble charts with R

November 23, 2010
By

Nathan Yau has just published at FlowingData a step-by-step guide on making bubble charts in R. It's actually pretty simple: read in data, sqrt-transform the "bubble" variable (to scale the bubbles by area, not radius), and use the symbols function to plot. It's the last step, though, that really ups the presentation quality: read R's PDF file into Illustrator...
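As a rough sketch of the first three steps (read data, sqrt-transform, plot with symbols), here is a version with made-up data. The data frame, its column names and the output file name are illustrative assumptions, not taken from the FlowingData tutorial. Because symbols() treats the circles argument as radii, the square root is what makes bubble area, rather than radius, proportional to the value.

```r
# Hypothetical data: three points plus a "size" variable to show as bubble area
crime <- data.frame(
  murder     = c(5.6, 8.2, 4.8),
  burglary   = c(950, 1100, 620),
  population = c(4.5e6, 1.9e7, 6.7e5)
)

# symbols() interprets `circles` as radii, so use sqrt(value / pi)
# to make each bubble's AREA proportional to the value
radius <- sqrt(crime$population / pi)

# Write to a PDF so the chart can be polished later in a vector editor
out_file <- file.path(tempdir(), "bubbles.pdf")
pdf(out_file)
symbols(crime$murder, crime$burglary, circles = radius,
        inches = 0.35, fg = "white", bg = "red",
        xlab = "Murder rate", ylab = "Burglary rate")
dev.off()
```

The inches argument only rescales the largest bubble on the page; the relative areas still come from the sqrt-transformed values.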

Read more »

R and AOL in NYC

November 23, 2010
By

R and the NYC R User Group get brief mentions in this article about AOL's offices in New York City. The NYC UseRs meet at AOL and (ironically) the next meeting on Dec 9 is on the topic of R at Google. New York Observer: Bringing Some Sizzle to the Dial-Up King (via)

Read more »

R Style Guide

November 23, 2010
By

Each year I have the pleasure (actually it’s quite fun) of teaching R programming to first-year mathematics and statistics students. The vast majority of these students have no experience of programming, yet think they are good with computers because they use Facebook! The class has around 100 students, and there are eight practicals. In

Read more »

Programming with R – Processing Football League Data Part I

November 23, 2010
By

In this post we will make use of football results data from the football-data.co.uk website to demonstrate creating functions in R to automate a series of standard operations that would be required for results data from various leagues and divisions. The first step is to consider what control options should be available as part of the

Read more »

Robust adaptive Metropolis algorithm [arXiv:1011.4381]

November 23, 2010
By

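Matti Vihola has posted a new paper on arXiv about adaptive (random walk) Metropolis-Hastings algorithms. The update in the (lower diagonal) scale matrix is $S_n S_n^T = S_{n-1}\left(I + \eta_n(\alpha_n - \alpha_*)\frac{U_n U_n^T}{\|U_n\|^2}\right)S_{n-1}^T$, where $\alpha_n$ is the current acceptance probability and $\alpha_*$ the target acceptance rate; $U_n$ is the current random noise for the proposal, $Y_n = X_{n-1} + S_{n-1}U_n$; $\eta_n$ is a step size sequence decaying to zero. The spirit of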

Read more »
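For readers who prefer code, here is a minimal sketch of one scale-matrix update as I read it from the paper: the new lower-triangular factor is the Cholesky factor of S (I + eta (alpha - alpha_star) u u' / |u|^2) S'. The function name, argument names and the numbers in the usage lines are my own illustrative choices, not Vihola's code.

```r
# One scale-matrix update (sketch). All names are illustrative.
#   S          current lower-triangular scale matrix
#   u          noise vector used to build the current proposal
#   alpha      acceptance probability of that proposal
#   alpha_star target acceptance rate
#   eta        current step size, assumed to lie in (0, 1]
ram_update <- function(S, u, alpha, alpha_star, eta) {
  d <- length(u)
  inner <- diag(d) + eta * (alpha - alpha_star) * tcrossprod(u) / sum(u^2)
  M <- S %*% inner %*% t(S)
  # chol() returns the upper-triangular factor R with t(R) %*% R == M,
  # so its transpose is the new lower-triangular scale matrix
  t(chol(M))
}

# Usage with arbitrary numbers: acceptance above target, so the scale grows
set.seed(1)
S0 <- diag(2)
u  <- rnorm(2)
S1 <- ram_update(S0, u, alpha = 0.9, alpha_star = 0.234, eta = 0.5)
```

When the acceptance probability exceeds the target, the proposal scale is stretched along the direction of u; when it falls short, the scale shrinks along that direction.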

Learn Logistic Regression (and beyond)

November 23, 2010
By

One of the current best tools in the machine learning toolbox is the 1930s statistical technique called logistic regression. We explain how to add professional quality logistic regression to your analytic repertoire and describe a bit beyond that. A statistical analyst working on data tends to deliberately start simple and move cautiously to more complicated methods.

Read more »

makefiles for Sweave, R and LaTeX using Eclipse on Windows

November 22, 2010
By

This post provides a brief introduction to make and makefiles. In particular it describes how to set up make on Windows with an emphasis on using make in Eclipse on projects involving R, Sweave, and LaTeX. Overview make is software that uses makefile...

Read more »

RClimate Tools for Do It Yourself Climate Trend Analysis – Nov, 2010 Update

November 22, 2010
By

I have made several updates to RClimate tools for do-it-yourself climate scientists. The downloadable monthly climate trends file (link to csv file) now includes the 5 major global land-ocean temperature anomaly time series (GISS, HAD, NOAA, RS...

Read more »

R.I.P. StatProb?

November 22, 2010
By

As posted in early August from JSM 2010 in Vancouver, StatProb was launched as a way to promote an on-line encyclopedia/wiki with the scientific backup of expert reviewers. This was completely novel and I was quite excited to take part in the venture as a representative of the Royal Statistical Society. Most unfortunately, the separation

Read more »

Access the InfoChimps API from R

November 22, 2010
By

InfoChimps.com is mainly known as a clearinghouse for finding large data sets, for free or for sale. But they have also released (in beta, at least) an API that lets you find some pretty useful information on-demand. Normally, you'd have to use RESTful calls to access the API, but now Drew Conway has created an R package (and released...

Read more »

Example 8.15: Firth logistic regression

November 22, 2010
By

In logistic regression, when the outcome has low (or high) prevalence, or when there are several interacted categorical predictors, it can happen that for some combination of the predictors, all the observations have the same event status. A similar e...

Read more »
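To make the problem in the excerpt concrete, here is a small sketch that looks for predictor combinations in which every observation has the same event status. The data are made up for illustration, and the code only detects the situation; it does not fit a Firth model.

```r
# Toy data (made up): y is always 1 when a == "yes" and b == "high"
d <- data.frame(
  a = c("no", "no", "no", "no", "yes", "yes", "yes", "yes"),
  b = c("low", "high", "low", "high", "low", "low", "high", "high"),
  y = c(0, 1, 1, 0, 0, 1, 1, 1)
)

# Event proportion within each combination of the predictors
cell_rate <- aggregate(y ~ a + b, data = d, FUN = mean)

# Cells where every observation shares the same event status
separated <- cell_rate[cell_rate$y %in% c(0, 1), ]
print(separated)
```

A cell with a proportion of exactly 0 or 1 is where an ordinary maximum-likelihood fit with the full interaction can run into trouble.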

Homage to floating points

November 22, 2010
By

I recently got very close to the floating point trap, again, so here is a little tribute with some small examples!
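A couple of small examples of the kind of trap meant here (my own, not necessarily the ones from the post): exact comparison with == fails for numbers that look equal, while all.equal() compares within a tolerance.

```r
# Decimal fractions such as 0.1 have no exact binary representation
a <- 0.1 + 0.2
exact_equal <- (a == 0.3)                  # FALSE
near_equal  <- isTRUE(all.equal(a, 0.3))   # TRUE

# Accumulated error: adding 0.1 ten times does not give exactly 1
total <- 0
for (i in 1:10) total <- total + 0.1
total_is_one <- (total == 1)               # FALSE

# Printing more digits makes the discrepancy visible
print(c(a, total), digits = 17)
```

Wrapping all.equal() in isTRUE() matters, because on a mismatch it returns a descriptive string rather than FALSE.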

Read more »

Retrieving transcriptome sequences for RNASeq analysis

November 22, 2010
By

One approach for analyzing RNASeq data from an organism with a well-annotated genome is to align the reads to mRNA (cDNA) sequences instead of the genome. To do that you need to extract the transcript sequences from a database. This is how to extract Ensembl transcript sequences from UCSC from within R:
library(GenomicFeatures)
library(BSgenome.Hsapiens.UCSC.hg18)
tr tr_seq
write.XStringSet(tr_seq, file="hg18.ensgene.transcripts.fasta", 'fasta', width=80, append=F)
Next steps...

Read more »

Were stock returns really better in 2007 than 2008?

November 22, 2010
By

We know that the S&P 500 was up a little in 2007 and down a lot in 2008. So on the surface the question seems really stupid. But randomness played a part. Let’s have a go at deciding how much of a part. Figure 1: Comparison of 2007 and 2008 for the S&P 500. Statistical ...

Read more »

Graphical comparison of MCMC performance [arXiv:1011.445]

November 22, 2010
By

A new posting on arXiv by Madeleine Thompson on a graphical tool for assessing MCMC performance. She has developed software called SamplerCompare, implemented in R and C. The graphical evaluation plots “log density evaluations per iteration times autocorrelation time against a tuning parameter in a grid of plots where rows represent distributions and columns represent

Read more »

Animate .gif images in R / ImageMagick

November 21, 2010
By

Yesterday I surfed the web looking for 3D wireframe examples to explain linear models in class. I stumbled across this site where animated 3D wireframe plots are outputted by SAS.  Below I did something similar in R. This post shows the few steps of how to create an animated .gif file using R and ImageMagick.
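The general recipe can be sketched as follows: draw one PNG per viewing angle, then let ImageMagick stitch the frames together. The surface, the file names and the delay are my own illustrative choices, and the sketch assumes ImageMagick's convert command is on the PATH (the stitching step is skipped if it is not).

```r
# A simple surface to rotate (illustrative, not from the post)
x <- seq(-2, 2, length.out = 25)
y <- x
z <- outer(x, y, function(a, b) a + b^2)

# One PNG per viewing angle, written to a temporary directory
frame_dir <- file.path(tempdir(), "frames")
dir.create(frame_dir, showWarnings = FALSE)

angles <- seq(0, 350, by = 10)
for (i in seq_along(angles)) {
  png(file.path(frame_dir, sprintf("frame%03d.png", i)),
      width = 400, height = 400)
  persp(x, y, z, theta = angles[i], phi = 25, col = "lightblue")
  dev.off()
}

# Zero-padded names keep the frames in drawing order when listed
frames  <- list.files(frame_dir, pattern = "^frame[0-9]+\\.png$",
                      full.names = TRUE)
gif_out <- file.path(tempdir(), "wireframe.gif")

# Stitch the frames with ImageMagick, if it is installed
if (nzchar(Sys.which("convert"))) {
  system2("convert", c("-delay", "10", "-loop", "0",
                       shQuote(frames), shQuote(gif_out)))
}
```

Here -delay sets the pause between frames in hundredths of a second and -loop 0 makes the animation repeat indefinitely.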

Read more »

My First R Package: infochimps

November 20, 2010
By

I have finally taken the plunge and created my first R package! As frequent readers will know, I often sing the praises of infochimps, a startup out of Austin, TX attempting to be the world’s data clearinghouse. While infochimps is an excellent resource for data sets, they also provide their own set of excellent data

Read more »

R function for reading big tables

November 20, 2010
By

HugeFileLoader = function(path, sep = "\t", skip = 0, header = T, nrows = 10){
### counts the number of lines using shell wc command, and converts the output to numeric
line.count = paste("wc -l ", path, sep = "")
row.count = as.numeric(strsplit(system(li...

Read more »
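The excerpt above is cut off, so the following is not the author's function. It is only a sketch of the same general idea (count the lines first, then read just the rows you need), written with base R instead of the shell wc command so that it does not depend on the platform. The function names and the demo file are hypothetical.

```r
# Count lines without loading the whole file: read fixed-size chunks
count_lines <- function(path, chunk = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    got <- length(readLines(con, n = chunk))
    if (got == 0L) break
    n <- n + got
  }
  n
}

# Read only data rows `from`..`to` (the header line is not counted)
read_rows <- function(path, from, to, sep = "\t") {
  header <- read.table(path, sep = sep, nrows = 1, header = FALSE,
                       stringsAsFactors = FALSE)
  # Skipping `from` lines skips the header plus the first (from - 1) rows
  out <- read.table(path, sep = sep, skip = from, nrows = to - from + 1,
                    header = FALSE, stringsAsFactors = FALSE)
  names(out) <- unlist(header)
  out
}

# Demo on a small temporary tab-separated file
tmp <- tempfile(fileext = ".tsv")
write.table(data.frame(id = 1:5, v = letters[1:5]), tmp,
            sep = "\t", row.names = FALSE, quote = FALSE)
n_lines <- count_lines(tmp)
middle  <- read_rows(tmp, from = 2, to = 3)
```

Knowing the row count up front lets you loop over a large file in blocks with read_rows() rather than holding everything in memory at once.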

ShortCut[R]: locator

November 20, 2010
By

Welcome to my new category: ShortCut! Here I'll briefly explain some smart features, unknown extensions or uncommon pathways of going for gold. Today it's about the GNU R tool locator.

Read more »

Running R on remote computer via local emacs

November 19, 2010
By

Aquamacs in Mac OS X and Emacs in Linux/Unix can be used to edit remote (and local) R code and submit pieces of code to a remote R session. For this to work you need to install ESS for Emacs (Aquamacs comes with ESS by default now, I don't know about e...

Read more »

Finally! A practical R book on Data Mining: "Data Mining With R, Learning with Case Studies," by Luis Torgo

November 19, 2010
By

I've been a bit busy lately with a few big things, however, I wanted to stop by and mention a fantastic book for those who have been following along the R examples.  Anyone who's followed my blog knows that I'm big on practical books with examples...

Read more »

Is there a Market for Premium R Packages?

November 19, 2010
By

Nathan Yau, of the excellent FlowingData blog, recently asked on his Twitter stream: “I wonder if there’s a market for premium R packages, like there is for say, @wordpress themes and plugins.” There are some great packages available for R, all of which are currently free. I think it would be great if authors like

Read more »

Airport security: science vs backlash

November 19, 2010
By

The United States has recently introduced millimeter wave and backscatter x-ray scanners to the security screening process in many airports, prompting a backlash in some quarters. Much of the opposition is centered around the invasion of privacy: the scanners generate an image of the traveller's naked body. There are also health concerns, at least for the backscatter x-ray variants...

Read more »

Making R growl

November 18, 2010
By

Spending the day churning through a large data set or doing some heavy-duty number crunching? What is one to do while the computer is running in overdrive? Well, for one, you could get a steaming cup of joe and write a …

Read more »

Competitive Data Science: An Update

November 18, 2010
By

A quick reminder that two competitions based around data analysis, both very suited to R, are currently underway. First, there's still plenty of time to enter the competition to predict popular R packages, announced by The Dataists and hosted at Kaggle. According to organizer Drew Conway, the competition has already received 114 entries from 21 teams. But with...

Read more »