The difference between “letters[c(1,NA)]” and “letters[c(NA,NA)]”

April 22, 2010

In David Smith’s latest blog post (which, in a sense, is a continued response to the latest public attack on R), there was a comment by Barry that caught my eye. Barry wrote: Even I get caught out on R quirks after 20 years of using it. Compare letters[c(1,NA)] and letters[c(NA,NA)] for the most recent thing that made me...
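
For readers who want to see the quirk for themselves, here is a minimal sketch of the two expressions named in the title. The variable names are only for illustration. The difference comes from the type of the index: c(1, NA) is numeric, while c(NA, NA) is logical and gets recycled along the whole vector.

```r
idx_num <- c(1, NA)    # numeric index: NA is coerced to a numeric NA
idx_lgl <- c(NA, NA)   # logical index: a bare NA is logical

a <- letters[idx_num]  # one element per index value
b <- letters[idx_lgl]  # logical index is recycled to length(letters)

class(idx_num)         # "numeric"
class(idx_lgl)         # "logical"
length(a)              # 2
length(b)              # 26
```

So the first expression returns "a" followed by one NA, while the second returns 26 NAs.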

Free Video Courses on R, Structural Equation Modelling, Causal Inference, and Regression from Uni Jena

April 22, 2010

The Department of Methodology and Evaluation Research at Universität Jena has made available a set of free online video courses on data analysis. They cover topics that are particularly relevant to psychology and social science researchers, including ...

R: more plotting fun, this time with the Poisson

April 21, 2010

Here is the code:

    par(bg = "black")            # black background
    par(mar = c(0, 0, 0, 0))     # no margins
    plot(sort(rpois(10000, 100)) / rpois(10000, 100),
         frame.plot = FALSE, pch = 20, col = "blue")

Automated way to check for PGF version

April 21, 2010

This is one way to check automatically which version of PGF is installed. First create a tex file with the following contents:

    \documentclass{article}
    \usepackage{tikz}
    \batchmode
    \makeatletter
    \typeout{PGFVersion=\pgfversion}
    \@@end

Say you named it test-pgf-version.tex. Then:

    pdflatex test-pgf-version.tex
    cat test-pgf-version.log | grep PGFVersion | sed 's/PGFVersion=//'

should display the version number. I

Why use R? Because that’s what the pros use

April 21, 2010

I had the great pleasure of sitting down for a beer with Steve O'Grady (from the open-source analyst group RedMonk), at the MySQL conference last week. It was great to get the perspective of someone who knows the tech industry so well, sees predictive analytics as a hot area, and is taking an active interest in statistics and R...

Doing Maximum Likelihood Estimation by Hand in R

April 21, 2010

Lately I’ve been writing maximum likelihood estimation code by hand for some economic models that I’m working with. It’s actually a fairly simple task, so I thought that I would write up the basic approach in case there are readers who haven’t built a generic estimation system before. First, let’s start with a toy example
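
To give a flavour of what "by hand" can mean here, the following is a rough sketch and not the post's own toy example. It assumes normally distributed data and minimises a hand-written negative log-likelihood with optim(). The simulated data, the function name neg_log_lik, and the log-sd parameterisation are all choices made for this illustration.

```r
set.seed(1)
x <- rnorm(500, mean = 3, sd = 2)   # simulated data (illustrative only)

# Negative log-likelihood of a normal model.
# par[1] is the mean; par[2] is log(sd), so the sd stays positive.
neg_log_lik <- function(par, data) {
  mu    <- par[1]
  sigma <- exp(par[2])
  -sum(dnorm(data, mean = mu, sd = sigma, log = TRUE))
}

# Minimise the negative log-likelihood numerically
fit <- optim(par = c(0, 0), fn = neg_log_lik, data = x, method = "BFGS")

mu_hat    <- fit$par[1]
sigma_hat <- exp(fit$par[2])
c(mu_hat = mu_hat, sigma_hat = sigma_hat)
```

For this model the estimates should land close to mean(x) and to the maximum likelihood standard deviation sqrt(mean((x - mean(x))^2)), which makes a handy sanity check.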

Parallel Multicore Processing with R (on Windows)

April 21, 2010

A parallel processing backend for R under Windows: installation tips and some examples.

Little R == r

April 21, 2010

There's big R, the R that I use to do most of my work, the environment that makes pretty graphics, et al. It's like MATLAB, only cooler. Or more cool. Or less uncool. You can see my prejudices here. Today I discovered little R. It's like big R, only little. Holy shit. Dirk gives...

Experiments with igraph

April 21, 2010

Networks – social and biological – are all the rage, just now. Indeed, a recent entry at Duncan’s QOTD described the “hairball” network representation as the dominant cultural icon in molecular biology. I’ve not had occasion to explore networks “professionally”, but have always been fascinated by both networks and the tools used to analyse them.

R / Finance 2010 presentations

April 20, 2010

Last Friday and Saturday the second R / Finance conference took place in Chicago on the UIC campus. As a co-organizer, it was a great pleasure to see so many users of R in Finance, from both industry and academia, come to Chicago to discuss and sh...

Book Review – ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham (Springer 2009)

April 20, 2010

This book is written by the author of the ggplot2 package for R, a package whose design is inspired by the grammar of graphics and which can remove some of the effort required to put together impressive graphs. The book is just under 200 pages and covers a

Data I/O performance tips

April 20, 2010

The R tag on StackOverflow recently topped 1000 questions, and continues to be a great community resource for practical tips on using the R language for data analysis and visualization. To take one example, "Efficiency of operations on R data structures" has been answered with some great tips on efficiently getting data in and out of the R system....

RClimate Script: Pacific Decadal Oscillation Trend

April 19, 2010

This RClimate Script lets users retrieve and plot the monthly and moving-average Pacific Decadal Oscillation (PDO) data from the University of Washington’s JISAO website. The script retrieves the PDO data from January 1900 until the latest month available at time …

R and the Next Big Thing

April 19, 2010

I've been travelling for the past few days (for the R/Finance 2010 conference in Chicago), so I'd missed much of the reaction to AnnMaria De Mars' article last week where she claimed that "R is an epic fail". Understandably, that inflammatory statement provoked many reactions from the R community on Twitter and in the blogosphere. (I suspect the fact...

A stateful C function for R: parsing Fasta sequences

April 19, 2010

In the following post, I'll create a C extension for R. This extension will iterate over all the FASTA sequences in a file and return a (name, sequence) pair for each sequence; that is to say, I won't store all the sequences in memory. The C code...

Converting Alpha-Shapes into SP Objects

April 19, 2010

Just read about a new R package called alphahull (paper) that sounds like it might be a good candidate for addressing this request regarding concave hulls. Below are some notes on computing alpha-shapes and alpha-hulls from spatial data and converting the results returned by ashape() and ahull() into SP-class objects. Note that the functions...

R and Tolerance Intervals

April 19, 2010

Confidence intervals and prediction intervals are used by statisticians on a regular basis. Another useful interval is the tolerance interval that describes the range of values for a distribution with confidence limits calculated to a particular percentile of the distribution. The R package tolerance can be used to create a variety of tolerance intervals of

Estimating Missing Data with aregImpute() {R}

April 19, 2010

Missing Data: Soil scientists routinely sample, characterize, and summarize patterns in soil properties in space, with depth, and through time. Invariably, some samples will be lost, or the funds required for complete characterization will run out. In these cases the scientist is left with a data table that contains holes (so to speak) in the rows/columns that are...

R tip: Maximum screen width

April 19, 2010

R can be annoying in that even if you stretch your terminal or R GUI session to the whole screen width, it will still only print output 80 characters wide. This can make wide tables really hard to read. options(width=150) Use the options command width to set...
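
A short sketch of the tip in action. The value 150 comes from the excerpt; saving and restoring the old setting is an addition for illustration.

```r
old <- options(width = 150)   # widen console output; keep the previous setting
w <- getOption("width")       # confirm the new width
w

print(head(mtcars))           # wide tables now wrap less often

options(old)                  # restore the previous width when done
```

options() returns the previous values invisibly, which is why the result can be handed straight back to options() to undo the change.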

Example 7.33: Specifying fonts in graphics

April 19, 2010

For interactive data analysis, the default fonts used by SAS and R are acceptable, if not beautiful. However, for publication, it may be important to manipulate the fonts. For example, it would be desirable for the fonts in legends, axis labels, or o...

Getting your web application and R(Apache) to talk to each other

April 19, 2010

Here’s the situation. Web applications built using a framework (e.g. Rails, Django) are great for fetching data from a database and rendering it. They’re not so great for crunching and charting the data. Conversely, R is great for crunching and charting, but doesn’t make for a great web application. The idea, then, is to let

Thoughts on LSPM from R/Finance 2010

April 18, 2010

I just got back from R/Finance 2010 in Chicago. If you couldn't make it this year, I strongly encourage you to attend next year. I will post a more comprehensive review of the event in the next couple days, but I wanted to share some of my notes spec...

Sudokus more random than random!

April 18, 2010

Darren Wraith pointed out this column about sudokus to me. It analyses the paper by Newton and De Salvo published in the Proceedings of the Royal Society A, which I cannot access from home. The discussion contains this absurd sentence “Sudoku matrices are actually more random than randomly-generated matrices” which shows how mistreated

Summarising data using scatter plots

April 18, 2010

A scatter plot is a graph used to investigate the relationship between two variables in a data set. The x and y axes are used for the values of the two variables and a symbol on the graph represents the combination for each pair of values in the data set. This type of graph is
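
As a small illustration of the description above, here is a sketch using the cars data set that ships with R. The choice of data set and the plot styling are assumptions made for this example.

```r
# Stopping distance against speed: one point per (speed, dist) pair
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     pch = 19, main = "Stopping distance vs. speed")
```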

alphahull: an R Package for Alpha-Convex Hull

April 16, 2010

A new paper on the α-convex hull appeared in the Journal of Statistical Software today (http://www.jstatsoft.org/v34/i05/paper). The α-convex hull is an interesting problem which caught my attention a long time ago, but I didn’t know a solution then. R has a function chull() which can generate (indices of) the convex hull for a series of points. Now
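
For reference, a minimal sketch of the ordinary convex hull with base R's chull(). The random points and variable names are made up for this illustration, and the alphahull functions themselves are not shown.

```r
set.seed(42)
pts <- cbind(x = runif(50), y = runif(50))   # 50 random points (illustrative)

hull_idx <- chull(pts)                       # row indices of the hull vertices
hull <- pts[c(hull_idx, hull_idx[1]), ]      # repeat the first vertex to close the polygon

plot(pts, pch = 20)
lines(hull, col = "red")
```

Because chull() returns indices rather than coordinates, the hull vertices have to be looked up in the original matrix before plotting.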
