Fun with the proto package: building an MCMC sampler for Bayesian regression

August 12, 2010

The proto package is my latest favourite R goodie. It brings prototype-based programming to the R language - a style of programming that lets you do many of the things you can do with classes, but with a lot less up-front work. Louis Kates and Thomas P...

Read more »

Tuning Notepad++

August 12, 2010

Here are some tricks I collected for making Notepad++ a more comfortable text editor, both in general and for the R programming language in particular. Switch between tabs in Notepad++ with Ctrl-PageUp/Down: Notepad++'s default behaviour is to use Ctrl+(S...

Read more »

R’s role in the national response to the BP Oil Spill

August 12, 2010

In the early days of the Deepwater Horizon oil spill in the Gulf of Mexico, the rate of flow of oil from the spill was of great concern: estimating it accurately was key to coordinating the scale and scope of the response to the emergency. Unfortunately, estimates from independent sources varied widely, and BP's own estimates varied widely over...

Read more »

useR! 2010 conference videos

August 12, 2010

Videos of the invited talks at the useR! 2010 conference follow (courtesy of Kate Mullen and NIST). This site also aims to collect the materials (video, slides, R code) of local R users group (RUG) meetings and various other …

Read more »

Baseball games: getting longer?

August 11, 2010

ESPN's Bill Simmons (aka The Sports Guy) recently suggested that the primary cause of dwindling interest in Red Sox games by fans is that baseball games these days are too long. "It's not that fun to spend 30-45 minutes driving to a game, paying for parking, parking, waiting in line to get in, finding your seat ... and then,...

Read more »

What would a 25th, 50th, and 75th percentile soil profile look like?

August 11, 2010

I have mentioned the AQP package in previous entries. One of the functions in this package generates aggregate soil profile data, from a collection of soil profiles that are related by some factor: common lithology, common landscape position, and so on...

Read more »

Using R for Introductory Statistics 3.3

August 11, 2010

...continuing our way through John Verzani's Using R for Introductory Statistics. Previous installments: chapt1&2, chapt3.1, chapt3.2. Relationships in numeric data: if two data series have a natural pairing (x1,y1),...,(xn,yn), then we can ask, ...

Read more »

Converting R contingency tables to data frames

August 11, 2010

A contingency table presents the joint frequency distribution of one or more categorical variables. Each entry in a contingency table is a count of the number of times a particular combination of factor levels occurs in the dataset. For example, consider a list of plant ...
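As a rough illustration of the conversion the title refers to, the sketch below builds a small contingency table and turns it into a data frame with base R's `as.data.frame()`. The `plants` data, its column names, and its values are made up for this example and are not taken from the post.

```r
# Hypothetical survey data: one row per observed plant (invented for illustration).
plants <- data.frame(
  species = c("oak", "oak", "pine", "pine", "pine", "birch"),
  site    = c("north", "south", "north", "north", "south", "south"),
  stringsAsFactors = TRUE
)

# Cross-tabulate the two factors into a contingency table of counts.
tab <- table(species = plants$species, site = plants$site)

# as.data.frame() has a method for tables: it returns one row per cell,
# with the cell count stored in a column named "Freq".
counts <- as.data.frame(tab)
print(counts)

# xtabs() goes the other way, rebuilding the table from the long data frame.
tab2 <- xtabs(Freq ~ species + site, data = counts)
```

The long format keeps zero-count cells as rows with `Freq` equal to 0, which may or may not be what a later analysis wants.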

Read more »

Which chart is better?

August 10, 2010

CHART CRITICS, GRAPHICS CURMUDGEONS, COME ONE COME ALL. Once upon a time there was this graph (graph 1). Andrew Gelman went all graphics curmudgeon on it, calling it an “ugly, sloppy bit of data graphics”, so it became this graph (graph 2). Now the question is, which is better: graph 2 or graph 3? Please

Read more »

R Environments for Gibbs Sampler State

August 10, 2010

I recently decided to revisit some R code that implements a Gibbs sampler in an attempt to decrease the iteration time. My strategy was to implement the sampler state as an R environment rather than a list. The rationale was that passing an environment to and from functions would reduce the amount of duplication (memory
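The difference between the two ways of holding state can be sketched in a few lines. This is only an illustration of R's semantics, not the author's sampler: the `iter` and `beta` fields and both update functions are hypothetical names chosen for the example.

```r
# State held in a list: lists have copy-on-modify semantics, so the
# function must return the modified copy and the caller must reassign it.
update_list <- function(state) {
  state$iter <- state$iter + 1
  state
}

# State held in an environment: environments are passed by reference,
# so the function can modify the caller's state in place.
update_env <- function(state) {
  state$iter <- state$iter + 1
  invisible(NULL)
}

state_list <- list(iter = 0, beta = rnorm(3))
state_list <- update_list(state_list)   # reassignment required

state_env <- new.env()
state_env$iter <- 0
state_env$beta <- rnorm(3)
update_env(state_env)                   # no reassignment needed
```

Whether this actually shortens iteration time depends on how much copying the list version triggers, so it is worth timing both variants on the real sampler.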

Read more »

Conditioning Systems on Regime Variables

August 10, 2010

Here is a brief and simple example of switching systems based upon regime type (sometimes called gating). I've brought up the idea of conditioning systems based upon regimes many times in past posts. Some texts call this filtering, although I prefer t...

Read more »

A Twitter feed for R links

August 10, 2010

India-based data scientist Harsh Singhal has compiled "State of the R": a list of more than 50 links to R-related websites, which has generated much discussion on the R Project group on LinkedIn. Now, even if you're not on LinkedIn, you can find the list at the new Links4R Twitter profile, and get updates about new links by following...

Read more »

Just for Fun: Using R to Create Targets

August 10, 2010

OK, not really science or soil-related, but a fun 5-minute use of R to make something you can use to improve your hand-eye coordination.

Read more »

Homogeneity analysis of hierarchical classifications

August 10, 2010

I've spent more years than I care to remember analysing vegetation survey data (typically species abundances in plots) using a variety of software including my own algorithms coded in FORTRAN and C++. A recent query on the r-help list, about how to determine the number of groups to define in a hierarchical classification produced with the hclust function, prompted...

Read more »

RQuantLib 0.3.4

August 9, 2010

A fresh release of RQuantLib is now on CRAN and in Debian. RQuantLib combines (some of) the quantitative analytics of QuantLib with the R statistical computing environment and language. This follows the 0.3.3 release from last week and has again a num...

Read more »

An HSV colour wheel in R

August 9, 2010

If you’ve read any of my previous posts, you’ll notice that they’re rather scanty on colour. There’s a reason for this: getting good colour output takes some time. I recently read a commentary in Nature Methods (sorry if you don’t have access to it, but this looks like it may be

Read more »

R unfolds the history of the Afghanistan war

August 9, 2010

Drew Conway continues his analysis of the Wikileaks data. Having concluded that the data appear legitimate (except perhaps in one region, based on a Benford's Law analysis of the numbers in the documents), Drew follows up with a spatio-temporal analysis of activity within Afghanistan, based on the datelines of the documents themselves. Each panel represents a...

Read more »

Quickly Find the Class of data.frame vectors in R

August 9, 2010

Aviad Klein over at My ContRibution wrote a convenient R function to list the classes of all the vectors that make up a data.frame. You would think apply(kyphosis,2,class) would do the job, but it doesn't: apply() coerces the data frame to a matrix first, so every column is reported as character. Aviad wrote an elegant little function that does the job perfectly without having to load any...
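The behaviour is easy to reproduce with a data set that ships with R. The sketch below uses `iris` instead of `kyphosis` so that no extra package is needed, and it uses base R's `sapply()` as one possible alternative; it is not Aviad's function, which is not shown in this excerpt.

```r
# iris has four numeric columns and one factor column.
# apply() converts the data frame to a matrix first; because of the factor
# column the matrix is character, so every column reports "character".
apply_classes <- apply(iris, 2, class)
print(apply_classes)

# A data frame is a list of columns, so sapply() visits each column
# as it is stored and reports its real class.
col_classes <- sapply(iris, class)
print(col_classes)
```

For columns that carry more than one class (an ordered factor, for instance), `sapply()` returns a list rather than a character vector, so `lapply()` may be the safer choice in general code.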

Read more »

Useful functions for data frames

August 9, 2010

The R software system is primarily command-line based, so when there are large sets of data it is not easy to browse the data frames. There are various useful functions for working with data frames. For example, after loading data from a text file we might want to view the first few lines of a
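A few base R functions of this kind are sketched below. The built-in `mtcars` data frame is an assumption standing in for data loaded from a text file, and the selection of functions is illustrative rather than the post's own list.

```r
# mtcars ships with R and stands in here for data read from a file.
first_rows <- head(mtcars, 3)   # first three rows
last_rows  <- tail(mtcars, 3)   # last three rows
print(first_rows)

size <- dim(mtcars)             # number of rows and columns
cols <- names(mtcars)           # column names

str(mtcars)                     # compact overview: type and first values per column
summary(mtcars$mpg)             # five-number summary plus mean for one column
```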

Read more »

GitHub Stats on Programming Languages

August 9, 2010

GitHub has become a popular site for open source developers to stash code and collaborate on projects. The following are some stats and analysis of the programming languages in use, based upon the number of users and repositories. T...

Read more »

R has the best models

August 9, 2010

We had a great time at the JSM conference, and I really enjoyed meeting with all the useRs at the Revolution mixer on Wednesday evening (where this photo was taken). Hope everyone had a great time -- thanks for coming!

Read more »

New R Package ‘aqp’: Algorithms for Quantitative Pedology [updates]

August 9, 2010

  Soils are routinely sampled and characterized according to genetic horizons (layers), resulting in data that are associated with principal dimensions: location (x,y), depth (z), and property space (p). The high dimensionality and grouped nature...

Read more »

Installing RApache on Mac OS X Snow Leopard

August 9, 2010

Hi folks, for a research project I needed to install RApache on my Mac OS X 10.6.4 (Snow Leopard) machine. It did take some time, a lot of beeping in the video documentary, a lot of recompiles… Here is the way to do it: 1. Install GNU Fortran on your Mac. You need to install GNU Fortran on your Mac, because you have...

Read more »

Handling Large CSV Files in R

A follow-up to my previous post, Excellent Free CSV Splitter. I asked a question on LinkedIn about how to handle large CSV files in R / Matlab. Specifically, suppose I have a large CSV file with over 30 million rows; both Matlab and R run out of memory when importing the data. Could you...

Read more »