Loading Big (ish) Data into R

November 24, 2009
By

So for the rest of this conversation, big data == 2 Gigs. Done. Don’t give me any of this ‘that’s not big, THIS is big’ shit. There, now on with the cool stuff: this week on Twitter, Vince Buffalo asked about loading a 2-gig comma-separated (CSV) file into R (OK, he asked about tab

Read more »
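A common first step for a file this size is to tell read.table() the column types up front so it doesn't have to guess them while reading. This is a sketch under stated assumptions, not the approach from the post: it writes a small made-up CSV to a temp file in place of the real 2 GB one, and for a tab-delimited file the change would be sep = "\t".

```r
# Stand-in for the real 2 GB file: a small, made-up CSV written to a temp path
set.seed(1)
tmp <- tempfile(fileext = ".csv")
demo <- data.frame(id = 1:1000,
                   x = rnorm(1000),
                   grp = sample(c("a", "b", "c"), 1000, replace = TRUE))
write.csv(demo, tmp, row.names = FALSE)

# Read a handful of rows to learn each column's class cheaply
first_rows <- read.table(tmp, header = TRUE, sep = ",", nrows = 5)
classes <- sapply(first_rows, class)

# Full read with the classes fixed in advance; nrows is the known (or mildly
# over-estimated) row count, and comment.char = "" turns off comment scanning
big <- read.table(tmp, header = TRUE, sep = ",",
                  colClasses = classes, nrows = 1000, comment.char = "")
str(big)
```

The ?read.table help page notes that supplying colClasses, setting nrows (even as a mild over-estimate) and using comment.char = "" all help speed or memory use.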

ESS on Mac OS X

November 24, 2009
By

One of the search terms that frequently brings people to my site is "install ESS on Mac OS X" or something like it. As it turns out, installing ESS on OS X is really easy, but Google searches do not turn up good instructions. There are at least two easy options: use Aquamacs, which comes bundled with...

Read more »

NYT: SAS threatened by R

November 23, 2009
By

The New York Times had an interesting piece yesterday about how SAS is facing several business threats from companies like the recently IBM-acquired SPSS, and from burgeoning interest in open-source software like R.  The NYT ran an entire article about R earlier this year, and this article discusses how SAS has been revamping their technology to work seamlessly with...

Read more »

RQuantlib

November 23, 2009
By

QuantLib is a free library for modeling, trading, and risk management in real life. It provides a comprehensive software framework for quantitative finance and is written in C++, which might be inconvenient for some users. JQuantLib, aimed at Java fans, i...

Read more »
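For readers who haven't used QuantLib, here is a rough, base-R illustration of the kind of calculation such a library wraps: the textbook Black-Scholes price of a European call. This is a swapped-in formula, not QuantLib's or RQuantLib's code, and bs_call() is a hypothetical helper; QuantLib itself covers far more instruments and models.

```r
# Hypothetical helper: Black-Scholes price of a European call
# S = spot, K = strike, r = risk-free rate, q = continuous dividend yield,
# sigma = volatility, tau = time to maturity in years
bs_call <- function(S, K, r, q, sigma, tau) {
  d1 <- (log(S / K) + (r - q + sigma^2 / 2) * tau) / (sigma * sqrt(tau))
  d2 <- d1 - sigma * sqrt(tau)
  S * exp(-q * tau) * pnorm(d1) - K * exp(-r * tau) * pnorm(d2)
}

# At-the-money call, 5% rate, 20% volatility, one year: roughly 10.45
bs_call(S = 100, K = 100, r = 0.05, q = 0, sigma = 0.2, tau = 1)
```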

Memory Management in R: A Few Tips and Tricks

November 23, 2009
By

This post discusses a few strategies that I have used to manage memory in R.

Stack Overflow Tips

Stack Overflow has a thread on Memory Management Tricks. I tend to follow these suggestions:

.ls.objects(): There's a nice function (.ls.objects...

Read more »
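The .ls.objects() function from that thread isn't reproduced here. As a much simpler stand-in, the hypothetical ls_sizes() below ranks the objects in an environment by object.size(), which is often enough to spot what to rm() before triggering a garbage collection with gc().

```r
# Hypothetical, simplified stand-in for .ls.objects(): object sizes in bytes,
# largest first, for every object in an environment
ls_sizes <- function(envir = globalenv()) {
  objs <- ls(envir = envir)
  sizes <- vapply(objs,
                  function(nm) as.numeric(object.size(get(nm, envir = envir))),
                  numeric(1))
  sort(sizes, decreasing = TRUE)
}

# Usage on a scratch environment
e <- new.env()
assign("big", rnorm(1e5), envir = e)
assign("small", 1:10, envir = e)
sizes_before <- ls_sizes(e)
sizes_before

# Drop the largest object, then trigger a garbage collection
rm("big", envir = e)
invisible(gc())
```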

Type II Error

November 22, 2009
By

In hypothesis testing, a type II error is a failure to reject a null hypothesis that is actually false. The probability of avoiding a type II error is called the power of the hypothesis test and is denoted by the quantity 1 - β.

Read more »
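To make β concrete, here's a sketch for one case assumed purely for illustration: a one-sided z-test of H0: μ = μ0 against μ > μ0 with known σ. The test rejects when the z statistic exceeds z(1 - α), so β is the probability of staying below that cutoff when the true mean is μ1, i.e. β = Φ(z(1 - α) - (μ1 - μ0)√n/σ). The helper name type2_error() is hypothetical.

```r
# Hypothetical helper: type II error rate (beta) of a one-sided z-test
# H0: mu = mu0 vs H1: mu > mu0, known sigma, sample size n, level alpha
type2_error <- function(mu0, mu1, sigma, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)                # rejection cutoff on the z scale
  shift <- (mu1 - mu0) / (sigma / sqrt(n))  # true mean, in standard-error units
  pnorm(z_crit - shift)                     # P(fail to reject | mu = mu1)
}

type2 <- type2_error(mu0 = 0, mu1 = 0.5, sigma = 1, n = 25)
power <- 1 - type2
c(beta = type2, power = power)
```

As a sanity check, when μ1 equals μ0 the null is true and "failing to reject" happens with probability 1 - α.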

Some sort of update to ggplot2

November 22, 2009
By

Jeroen Ooms writes: Here's a first version of a new web application for exploratory graphical analysis. It attempts to implement the layered graphics from the R package ggplot2 in a user-friendly way. This two-minute demo video demonstrates a ...

Read more »

new R package: highlight

November 22, 2009
By

I finally pushed highlight to CRAN; it should be available in a few days. The package uses the information gathered by the parser package to perform syntax highlighting of R code. The main function of the package is highlight, which takes a numb...

Read more »
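The core idea (turn parser token data into marked-up output) can be sketched with base R alone: later versions of R expose token information through getParseData(). The helpers escape_html() and highlight_tokens() below are hypothetical and are not the highlight package's API.

```r
# Hypothetical mini highlighter built on base R's parse data
escape_html <- function(s) {
  s <- gsub("&", "&amp;", s, fixed = TRUE)
  s <- gsub("<", "&lt;", s, fixed = TRUE)
  gsub(">", "&gt;", s, fixed = TRUE)
}

highlight_tokens <- function(code) {
  pd <- getParseData(parse(text = code, keep.source = TRUE))
  tok <- pd[pd$terminal, ]                    # leaf tokens only
  tok <- tok[order(tok$line1, tok$col1), ]    # source order
  # Token types such as SYMBOL_FUNCTION_CALL become CSS class names;
  # punctuation tokens like '(' are left unwrapped
  wrap <- grepl("^[A-Z_]+$", tok$token)
  out <- escape_html(tok$text)
  out[wrap] <- sprintf('<span class="%s">%s</span>', tok$token[wrap], out[wrap])
  paste(out, collapse = " ")
}

highlight_tokens("x <- mean(c(1, 2, 3)) # average")
```

Joining tokens with single spaces loses the original spacing; a real highlighter would reinsert the gaps using the column positions.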

R examine objects tutorial

November 21, 2009
By

This article is a quick, concrete example of how to use the techniques from Survive R to flatten the learning curve of The R Project for Statistical Computing (so apologies to all readers who are not interested in R). What follows is for people who already use R and want to achieve more control.

Read more »
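The Survive R techniques themselves aren't reproduced here; as a rough sketch of the kind of object inspection the post describes, these are standard base R introspection calls applied to a linear model fitted on the built-in cars data (the model is just an assumed example).

```r
# Fit a small model on the built-in cars data, then poke at what comes back
fit <- lm(dist ~ speed, data = cars)

class(fit)             # what kind of object it is: "lm"
names(fit)             # the components stored inside the list
str(fit$coefficients)  # compact view of one component
attributes(fit)$class  # the class is just an attribute
methods(class = "lm")  # functions with an lm method (list varies by session)
```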

My implementation of Berry and Berry’s hierarchical Bayes algorithm for adverse events

November 20, 2009
By

I've been working on this for quite some time (see here for a little background), so I'm pleased that it looks close to done, at least as far as the core algorithm goes. It uses global variables for now, and I'm sure there are a couple of other bugs lurking, but here it is, after the jump.

const.sqrt2pi <-...

Read more »

Mapping Biomes

November 20, 2009
By

Recently (in 2008) the European Space Agency produced GlobCover (ESA GlobCover Project, led by MEDIAS-France), the highest-resolution (300 m) global land cover map to date. GlobCover uses 21 primary land cover classes and many more sub-classes. Land cover classification (LCC) schemes divide the Earth into biomes. Biomes are the simplest way to classify vegetation which can

Read more »
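Grouping fine land-cover classes into coarser biomes is essentially a lookup-table reclassification. The sketch below shows that step on a tiny matrix standing in for a raster; the class codes and biome names are made up and are not the real GlobCover legend.

```r
# Hypothetical land-cover codes and biome names (NOT the real GlobCover legend)
code_to_biome <- c("1" = "forest", "2" = "forest", "3" = "grassland", "4" = "desert")

# A tiny stand-in for a land-cover raster: a matrix of class codes
land_cover <- matrix(c(1, 2, 3, 4, 3, 1), nrow = 2)

# Look every cell up by its code, then restore the original grid shape
biome <- matrix(code_to_biome[as.character(land_cover)], nrow = nrow(land_cover))
biome
table(biome)
```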

Working on a drug safety project

November 20, 2009
By

In order to move some of my personal interests along, I have been trying to implement the methodology found in Berry and Berry's article Accounting for Multiplicities in Assessing Drug Safety. This methodology uses the MedDRA hierarchy to improve the p...

Read more »

Tactical asset allocation using blotter

November 18, 2009
By

blotter is an R package that tracks the P&L of your trading systems (or simulations), even if your portfolio spans many security types and/or currencies. This post uses blotter to track a simple two-ETF trading system. The contents of this post b...

Read more »

Design of Experiments – Power Calculations

November 18, 2009
By

Prior to conducting an experiment, researchers will often undertake power calculations to determine the sample size required to detect a meaningful scientific effect with sufficient power. In R there are functions to calculate either the minimum sample size needed to reach a specific power for a test, or the power of a test for

Read more »
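Base R's stats package includes power.t.test(), which answers both questions: leave out n to get a sample size, or leave out power to get the power. The effect size (a difference of 1 standard deviation) and the 80% target below are illustrative assumptions, not values from the post.

```r
# Sample size per group for a two-sample t-test to detect a difference of
# 1 SD with 80% power at the 5% significance level
n_needed <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.80)
n_needed$n            # fractional; round up for the actual design
ceiling(n_needed$n)

# The reverse question: power achieved with 20 subjects per group
pw20 <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)$power
pw20
```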

Confidence we seek…

November 18, 2009
By

Estimating a proportion looks elementary at first. Hail to asymptotics, right? Well, initially it might seem efficient to use the fact that, for large n, the sample proportion p̂ is approximately normal with mean p and variance p(1 - p)/n. In other words, the classical confidence interval relies on inverting Wald’s test. A function to ease the computation is the following (not really needed!):

waldci <- function(x, n, level) {
  # x: vector of 0/1 outcomes; level: significance level (0.05 gives a 95% interval)
  phat <- sum(x) / n
  phat + c(-1, 1) * qnorm(1 - level / 2) * sqrt(phat * (1 - phat) / n)
}

An exact confidence interval is

Read more »
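The excerpt breaks off before naming its exact interval; assuming it is the common Clopper-Pearson interval, that interval comes straight from beta quantiles, and base R's binom.test() reports the same bounds. The helper cp_ci() and the 7-out-of-20 data are hypothetical.

```r
# Hypothetical helper: Clopper-Pearson ("exact") interval from beta quantiles
# x = number of successes, n = trials, conf = confidence level
cp_ci <- function(x, n, conf = 0.95) {
  a <- (1 - conf) / 2
  lower <- if (x == 0) 0 else qbeta(a, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - a, x + 1, n - x)
  c(lower, upper)
}

cp_ci(7, 20)
binom.test(7, 20)$conf.int  # base R's exact binomial test, for comparison
```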

Quantitative link strength for APE cophyloplot

November 17, 2009
By

Just add a third column with link strength to the association matrix:

plotCophylo2 <- function (x, y, assoc = assoc, use.edge.length = use.edge.length,
    space = space, length.line = length.line, gap = gap, type = type,
    return = return, col = col, show.tip.label = show.tip.label, font = font) {
  if (ncol(assoc) == 2) {
    assoc <- cbind(assoc, rep(1, nrow(assoc)))
  }
  res

Read more »
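One detail worth knowing about the third-column trick: an association matrix of tip labels is a character matrix, so cbind() stores the strengths as text. The sketch below uses a made-up matrix, and the mapping from strength to line width is a hypothetical choice; how the truncated plotCophylo2 actually uses the column is not shown in the excerpt.

```r
# Hypothetical association matrix: tips of tree x paired with tips of tree y
assoc <- cbind(c("a", "b", "c"), c("x", "y", "z"))

# Same idea as the post: default every link to strength 1
if (ncol(assoc) == 2) {
  assoc <- cbind(assoc, rep(1, nrow(assoc)))
}
assoc[, 3]  # stored as character, because the matrix is character

# Pretend some links are stronger, then convert strength to line widths
assoc[, 3] <- c(1, 3, 5)
strength <- as.numeric(assoc[, 3])
lwd <- 1 + 3 * strength / max(strength)  # hypothetical scaling: strongest link gets width 4
lwd
```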

swfDevice is nearing completion

November 17, 2009
By

My new R package, swfDevice, is getting close to its first release. This package enables native R graphics output as swf (Flash) files. It also has the ability to create animations with player controls. The main project page is here and the results of the test suite are here. Here are some samples: http://swfdevice.r-forge.r-project.org/swfDevice_test29.swf http://swfdevice.r-forge.r-project.org/swfDevice_test28.swf

Read more »

R tip: Extracting median from survfit object

November 17, 2009
By

A colleague wanted to extract the median value from a survival analysis object, which turned out to be a pain because the value is not stored in the object but calculated on the fly by a print method.

> library(survival)
> fit
> survfit(fit)
Call: survfit(formula = fit)
records n.max n.start events median 0.95LCL 0.95UCL ...

Read more »
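One way around it, assuming a reasonably recent survival package and using its bundled lung data rather than the colleague's data: summary() of a survfit object carries a table with the same numbers the print method shows, median included.

```r
library(survival)

# Kaplan-Meier fit on the lung data shipped with the survival package
sf <- survfit(Surv(time, status) ~ 1, data = lung)

# summary()$table holds the numbers print() displays, including the median
tab <- summary(sf)$table
tab["median"]
tab[c("median", "0.95LCL", "0.95UCL")]
```

With strata (e.g. ~ sex) the table becomes a matrix with one row per group, so the column would be picked with tab[, "median"].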

R functions for Dienes (2008) Understanding Psychology as a Science

November 17, 2009
By

I recently wrote a review of Understanding psychology as a science: an introduction to scientific and statistical inference by Zoltan Dienes (2008). Dienes' book covers Neyman-Pearson null hypothesis significance testing, Bayesian inference and the lik...

Read more »
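Assuming the truncated "lik..." refers to the likelihood approach, its central quantity is simple to compute: the ratio of the likelihoods of two hypotheses given the same data. The data (7 successes in 10 trials) and the two hypothesized probabilities below are made up for illustration, and this is not one of the post's functions.

```r
# Likelihood ratio for two point hypotheses about a success probability,
# given 7 successes in 10 trials (made-up data)
successes <- 7
trials <- 10
lik <- function(p) dbinom(successes, trials, p)
lr <- lik(0.7) / lik(0.5)  # relative evidence for p = 0.7 over p = 0.5
lr                         # about 2.28
```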

Seminar: Reproducible Research with R, LaTeX, & Sweave

November 16, 2009
By

Theresa Scott, instructor of the previously mentioned R workshop and weekly R clinic, is giving a lecture entitled "Reproducible Research with R, LaTeX, & Sweave" in MRB III, room 1220, this Wednesday 11/18 at 1:30. You can see more details about the lecture here. Looks like her slides as well as much more introductory material on R, LaTeX, and Sweave...

Read more »

Infomaps using R – Visualizing German unemployment rates by district on a map

November 16, 2009
By

Recently, David Smith from REvolution Computing challenged the R community to reproduce a beautiful choropleth map (a thematic map that shades multiple regions) of US unemployment rates that he had seen on the Flowing Data blog. Here you can find the impressive results. Being a fan of beautiful visualizations, I tried to produce

Read more »
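Whatever mapping package draws the district polygons, the choropleth step itself is binning each region's value into classes and giving each class a colour. A minimal base-R sketch, with made-up districts, rates and class breaks:

```r
# Made-up unemployment rates (%) for four hypothetical districts
rates <- c(district_a = 4.2, district_b = 7.9, district_c = 12.5, district_d = 9.1)

# Bin the rates into classes; right-closed intervals (0,6], (6,9], (9,12], (12,Inf]
breaks <- c(0, 6, 9, 12, Inf)
cls <- cut(rates, breaks = breaks, labels = FALSE)

# Reversed heat palette: lightest colour for the lowest class, red for the highest
pal <- rev(heat.colors(length(breaks) - 1))
fill <- setNames(pal[cls], names(rates))
fill
```

The fill vector would then go to the col argument of whatever polygon-drawing function the map uses, in the same order as the districts.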

R in Action – early thoughts

November 16, 2009
By

I was invited to review the book R in Action, written by Rob Kabacoff. Since I consider the Quick-R website, created by the same smart guy, one of the most valuable resources about R, it is both an honor and a pleasure to have the opportunity to take an...

Read more »

The Top Scores for Canabalt, Take 2

November 15, 2009
By

Introduction As promised on Thursday, here’s my second pass at a statistical analysis of Canabalt scores. There are some useful results I’ll present right at the start, and then there are some results that are more or less worthless, except that working through my own mistakes helped me to think more clearly about statistical modeling in

Read more »

OpenMX

November 15, 2009
By

Looks promising: http://openmx.psyc.virginia.edu/ Right now it cannot be built from source because of some incompatibilities between OpenMx and R 2.10.0, but I assume this will be resolved soon. And development seems to be quite active.

Read more »

R Tutorial Series: Scatterplots

November 12, 2009
By

A scatterplot is a useful way to visualize the relationship between two variables. Similar to correlations, scatterplots are often used to make initial diagnoses before any statistical analyses are conducted. This tutorial will explore the ways in whic...

Read more »
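A minimal base-R scatterplot along these lines, using the built-in cars data as an assumed example dataset and adding a least-squares line and the correlation to go with it. Output goes to a temporary PDF so the sketch also runs non-interactively; in an interactive session the pdf() and dev.off() calls can simply be dropped.

```r
# Draw to a temporary PDF so the example also runs non-interactively
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "cars: speed vs. stopping distance")
abline(lm(dist ~ speed, data = cars), col = "red")  # least-squares line
invisible(dev.off())

# Pearson correlation to go with the picture
r <- cor(cars$speed, cars$dist)
r
```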