Monthly Archives: April 2012

Measuring user retention using cohort analysis with R

April 27, 2012

Cohort analysis is super important if you want to know whether your service is in fact a leaky bucket despite healthy growth in absolute numbers. There’s a good write-up on the subject, “Cohorts, Retention, Churn, ARPU” by Matt Johnson. So how do you do it in R, and how do you visualize it? Inspired by examples...

Speeding up R computations Pt II: compiling

April 27, 2012

A year ago I wrote a post on speeding up R computations. Some of the tips that I mentioned then have since been made redundant by a single package: compiler. Forget about worrying about curly brackets and whether to write 3*3 or 3^2 - compile...

Create polygons from a matrix

April 27, 2012

The following function matrix.poly allows for the addition of polygons to a plot based on a matrix and defined matrix positions. I have used this function on occasion to highlight specific matrix locations (e.g. in the above figure). You can do the same by overlaying another image (left in above plot) but with this...

Read Big Text Files Column by Column

April 27, 2012

Dear R Programmers, there is a new package, "colbycol", on CRAN that makes our jobs easier when we have large files (i.e. more than a GB) to read into R, especially when we don't need all of the columns/variables for our analysis. Kudos to the author, Carlos...

Graphic Parameters (symbols, line types, and colors) for ggplot2

April 27, 2012

Following up on John Mount’s post on remembering symbol parameters in ggplot2, I decided to give it a try and included symbols, line types, and colors (based upon Earl Glynn’s wonderful color chart). Code follows below. require(ggplot2) ...

Randomization thoughts

April 27, 2012

Le Grand Casino of Monte Carlo. On Monday I’m going to be leading a little stats workshop on randomization tests and null models. In preparation for this I wrote up code for null model examples, and I wanted to write a post that introduced the basics of these models (Null models, bootstrapping,...

soilDB Demo: Processing SSURGO Attribute Data with SDA_query()

April 26, 2012

Mapping near Paloma, CA. This image has nothing to do with the following content. A quick example of how to use the USDA-NRCS Soil Data Access (SDA) query facility via the soilDB package for R. The following code describes how to get component-level so...

phyloseq: Reproducible interactive analysis of microbiome census data using R

April 26, 2012

Collaborative development of phyloseq on GitHub. Official stable release of phyloseq on Bioconductor. Advances in DNA sequencing technology have dramatically improved the scope and scale of culture-independent investigations into microbial communities. There are effective software tools available to process raw DNA …

AdfTest Function Enhanced With Rcpp Armadillo

April 26, 2012

In my previous post, part one on rewriting my code to run in parallel, I mentioned that we would make a small change to the adfTest() function as well. In this post we will make that small change, which has a dramatic effect on performance. When you take a closer look at the source code of this particular function from the fUnitRoots package...
