Clustering analysis and its implementation in R

April 29, 2012

Earlier I posted a blog entry on "k-means + heatmap" for clustering analysis. Recently, to prepare for the "Bioinformatics Tools" meeting, I made a slide deck with more details on "clustering analysis". Here it is: https://docs.google.com/presentation/d/1vMS3...


Animating Schelling’s segregation model

April 29, 2012

A recent blog post on Animations in R inspired me to write code that generates animations of a simulation model. For this task I chose Schelling's segregation model. Having written the code, I found that one year ago similar code had been...


Guess who wins: apply() versus for loops in R

April 28, 2012

Yesterday I tried to do some data processing on my really big data set in MS Excel. Wow, did it not like handling all those data!! Every time I tried to click on a different ribbon, the screen didn’t even …


Open data and ecological fallacy

April 28, 2012

A couple of days ago, on Twitter, @alung mentioned an old post I published on this blog about open data, explaining how difficult it was to get access to data in France (the post, published almost 18 months ago, can be found here, in French)...


microbenchmarking with R

April 28, 2012

I love to benchmark. Maybe I’m a bit weird but I love to bench everything in R. Recently I’ve had people raise accuracy challenges to the typical system.time and rbenchmark package approaches to benchmarking. I saw Hadley Wickham promoting the …


Correlation of temperature proxies with observations

April 28, 2012

The climate change debate centers mainly on the assumption that the annual global mean temperatures of the past few decades have been the highest in the past millennium. How do we know what the annual global mean temperature was in the year, say, 1351 AD? The answer is: through temperature proxies. Such proxies include tree


R equivalents to SAS and SPSS procedures

April 27, 2012

With more than 5,000 R packages now available (from the CRAN and BioConductor repositories), for any statistical or data analysis procedure you can confidently say, "there's a package for that". To make it easier for SAS and SPSS users to find what they need in R, Bob Muenchen has updated his useful table of equivalent R packages for SAS...


Sage Bionetworks Synapse

April 27, 2012

Michael Kellen, Director of Technology at Sage Bionetworks, is trying to build a GitHub for science. It's called Synapse and Kellen described it in a talk at the Sage Bionetworks Commons Congress 2012, this past weekend: 'Synapse' Pilot for Building an...


The Best Statistical Programming Language is …Javascript?

April 27, 2012

R-Bloggers has recently been buzzing about Julia, the new kid on the statistical programming block. Julia, however, is hardly the sole contender for the market of R defectors, with Clojure-fork Incanter generating buzz as well. Even with these two making noise, I think there’s a huge point that everyone is missing, and it’s front-and-center on


An academic programming language paper about R

April 27, 2012

The R language has passed another milestone: a paper aimed at the academic programming language community (or at least one section of this community) has been written about it, Evaluating the Design of the R Language by Morandat, Hill, Osvald and Vitek. Hardly earth-shattering news, but it may have some impact on how R


R Workshop: Reproducible Research using Sweave for Beginners

April 27, 2012

Monday, April 30, 2012, 14h-16h. Stewart Biology Rm w6/12 (Montreal). guRu: Denis Haine (Université de Montréal). Topics: The term "reproducible research" was first coined by Prof. Jon Claerbout, professor of geophysics at Stanford University, to describe the idea that research results can be replicated by other scientists by making available the data, procedures, materials and the computational environment


How to download complete XML records from PubMed and extract data

April 27, 2012

Yesterday I wrote an article that looked at the top 20 Cognitive Behavior Therapy journals with the most publications; today I will explain how I did it with R.


A Bayesian Consumption Function

April 27, 2012

What the title of this post is supposed to mean is: "Estimating a simple aggregate consumption function using Bayesian regression analysis". In a recent post I mentioned my long-standing interest in Bayesian Econometrics. When I teach this material I usually include a simple application that involves estimating a consumption function using U.S. time-series data. I used to have...


Real Time Structural Break

April 27, 2012

Yesterday, as I played with bfast, I kept thinking “Yes, but this is all in hindsight. How can I potentially use this in a system?” Fortunately, one of the fine authors very generously commented on my post Structural Breaks (Bull or Bear?...


Measuring user retention using cohort analysis with R

April 27, 2012

Cohort analysis is super important if you want to know whether your service is in fact a leaky bucket despite nice growth in absolute numbers. There’s a good write-up on that subject, “Cohorts, Retention, Churn, ARPU”, by Matt Johnson. So how do you do it using R, and how do you visualize it? Inspired by examples


Speeding up R computations Pt II: compiling

April 27, 2012

A year ago I wrote a post on speeding up R computations. Some of the tips that I mentioned then have since been made redundant by a single package: compiler. Forget about worrying about curly brackets and whether to write 3*3 or 3^2 - compile...


Create polygons from a matrix

April 27, 2012

The following function matrix.poly allows for the addition of polygons to a plot based on a matrix and defined matrix positions. I have used this function on occasion to highlight specific matrix locations (e.g. in the above figure). You can do the same by overlaying another image (left in above plot) but with this...


Read Big Text Files Column by Column

April 27, 2012

Dear R Programmers, there is a new package, "colbycol", on CRAN, which makes our jobs easier when we have large files (i.e. more than a GB) to read into R, especially when we don't need all of the columns/variables for our analysis. Kudos to the author, Carlos...


Graphic Parameters (symbols, line types, and colors) for ggplot2

April 27, 2012

Following up on John Mount’s post on remembering symbol parameters in ggplot2, I decided to give it a try and included symbols, line types, and colors (based upon Earl Glynn’s wonderful color chart). Code follows below. require(ggplot2) ...



Randomization thoughts

April 27, 2012

(Image: Le Grand Casino of Monte Carlo.) On Monday I’m going to be leading a little stats workshop on randomization tests and null models. In preparation for this I wrote up code for null model examples, and I wanted to write a post that introduced the basics of these models (null models, bootstrapping,...


soilDB Demo: Processing SSURGO Attribute Data with SDA_query()

April 26, 2012

(Image: Mapping near Paloma, CA. This image has nothing to do with the following content.) A quick example of how to use the USDA-NRCS soil data access query facility (SDA), via the soilDB package for R. The following code describes how to get component-level so...


phyloseq: Reproducible interactive analysis of microbiome census data using R

April 26, 2012

Collaborative development of phyloseq on GitHub. Official stable release of phyloseq on Bioconductor. Advances in DNA sequencing technology have dramatically improved the scope and scale of culture-independent investigations into microbial communities. There are effective software tools available to process raw DNA …


AdfTest Function Enhanced With Rcpp Armadillo

April 26, 2012

In my previous post, part one on rewriting my code to run in parallel, I mentioned that we would make a small change to the adfTest() function as well. In this post we will make this small change, which has a dramatic effect on performance. When you take a closer look at the source code of this particular function from the fUnitRoots package


Structural Breaks (Bull or Bear?)

April 26, 2012

When I spotted the bfast R package, I could not resist attempting to apply it to identify bull and bear markets. For all the details that I do not understand, please see the references: Jan Verbesselt, Rob Hyndman, Glenn Newnham, Darius Culvenor...


Graphic Parameters (symbols, line types, and colors) for ggplot2

April 26, 2012

Following up on John Mount’s post on remembering symbol parameters in ggplot2, I decided to give it a try and included symbols, line types, and colors (based upon Earl Glynn’s wonderful color chart). Code follows below.


Big Data statistics in the search for a cure for MS

April 26, 2012

Multiple Sclerosis (MS) is a debilitating and complex disease with an unknown cause, and one for which there is currently no cure. SUNY Buffalo is home to one of the leading MS research centers in the world, and as reported in Healthcare IT News, the research team is using IBM Netezza and Revolution R Enterprise to...


spam evolution

April 26, 2012

Despite some rather modest protection (like a simple captcha), I still receive spammy comments on this blog every now and again. They’re easily spotted and actually never appear on the website. There’s obviously an incentive for the spammer to post …


R Tips: lots of tips for R programming

April 26, 2012

by Yanchang Zhao, RDataMining.com. There are more than 100 R tips at http://pj.freefaculty.org/R/Rtips.html, which provide quick examples for small challenges in everyday R programming, especially for users switching from other languages to R. There is also a .PDF version for …
