Monthly Archives: April 2012

The R-Podcast Episode 6: Importing Data from External Sources

April 29, 2012

In this episode: Listener feedback and importing data from external sources into R. We dive into the basics of importing delimited text files using read.table and its variants. We also discuss recommendations for importing MS Excel spreadsheets, data from relational databases such as MySQL, HTML tables, and files produced by other statistical computing packages.

Read more »

The “best” proxies for temperature reconstruction

April 29, 2012

In the last post I presented the distribution of correlation coefficients of temperature proxies with the actual temperature observations during the past 150 years. One of the conclusions was that most proxies correlate weakly with temperature observations. However, there seemed to be some proxies that do have some significant positive correlation with the observations. These

Read more »

mad statistic

April 29, 2012

In the motivating toy example of our ABC model choice paper, we compare the summary statistics mean, median, variance, and… median absolute deviation (mad). The last is the only one able to discriminate between our normal and Laplace models (as now discussed on Cross Validated!). When rerunning simulations to produce nicer graphical outcomes (for the revision),

Read more »

The Need for paste2 (part I)

April 29, 2012

This is Part I of a multi-part blog on the paste2 function… I recently wrote a new paste function that takes an unspecified list of equal-length variables (a column) or multiple columns of a data frame and pastes …

Read more »

Getting SASsy

April 29, 2012

Although I am most familiar with R for statistical analysis and programming, I also use a fair amount of SAS at work. I found it a huge transition at first, but one thing that helped make SAS “click” for me …

Read more »

Clustering analysis and its implementation in R

April 29, 2012

Earlier I posted a blog entry on "k-means + heatmap" as used for clustering analysis. Recently, to prepare for the "Bioinformatics Tools" meeting, I made a slide deck with more details on "clustering analysis". Here it is: https://docs.google.com/presentation/d/1vMS3...

Read more »

Animating Schelling’s segregation model

April 29, 2012

A recent blog post on Animations in R inspired me to write code that generates animations of a simulation model. For this task I chose Schelling's segregation model. Having written the code, I found that a year ago similar code had been...

Read more »

Guess who wins: apply() versus for loops in R

April 28, 2012

Yesterday I tried to do some data processing on my really big data set in MS Excel. Wow, did it not like handling all those data!! Every time I tried to click on a different ribbon, the screen didn’t even …

Read more »

Open data and ecological fallacy

April 28, 2012

A couple of days ago, on Twitter, @alung mentioned an old post I published on this blog about open data, explaining how difficult it was to get access to data in France (the post, published almost 18 months ago, can be found here, in French)...

Read more »

microbenchmarking with R

April 28, 2012

I love to benchmark. Maybe I’m a bit weird but I love to bench everything in R. Recently I’ve had people raise accuracy challenges to the typical system.time and rbenchmark package approaches to benchmarking. I saw Hadley Wickham promoting the …

Read more »