Blog Archives

Maximal Information Coefficient (Part II)

September 17, 2014

A while back, I wrote a post simply announcing a recent paper that described a new statistic called the "Maximal Information Coefficient" (MIC), which is able to describe the correlation between paired variables whether the relationship is linear or nonlinear. This turned out to be quite a popular post, and included a lively discussion...

PCA / EOF for data with missing values – a comparison of accuracy

September 15, 2014

Not all Principal Component Analysis (PCA) (also called Empirical Orthogonal Function analysis, EOF) approaches are equal when it comes to dealing with a data field that contains missing values (i.e. is "gappy"). The following post compares several methods by assessing how accurately the derived PCs reconstruct the "true" data set, as was similarly...

“sinkr” – a collection of functions featured on “me nugget”

September 2, 2014

The R package sinkr (version 1.0) has now been released: https://github.com/menugget/sinkr. I have finally gotten around to learning how to create an R package and decided to start by bundling functions that I have featured on the blog. Thanks to the RStudio team for making this so easy (in combination with...

Rotated axis labels in R plots

August 5, 2014

It's somehow amazing to me that slanted or rotated axis labels are not an option within the basic plot() or axis() functions in R. The advantage is mainly in saving plot area space when long labels are needed (rather than as a means...

Flood fill a region of an active device in R

July 23, 2014

The following is a function to "flood fill" a region on the active plotting device. Once called, the user will be asked to click on the desired target region. The flood fill algorithm then searches neighbors in 4 directions of the target cell (down, le...

Automated determination of distribution groupings – A StackOverflow collaboration

May 18, 2014

For those of you not familiar with StackOverflow (SO), it's a coder's help forum on the StackExchange website. It's one of the best resources for R-coding tips that I know of, due entirely to the community of users who routinely give expert advice (as...

Evaluating model performance – A practical example of the effects of overfitting and data size on prediction

May 3, 2014

Following my last post on decision making trees and machine learning, where I presented some tips gathered from the "Pragmatic Programming Techniques" blog, I have again been impressed by its clear presentation of strategies regarding the evaluation of model performance. I have seen some of these topics presented elsewhere -...

Decision making trees and machine learning resources for R

April 30, 2014

I have recently come across Ricky Ho's blog "Pragmatic Programming Techniques", which seems to be an excellent resource for all sorts of aspects regarding data exploration and predictive modelling. The post "Six steps in data science" provides a nice overview of some of the topics covered in the blog. For some reason, this blog does not seem to be...

Importing bathymetry and coastline data in R

January 25, 2014

After noticing some frustrating inaccuracies with the high-resolution world coastlines and national boundaries database found in worldHires from the package mapdata (based on CIA World Data Bank II data), I decided to look into other options. Although listed as "depreciated", the data found in NOAA's online "Coastline Extractor" is a big step forward. There...

GMT standard color palettes

January 25, 2014

GMT (Generic Mapping Tools) (http://gmt.soest.hawaii.edu/) is a great mapping tool. I'm hoping to use it more in the future, but in the meantime I wanted to recreate some of its standard color palettes in R. Unfortunately, I couldn't find documen...
