## Canonical Correlation Analysis for finding patterns in coupled fields

March 25, 2012
First CCA pattern of Sea Level Pressure (SLP) and Sea Surface Temperature (SST) monthly anomalies for the region between -180 °W to -70 °W and +30 °N to -30 °S. The following post demonstrates the use of Canonical Correlation Analysis (CCA) for diagnosing coupled patterns in climate fields....

## Classification Trees and Spatial Autocorrelation

March 25, 2012
I'm currently trying to model species presence / absence data (N = 523) that were collected over a geographic area and are possibly spatially autocorrelated. Samples come from preferential sites (sea level > 1200 m, obligatory presence of permanent ...

## The R-Podcast Episode 4: Data Structures-Introduction

March 25, 2012
In this episode: Site updates, additional screencasts about R from other sites, listener feedback, and discussion on the fundamental data structures for R: vectors, matrices, lists, and data frames. The R code discussed in this episode is available in ...

## VIDEO: Applying "MSC" math-treatment to our raw spectra in "R".

March 25, 2012

## Levenshtein distance in C++ and code profiling in R

March 25, 2012
At work, a client asked whether the existing search engine could treat singular and plural forms equally, e.g. “partner” and “partners” would lead to the same result. The first option is stemming. In that case, the search engine would use the root of a word, e.g. “partn”. However, stemming has many weaknesses: two different words might have the same root, a

## Disproportionality Data

March 25, 2012
So I was hunting around for some data on disproportional electoral outcomes (when the proportion of votes cast for political parties is not close to the proportion of legislative seats that they win). Michael Gallagher keeps an updated version of his L...

## Citations in markdown using knitr

March 24, 2012
I am finding myself more and more drawn to markdown rather than tex/Rnw as my standard format (not least because of the ease of displaying the files on GitHub, particularly now that we have automatic image uploading). One thing I miss from LaTeX is the citation commands. (I understand these can be provided to

## Initial release 0.1.0 of package RcppSMC

March 24, 2012
Hm, I realized that I announced this on Google+ (via Rcpp) as well as on Twitter, on the r-packages list, wrote a new and simple web page for it, but had not put it on my blog. So here is some catching up. Sequential Monte Carlo / Particle Filter is ...

## Custom Summary Stats as Dataframe or List

March 24, 2012
On Stack Overflow I found this useful example of how to apply custom statistics to a data frame and return the results as a list or data frame: somedata <- data.frame( ...

## Linking apple liking to sensory

March 24, 2012
Previously it was seen that apple liking was related to consumers' scores for juiciness and sweetness. It would be nice if these scores could be linked to sensory scores. Thus a three-block model would result: A block with sensory data describing how ...

## Video: R at Work and at Home

March 24, 2012
The following video was filmed at Melbourne R Users. The description of the talk from the meetup site: Eu Jin is a Senior Analyst with Deloitte Analytics in Melbourne. He has over four years experience in data mining and statistical …

## R script to calculate QIC for Generalized Estimating Equation (GEE) Model Selection

March 23, 2012
Generalized Estimating Equations (GEE) can be used to analyze longitudinal count data; that is, repeated counts taken from the same subject or site. This is often referred to as repeated measures data, but longitudinal data often has more repeated observations. …

## Gini Efficient Frontier

March 23, 2012
David Varadi has recently written two posts about the Gini coefficient: I Dream of Gini, and Mean-Gini Optimization. I want to show how to use the Gini risk measure to construct an efficient frontier and compare it with the alternative risk measures I discussed previously. I will use the Gini mean difference risk measure – the mean of the difference

## Serious stats – free statistics resources

March 23, 2012
The companion web site for Serious Stats is now live: http://www.palgrave.com/psychology/baguley/ The web site includes: a free sample chapter (Chapter 15: Contrasts); data sets; R scripts; 5 online supplements (for meta-analysis, multiple imputation, r...

## Serious stats companion web site now live: sample chapter, data and R scripts

March 23, 2012
The companion web site for Serious stats is now live: http://www.palgrave.com/psychology/Baguley/ It includes a sample chapter (Chapter 15: Contrasts), data sets, R scripts for all the examples and supplementary material.

## Dissimilarity Between Soil Profiles: A Closer Look

March 23, 2012
Continuing the previous discussion of pair-wise dissimilarity between soil profiles, the following demonstration (code, comments, and figures) further elaborates on the method. A more in-depth discussion of this example will be included as a vignette w...

## Launching iButton Thermochrons with the help of R

March 23, 2012
Maxim's iButton Thermochron temperature dataloggers are little silver doo-dads the size of a large watch battery that can record up to 2048 time-stamped temperature values. The internal battery is usually good for a few years of use. Maxim supplies a J...

## R in Google Summer of Code 2012

March 23, 2012
This post is a slightly revised (and "blogified") version of the message Brian Peterson has sent to various R mailing lists. Once again, R has been accepted as a mentoring organization for the Google Summer of Code (2012). We invite students interested in this program to learn more about it. A good starting point...

## RStudio Development Environment

March 23, 2012
Compared to many other languages of equal popularity, there are relatively few development environments for R. In fact, the total number of production-ready R IDEs could probably be counted on one hand. That deficiency is a small price to pay to use R and if you’re not already accustomed to using IDEs for other ...

March 23, 2012
Ed Chen is a data scientist at Twitter, so he's accustomed to working with big data and complex models. In an interview with MIT Technology Review, he describes his data science toolbox: A common pattern for me is that I'll code a MapReduce job in Scala, do some simple command-line munging on the results, pass the data into Python...

## This graph shows that President Obama’s proposed budget treats the NIH even worse than G.W. Bush – Sign the petition to increase NIH funding!

March 23, 2012
The NIH provides financial support for a large percentage of biological and medical research in the United States. This funding supports a large number of US jobs, creates new knowledge, and improves healthcare for everyone. So I am signing this petiti...

## Low (and high) volatility strategy effects

March 23, 2012
Does minimum variance act differently from low volatility? Does either of them act like low beta? What about high volatility versus high beta? Inspiration: Falkenblog had a post investigating differences in results when using different strategies for low volatility investing. Here we look not at a single portfolio of a given strategy over time, but …

## Forecasts and ggplot

March 22, 2012
The forecast package uses the base R graphics for all plots, but some people may prefer to use the nice graphics available using the ggplot2 package. In the following two posts, Frank Davenport shows how it can be done: Plotting forecast() objects in ...

## Project Euler: Problem 20

March 22, 2012
n! means n × (n - 1) × ... × 3 × 2 × 1. For example, 10! = 10 × 9 × ... × 3 × 2 × 1 = 3628800, and the sum of the digits in the number 10! is 3 + 6 + 2 + 8 + 8 + 0 + 0 = 27. Find the sum of the di...

## Do we appreciate sunbathing in Spring ?

March 22, 2012
We are currently experiencing an extremely hot month in Montréal (and more generally in North America). Looking at people having a beer, and starting the first barbecue of the year, I was wondering: if we asked people if global warming was a good ...

## Using Ggplot2 to plot last.fm top 100 albums

March 22, 2012
I found out that last.fm had made data files available for their Best of 2011 artist list, and I thought it'd be a great opportunity to learn some more about data management in R and ggplot2.