## Maximal Information Coefficient (MIC)

December 19, 2011
Figure: Pearson r correlation coefficients for various distributions of paired data (Credit: Denis Boigelot, Wikimedia Commons).

A paper published this week in Science outlines a new statistic called the maximal information coefficient (MIC), which is able to describe the correlation between paired variables equally well whether the relationship is linear or nonlinear. In...
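The motivation for a statistic like MIC can be seen with a minimal sketch in base R. It does not compute MIC itself; it only shows, on made-up data (`x`, `y_linear`, and `y_quadratic` are illustrative names, not from the paper), that Pearson's r can be essentially zero for a relationship that is perfectly deterministic but nonlinear.

```r
# Pearson's r on a linear versus a symmetric nonlinear relationship.
# All variable names and data here are illustrative assumptions.
x <- seq(-1, 1, length.out = 201)   # evenly spaced, symmetric about zero

y_linear    <- 2 * x + 1            # exact linear dependence on x
y_quadratic <- x^2                  # exact, but nonlinear, dependence on x

r_linear    <- cor(x, y_linear)     # cor() uses Pearson's r by default
r_quadratic <- cor(x, y_quadratic)

print(c(linear = r_linear, quadratic = r_quadratic))
```

The quadratic case is fully determined by `x`, yet the linear coefficient reports almost no association, which is the gap a measure that treats linear and nonlinear relationships equally is meant to close.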

## The R Journal (Volume 3/2, December 2011) is out

December 19, 2011
The new R Journal for December 2011 is out! You can download the complete issue from here, while refereed articles may be downloaded individually using the links below. Table of Contents: Editorial. Contributed Research Articles: Creating and Deploying an Application with (R)Excel and R (Thomas Baier, Erich Neuwirth and Michele De Meo); glm2: Fitting Generalized Linear Models...

## Christmas Gift to the R Community: The R Journal!

December 19, 2011
The R Journal Volume 3/2 is available! Get it from here.

## Spatial Data with R

December 19, 2011
On September 14th 2011 Dr Alec Stephenson gave a talk on exploring spatial data with R (see Meetup page). The video of the talk is now available online. The talk is a non-mathematical and entirely equation-free look at visualizing and …

## data.frame objects in R (via “R in Action”)

December 18, 2011
The following introductory post is intended for new users of R. It deals with R data frames: what they are, and how to create, view, and update them. This is a guest article by Dr. Robert I. Kabacoff, the founder of (one of) the first online R tutorial websites: Quick-R. Kabacoff has recently published the book “R...
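As a rough illustration of the three operations the post names (create, view, update), here is a minimal base R sketch. The `patients` data frame, its columns, and its values are made up for this example and are not taken from the guest article.

```r
# Create a small data frame; the names and values are illustrative only.
patients <- data.frame(
  id     = 1:4,
  age    = c(25, 34, 28, 52),
  status = c("Poor", "Improved", "Excellent", "Poor"),
  stringsAsFactors = FALSE
)

# View it: structure, first rows, and dimensions.
str(patients)
head(patients, 2)
dim(patients)

# Update it: change one cell, add a derived column, append a row.
patients$age[patients$id == 2] <- 35
patients$senior <- patients$age >= 50
patients <- rbind(
  patients,
  data.frame(id = 5, age = 41, status = "Improved", senior = FALSE,
             stringsAsFactors = FALSE)
)
```

Each column of a data frame can hold a different type (here integer, numeric, character, and logical), which is what distinguishes it from a matrix.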

## Portfolio Optimization in R, Part 1

December 17, 2011
I briefly mentioned in my last post that I was fooling around with portfolio optimization in R.  This post will be the first in a series on the topic of portfolio optimization. Please note, nothing I am about to say should be taken as advice for investing.  These results are based on prior observed returns and the future...

## Function to Collect Geographic Coordinates for IP-Addresses

December 17, 2011
I added the function IPtoXY, which collects geographic coordinates for IP addresses, to theBioBucket-Archives. It uses a web service at http://www.datasciencetoolkit.org// and works with the base R packages. # System time to collect coordinates of 100 IP-...

## IRIS Flower Data Set (R-002)

December 17, 2011
See first: IRIS Flower Data Set (R-001). The "summary" command helps us understand the importance of each principal component. The "eigenvalues" are the deviation...
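A minimal sketch of that step, assuming the analysis is done with base R's `prcomp` on the four numeric columns of the built-in `iris` data (the teaser does not say which PCA function the post actually uses, and the variable names below are illustrative):

```r
# PCA on the four measurement columns of iris, standardized first.
# Using prcomp() here is an assumption; the post may use another function.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# summary() reports the standard deviation, proportion of variance, and
# cumulative proportion for each principal component.
s <- summary(pca)
print(s)

# The eigenvalues of the correlation matrix are the squared standard
# deviations of the components.
eigenvalues <- pca$sdev^2
prop_var    <- eigenvalues / sum(eigenvalues)
print(round(prop_var, 4))
```

Because the variables are standardized, the eigenvalues sum to the number of variables (four), so each one can be read directly as "how many original variables' worth" of variance that component carries.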

## IRIS Flower Data Set (R-001)

December 17, 2011
This is the link to Wikipedia where you can find the data that Fisher used in his 1936 paper. We have already worked with these data in Excel and will keep using them in future posts. At this link, we can see photos of the flowers (irises are called "lirios" in Spanish). I denote as LS (sepal length, "longitud del sépalo"), AS...

## Ripley on model selection, and some links on exploratory model analysis

December 17, 2011
This is really fun. I love how Ripley thinks, with just about every concept considered in broad generality while being connected to real-data examples. He’s a great statistical storyteller as well. . . . and Wickham on exploratory model analysis: I came across Ripley’s slides in a reference from Hadley Wickham’s article on exploratory model...

## cRazydays 2012 with ggplot2

December 17, 2011
Season’s Greetings! Hi, dear R-bloggers and readers. Here in Japan it’s very cold now. The end...

## knitr: nice alternative for Sweave

December 17, 2011
I recently discovered knitr for dynamic report generation in R. It seems like a very powerful alternative to Sweave. In particular, I am interested in its png graphic device support (it supports more than 20 graphic devices) and R code formatting. Check it o...

## semi-automatic ABC

December 17, 2011
The talk at Wednesday afternoon's Ordinary Meeting of the Royal Statistical Society went quite well, I think. I would have expected a few more people (in general) and some specific people (in particular), but this being the last week of term, the timing was not the best. Paul Fearnhead gave the talk, insisting...

## ai-class.com vs ml-class.com

December 16, 2011
For those who did not know, Stanford University offered three courses free of charge at the beginning of the autumn. It is kind of shocking – a US-based institution offers education for free! Take any socialism-oriented country and one of the promises is education for free. But it seems that the argument is losing its power – Stanford,

## Poor, Poor Hillary

December 16, 2011
This will be the last baby-name-related post, but this came out of part two of the web scraping post last month. I was looking for the fastest rising names. I flipped the logic and looked for the names declining fastest in relative popularity. Out of that exerci...

## Lattice Explore Bonds

December 16, 2011
Since my fifth most popular post has been Bond Market as a Casino Game Part 1, I thought I would use Vanguard Total US Bond Market mutual fund (VBMFX) monthly returns to build our skills in the lattice R package and help visualize the unbelievable run ...

## A quick primer on split-apply-combine problems

December 16, 2011
I’ve just answered my hundred billionth question on Stack Overflow that goes something like “I want to calculate some statistic for lots of different groups”. Although these questions provide a steady stream of easy points, it’s such a common and basic data analysis concept that I thought it would be useful to have a document
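The pattern behind such questions can be sketched in base R on the built-in `iris` data. This is only an illustration of the split, apply, and combine steps; it is not the code from the post, which may well use other functions or packages, and the variable names are made up.

```r
# Split-apply-combine with base R only: mean sepal length per species.
pieces <- split(iris$Sepal.Length, iris$Species)    # split by group
means  <- lapply(pieces, mean)                      # apply a statistic to each piece
result <- data.frame(                               # combine into one table
  Species           = names(means),
  mean_sepal_length = unname(unlist(means)),
  row.names         = NULL
)
print(result)

# The same three steps collapsed into a single call.
agg <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(agg)
```

Writing the three steps out separately makes it easier to swap in a different grouping variable or statistic; `aggregate` is convenient once the pattern is familiar.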

## To Sweave, or not to Sweave, that is the question

December 16, 2011
I am about to start writing up the manuscript of my recent biomath seminar (Act 3: Pineda-Krch. 2011. Cycles at the edge of existence: Emergence of quasi-cycles in strongly destabilized ecosystems.). While the slides for the talk were put together using Sweave …

## New Powerball (lottery) Rules Will Cost You More

December 16, 2011
The popular news outlets are reporting that the Multi-State Lottery Commission (MUSL) will change the rules for their lottery game Powerball, effective Jan. 15, 2012. I sent an email to the MUSL (at 8:00am, Dec. 14th) asking for the new official rules, but haven't received a response yet (as of 10:30am, Dec. 16th). Hence, these

## Optimal regularization for smoothing splines

December 16, 2011
In the smooth.spline procedure one can use either the df or the spar parameter to control the smoothing level. Usually they are not set manually, but recently I was asked which one of them is a better measure of regularizatio...
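To make the two controls concrete, here is a minimal sketch on simulated data. The data, the seed, and the particular values `df = 5` and `spar = 0.8` are arbitrary assumptions for illustration; the sketch does not answer which parameter is the better measure.

```r
# Simulated noisy sine curve; all values here are illustrative.
set.seed(1)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)

# Control smoothing via the target equivalent degrees of freedom ...
fit_df <- smooth.spline(x, y, df = 5)

# ... or via the smoothing parameter spar directly.
fit_spar <- smooth.spline(x, y, spar = 0.8)

# Each fit reports all three quantities, so the two scales can be compared.
print(c(df = fit_df$df,   spar = fit_df$spar,   lambda = fit_df$lambda))
print(c(df = fit_spar$df, spar = fit_spar$spar, lambda = fit_spar$lambda))
```

Whichever argument is supplied, the fitted object reports `df`, `spar`, and `lambda`, so one can always read off the equivalent value on the other scale for a given data set.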

## SVN Version Control, R, and some rambling thought on AWS,Rscripts

December 16, 2011
I do a lot of my modelling on RStudio hosted on EC2 instances. If you don’t use it, I would highly recommend it. A brilliant tool. Kudos to the RStudio team. I have made a personal and professional pledge to obsessively use version control. I hope to show...

## CrossValidated: A place to post your statistics questions

December 16, 2011
Seth Rogers writes: I am a member of an online community of statisticians where I burn a great deal of time (and a recovering cog sci researcher). Our community website is a peer-reviewed Q and A spanning stats topics ranging from applications to mathematical theory. Our online community consists of mostly university faculty, grad...

## Psycho dice and Monte Carlo

December 16, 2011
Following Pierre’s post on psycho dice, I want to see here by what average margin repeated plays might be said to be influenced by the power of the mind. The rules are the following (excerpt from the novel Midnight in the Garden of Good and Evil, by John Berendt): You take four dice and call out four numbers between one

## The Bay Area R User Group Meeting on Data Mining with R

December 16, 2011
By Joseph Rickert. Put up a poster that says something like “Data Mining with R” anywhere in the Bay Area and you will surely draw a crowd. But it was still a bit of a surprise that the monthly meeting of the Bay Area R User Group was so well attended. At one point there were 160 people on...

## Backtesting Rebalancing methods

December 15, 2011
I wrote about Rebalancing in the Asset Allocation Process Summary post. Deciding how and when to rebalance (update the portfolio to the target mix) is one of the critical steps in the Asset Allocation Process. I want to study the portfolio performance and turnover for the following Rebalancing methods: Periodic Rebalancing: rebalance to the target

## Update your Windows PATH – revisited

December 15, 2011
Yihui got me psyched a little about GitHub. After my last post about running your R infrastructure from a USB drive, he commented on my function that would update the Windows PATH (which is at least important for R and Rtools). Now I found some time to polish it a little. Feel free to test …

## EMC survey differentiates BI and Data Science

December 15, 2011
EMC last week published the results of a survey of 462 IT decision makers who self-identified as either a data scientist or business intelligence professional (plus 35 invitees who were attendees at the EMC Data Scientist Summit and/or Kaggle competitors). There's a nice summary of the conclusions at the EMC blog (where data scientists are described as "The New...

## Bayesian inference and the parametric bootstrap

December 15, 2011
This paper by Brad Efron came to my attention when I was looking for references on the Bayesian bootstrap to answer a Cross Validated question. After reading it more thoroughly, “Bayesian inference and the parametric bootstrap” puzzles me, which most certainly means I have missed the main point. Indeed, the paper relies on the parametric bootstrap, a frequentist