March 19, 2011
Some may have had reservations about the “randomness” of the straws I plotted to illustrate Bertrand’s paradox, as they were all oriented North-West/South-East. I had actually swapped cbind and rbind in the R code, which explains this non-random orientation. Above is the corrected version, which indeed looks “more random”. (And using

## How to: Binomial regression models in R

March 19, 2011

Ever wondered how to predict success or failure as a function of other variables? Here's a quick tutorial on binomial regression in R.
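As a rough illustration of what such a model looks like, here is a minimal sketch using `glm` with a binomial family. The data are simulated and the variable names (`x`, `y`, `dat`) are assumptions made for this example; they are not taken from the tutorial itself.

```r
# Simulated data: every name and value here is an illustrative assumption
set.seed(42)
n <- 200
x <- rnorm(n)                              # a single numeric predictor
p <- plogis(-0.5 + 1.2 * x)                # true success probability (logistic)
y <- rbinom(n, size = 1, prob = p)         # 1 = success, 0 = failure
dat <- data.frame(y = y, x = x)

# Fit a logistic (binomial) regression of y on x
fit <- glm(y ~ x, family = binomial(link = "logit"), data = dat)
print(summary(fit))

# Predicted success probabilities at three new values of x
pred <- predict(fit, newdata = data.frame(x = c(-1, 0, 1)), type = "response")
print(pred)
```

With `type = "response"` the predictions come back on the probability scale rather than the log-odds scale, which is usually what one wants when talking about success or failure.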

## New GenABEL Website, and more *ABEL software

March 18, 2011
The *ABEL suite of R packages and software for genetic analysis has grown substantially since the appearance of GenABEL and the previously mentioned ProbABEL R packages. There are now a handful of useful R packages and other software utilities facilita...

## How to display scatter plot matrices with R and lattice

In lattice, there is a function called splom for the display of scatter plot matrices. For large datasets, the panel.hexbinplot from the hexbin package is a better option than the default panel. As an example, let’s use some meteorological data from MAPA-SIAR: library(solaR) library(hexbin) aranjuez <- readMAPA(prov=28, est=3, start='01/01/2004', end='31/12/2010') aranjuezDF <- subset(as.data.frame(getData(aranjuez)), select=c('TempMedia', 'TempMax',

## Flying off the Rack: R and the web in 2011

March 18, 2011
If there is ever a time to learn R and web application development, it is now...in the age of Big Data. The upcoming release of R 2.13 will provide basic functionality for developing R web applications on the desktop via the internal HTTP server, but t...

## Some upcoming R courses

March 18, 2011
A couple of quick notes about some upcoming R courses: In Vancouver, Canada, R trainer Isabella Ghement is presenting two R courses: An Introduction to the Statistical Software Package R, 8:30am-4:30pm, March 30-31, 2011, Vancouver, B.C., Canada (http://www.ghement.ca/RworkshopMarch30and31_2011.html) Advanced Statistical Modeling Using the Statistical Software Package R, 8:30am-4:30pm, May 5-6, 2011, Vancouver, B.C., Canada (http://www.ghement.ca/RworkshopMay5and6_2011.html); And in Seattle, Washington...

## More fun with sed

March 18, 2011
So I have this strange date and time string, which I would like to convert to a “useable” date, i.e., something that a spreadsheet programme or R can work with. It looks like this (MON has 3 chars): ddMONyr:hh:mm:ss The … Continue reading →
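The post itself uses sed; as an alternative, the same conversion could be sketched directly in R. The function name `parse_stamp`, the sample string, and the assumption that two-digit years belong to the 2000s are all made up for this illustration.

```r
# Convert a "ddMONyr:hh:mm:ss" string (e.g. "18MAR11:14:05:09") to POSIXct.
# Assumptions: English three-letter month abbreviations, years in 20xx, UTC.
parse_stamp <- function(x) {
  day  <- as.integer(substr(x, 1, 2))
  # Match the month abbreviation regardless of case, independent of locale
  mon  <- match(toupper(substr(x, 3, 5)), toupper(month.abb))
  yr   <- 2000L + as.integer(substr(x, 6, 7))
  time <- substr(x, 9, 16)                 # the "hh:mm:ss" part
  iso  <- sprintf("%04d-%02d-%02d %s", yr, mon, day, time)
  as.POSIXct(iso, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

stamp <- parse_stamp("18MAR11:14:05:09")
print(format(stamp, "%Y-%m-%d %H:%M:%S"))
```

Matching against `month.abb` by hand avoids relying on how `%b` behaves under the current locale.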

## Machine Learning Ex5.1 – Regularized Linear Regression

March 18, 2011
Exercise 5.1 improves the linear regression implementation from Exercise 3 by adding a regularization parameter that reduces the problem of over-fitting. Over-fitting occurs especially when fitting a high-order polynomial, which is what we will try to do here. Data: here are the points we will build a model from: # linear regression mydata = read.csv("http://spreadsheets.google.com/pub?hl=en_GB&hl=en_GB&key=0AnypY27pPCJydGhtbUlZekVUQTc0dm5QaXp1YWpSY3c&output=csv", header = TRUE) # view data plot(mydata)
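To make the idea of a regularization parameter concrete, here is a minimal sketch of ridge-style regularized least squares on polynomial features, solved through the normal equations. It uses simulated points rather than the exercise data, and `ridge_fit`, the polynomial degree, and the `lambda` values are assumptions chosen for illustration, not the post's own code.

```r
# Simulated data standing in for the exercise points (an assumption)
set.seed(1)
x <- seq(-1, 1, length.out = 7)
y <- x^2 + rnorm(7, sd = 0.1)

# Design matrix with columns x^0, x^1, ..., x^5 (a fifth-order polynomial)
X <- outer(x, 0:5, `^`)

# Regularized normal equations: theta = (X'X + lambda * L)^(-1) X'y,
# where L is the identity with a zero in the intercept position so that
# the intercept itself is not penalized.
ridge_fit <- function(X, y, lambda) {
  L <- diag(ncol(X))
  L[1, 1] <- 0
  solve(t(X) %*% X + lambda * L, t(X) %*% y)
}

theta_plain <- ridge_fit(X, y, lambda = 0)   # ordinary least squares
theta_reg   <- ridge_fit(X, y, lambda = 1)   # coefficients shrunk towards zero

print(cbind(plain = as.vector(theta_plain), regularized = as.vector(theta_reg)))
```

A larger `lambda` shrinks the higher-order coefficients, which flattens the fitted curve and makes it less prone to chasing noise.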

## The housing bubble by city

March 17, 2011
The housing bubble by city. Miami sailed high and fell far. Detroit rose modestly but dropped more than it went up. Dallas held steady. DC is enjoying a bit of renewed growth, but are it and New York yet to fall?

## The story behind the software: the case of R

Nowadays, whatever our area of application of statistics, we need to know how to program. Conventional software packages such as SPSS and STATA are somewhat limited, their updates are not that frequent, and their price can be ...

March 17, 2011
Thanks to Jim, I’ve been using R in the shell more and more – in concert with vi. It’s been fun, and nice to integrate my workflows all on the server (although I haven’t had to do much graphing yet – I’m sure I’ll start kvetching then and return to a nice gui). One thing

## Staying up to date on R packages

March 17, 2011
Unless you regularly use particular R packages, it becomes difficult to stay on top of updates and bug fixes. Updates usually also include significant improvements in performance. I wrote this short snippet of code, which I run about once a month to keep up on updates. This short bit of code will give you a

## Updated tty Connection for R

March 17, 2011
Below are some links to a patch against the R-2.12.2 source code that implements a tty connection for R. Since the release of R-2.13.0 is coming soon, I’ll have a patch for it soon also. What’s a tty connection? The tty connection is an R interface to computer terminals, as defined by the Portable Operating

## Circular or spherical data, and density estimation

March 17, 2011
A few years ago, while I was working on kernel-based density estimation for distributions with compact support (like copulas), I went through a series of papers on circular distributions. At that time, I thought it was something for mathematicians working ...

## Applying functions on groups: sqldf, plyr, doBy, aggregate or data.table ?

March 17, 2011
Which of the sqldf, plyr, doBy and aggregate functions/packages would be fastest for applying functions on groups of rows? I was wondering about this earlier in this post. It seems sqldf would be the fastest according to a post in manipulatr m...
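For readers unfamiliar with the task being compared, "applying a function on groups of rows" looks roughly like this in base R. The toy data frame and its column names are invented for illustration, and this sketch says nothing about the relative speed of the approaches discussed in the post.

```r
# Toy data: group labels and values are illustrative assumptions
dat <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# aggregate(): one row per group, with the mean of 'value' in each group
means <- aggregate(value ~ group, data = dat, FUN = mean)
print(means)

# tapply(): the same computation, returned as a named vector
means_vec <- tapply(dat$value, dat$group, mean)
print(means_vec)
```

Packages such as sqldf, plyr, doBy and data.table express the same split-apply-combine step with different syntax and performance characteristics.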

## \$3.2M in prizes for predicting hospitalization

March 17, 2011
Heritage Health and Kaggle have teamed up to create the biggest data science competition thus far: the Heritage Health Prize, which challenges competitors to build a statistical model to predict the number of days a person is likely to spend in hospital over the next year, based on (anonymized) factors such as demographics, medical visits and treatments, and other...

## Risk-Opportunity Analysis: Houston

March 17, 2011
I will be attending Ralph Vince's risk-opportunity analysis workshop in Houston this weekend.  I'll be in town Friday-Monday.  Drop me a note if you're in the area and would like to meet for coffee / drinks.

## Global Migration Maps

March 17, 2011
Migrations of people have existed for millennia and oc

## basic ggplot2 network graphs

March 17, 2011
I have been looking around on the web and have not found anything yet related to using ggplot2 for making graphs/networks. I put together a few functions to make very simple graphs. The bipartite function especially is not ideal, as of course we only w...

## Having a problem with R-2.12.2 64-bit and "gam" package!

March 17, 2011
While working with some pitch location data recently, I ran across something strange when using my new computer (with R-2.12.2 64-bit) versus my work computer (with R-2.11.1 x64). Both are 64-bit computers, but I got the new one for portability (it's a laptop) and speed. Anyway, I had been doing some work in the office with Pitch F/X data,...

## Canabalt Revisited: Gamma Distributions, Multinomial Distributions and More JAGS Goodness

March 16, 2011
Introduction Neil Kodner recently got me interested again in analyzing Canabalt scores statistically by writing a great post in which he compared the average scores across iOS devices. Thankfully, Neil’s made his code and data freely available, so I’ve been revising my original analyses using his new data whenever I can find a free minute.

## How the New York Times uses R for Data Visualization

March 16, 2011
The New York Times introduced R to the world with a feature article in 2009, and has been using R for many years to support its pioneering presentation data analysis and visualization, under the direction of graphics editor Amanda Cox. Last week, Amanda Cox was the featured speaker at the New York R User Group, where she presented ... how R...

## Updates to SoilWeb Mobile: Distance from Nearest Map Unit Boundary

March 16, 2011
Working on some new ideas on how map unit data can be summarized on small screens-- particularly for our mobile version of SoilWeb. The distance from the nearest map unit polygon boundary is now printed above mini soil profile sketches. This gives the ...

## Textural triangle plot in R

March 16, 2011
Hi, these days I'm working with soil textural data, and one of the key points with these data is the presentation of the results. The best way is an old-school texture triangle! Because I like to do all my stuff in R, instead of opening drawing software such as...

